Glossary

accessible

In the context of a computing cluster, a node is accessible if it is reachable through the network. In other contexts, a tool or library is accessible if it is easy for particular groups to access and understand.
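
As a rough illustration of the network sense, the sketch below treats a node as accessible if a TCP connection can be opened to one of its ports. The function name is_accessible, the choice of TCP, and the timeout are assumptions made for this example; real cluster tooling may use other checks such as heartbeats.

```python
import socket

def is_accessible(host: str, port: int, timeout: float = 2.0) -> bool:
    """Hypothetical helper: True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection resolves the host and performs a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False
```

For example, is_accessible("worker1.example.com", 22) would report whether a hypothetical worker accepts connections on its SSH port.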

accumulator

A shared variable (called an accumulator in Spark and a counter in MapReduce) to which only associative and commutative operations, such as addition, are applied. Because such operations produce the same result regardless of how updates are grouped or ordered, accumulators stay consistent in a distributed environment no matter the order in which updates arrive.
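
The order independence described above can be checked with a small simulation. This is a minimal sketch, not Spark's accumulator API: the Accumulator class and the simulate function are hypothetical. Updates are shuffled, split across simulated partitions that each accumulate locally, and the partial results are merged in a random order; because addition is associative and commutative, the total never changes.

```python
import operator
import random
from functools import reduce

class Accumulator:
    """Hypothetical toy accumulator that only supports an associative, commutative op."""
    def __init__(self, zero=0, op=operator.add):
        self.value = zero
        self._op = op

    def add(self, x):
        self.value = self._op(self.value, x)

def simulate(updates, partitions=4, seed=None):
    """Accumulate updates on simulated workers, then merge in arbitrary order."""
    rng = random.Random(seed)
    shuffled = list(updates)
    rng.shuffle(shuffled)
    partials = []
    for i in range(partitions):
        # Each simulated worker accumulates its own slice of the data locally.
        acc = Accumulator()
        for x in shuffled[i::partitions]:
            acc.add(x)
        partials.append(acc.value)
    # The driver merges partial results in a random order.
    rng.shuffle(partials)
    return reduce(operator.add, partials, 0)

print(simulate(range(1, 101), seed=7))  # 5050 for any seed
```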

actions and transformations

See transformations and actions.

agent

A service, usually a background process, that runs routinely on behalf of a user and performs tasks independently. In Flume, agents are the building blocks of data flows: each agent ingests and wrangles data, moving it from a source into a channel and eventually to a sink.
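
The source, channel, and sink roles can be pictured as a producer and a consumer joined by a buffer. The sketch below is a conceptual model only and does not use Flume's actual API or configuration; the functions source, sink, and run_agent are hypothetical, a bounded queue.Queue stands in for the channel, and lowercasing each event stands in for "wrangling".

```python
import queue
import threading

def source(events, channel):
    """Hypothetical source: pushes raw events into the channel."""
    for event in events:
        channel.put(event)
    channel.put(None)  # sentinel marking the end of the stream

def sink(channel, store):
    """Hypothetical sink: drains the channel and lightly cleans each event."""
    while True:
        event = channel.get()
        if event is None:
            break
        store.append(event.strip().lower())

def run_agent(events):
    # A bounded queue plays the channel's role of buffering between stages.
    channel = queue.Queue(maxsize=10)
    store = []
    producer = threading.Thread(target=source, args=(events, channel))
    consumer = threading.Thread(target=sink, args=(channel, store))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return store
```

Decoupling the two stages with a buffer lets a slow sink fall behind briefly without the source dropping events, up to the channel's capacity.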

anonymous functions

A function that is not bound to an identifier (variable name). These functions are typically constructed at runtime and passed as arguments to higher-order functions. They also make it easy to create closures. Anonymous functions are passed to Spark operations to define their behavior. See also closure and lambda function.
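
In Python, anonymous functions are written with lambda. The plain-Python sketch below passes lambdas to the higher-order functions sorted, map, and reduce, and builds a closure; in PySpark, lambdas of the same shape are passed to RDD methods such as map and filter. The word list and the make_filter helper are illustrative assumptions.

```python
from functools import reduce

words = ["spark", "hadoop", "flume", "hive"]

# Anonymous functions passed as arguments to higher-order functions.
by_length = sorted(words, key=lambda w: len(w))
lengths = list(map(lambda w: len(w), words))
total = reduce(lambda a, b: a + b, lengths)

def make_filter(min_len):
    # The returned lambda is a closure: it captures min_len from this scope.
    return lambda w: len(w) >= min_len

long_words = list(filter(make_filter(5), words))

print(by_length)   # ['hive', 'spark', 'flume', 'hadoop']
print(total)       # 20
print(long_words)  # ['spark', 'hadoop', 'flume']
```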

application programming interface (API)

A collection of routines, protocols, or interfaces that specify how software components should interact. The MapReduce API specifies interfaces for constructing Mapper, Reducer, and Job subclasses that define MapReduce behavior. Similarly, ...
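
To show what it means for an API to specify interfaces that user code subclasses, the following is a loose Python analogy, not Hadoop's Java MapReduce API. The Mapper and Reducer abstract classes, the word-count subclasses, and run_job (a hypothetical stand-in for a Job that maps, groups by key, and reduces in memory) are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from collections import defaultdict

class Mapper(ABC):
    """Hypothetical interface: emit (key, value) pairs for one input record."""
    @abstractmethod
    def map(self, key, value): ...

class Reducer(ABC):
    """Hypothetical interface: combine all values that share a key."""
    @abstractmethod
    def reduce(self, key, values): ...

class WordCountMapper(Mapper):
    def map(self, key, value):
        for word in value.split():
            yield word, 1

class WordCountReducer(Reducer):
    def reduce(self, key, values):
        yield key, sum(values)

def run_job(mapper, reducer, records):
    """Hypothetical in-memory job: map, shuffle (group by key), then reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for k, v in mapper.map(key, value):
            groups[k].append(v)
    result = {}
    for k in sorted(groups):
        for out_key, out_value in reducer.reduce(k, groups[k]):
            result[out_key] = out_value
    return result

print(run_job(WordCountMapper(), WordCountReducer(), [(0, "to be or not to be")]))
```

Because the framework only depends on the map and reduce methods, any subclass that implements them can be plugged into the same job runner.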
