Chapter 7. Storm on YARN – Low Latency Processing in Hadoop

Hadoop MapReduce builds on the concept of moving computation to the data. The data is usually far larger than the program that manipulates it, and the network is typically the slowest component of a distributed data processing system, so it is natural to move the smaller piece, that is, the program itself. With assistance from the NameNode, Hadoop knows where each block of data resides in the cluster. It uses this data locality information to schedule tasks on appropriate nodes, making a best effort to place each task as close as possible to the data it needs.
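
The locality-driven placement described above can be illustrated with a small sketch. This is not Hadoop's scheduler code or API. The function names (`pick_node`, `schedule`), the block, node, and rack identifiers, and the simple three-step preference order (same node, then same rack, then anywhere) are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def pick_node(replica_nodes: List[str],
              free_slots: Dict[str, int],
              rack_of: Dict[str, str]) -> Optional[str]:
    """Pick a node for one task (hypothetical helper, not a Hadoop API)."""
    # 1. Node-local: a node that stores a replica and has a free slot.
    for node in replica_nodes:
        if free_slots.get(node, 0) > 0:
            return node
    # 2. Rack-local: a free node in the same rack as one of the replicas.
    replica_racks = {rack_of[n] for n in replica_nodes if n in rack_of}
    for node in sorted(free_slots):
        if free_slots[node] > 0 and rack_of.get(node) in replica_racks:
            return node
    # 3. Off-rack: any free node; the data must then cross the network.
    for node in sorted(free_slots):
        if free_slots[node] > 0:
            return node
    return None  # no capacity anywhere


def schedule(block_locations: Dict[str, List[str]],
             free_slots: Dict[str, int],
             rack_of: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Assign one task per block, consuming a slot for each placement."""
    slots = dict(free_slots)  # copy so the caller's map is not mutated
    assignment: Dict[str, Optional[str]] = {}
    for block, replicas in block_locations.items():
        node = pick_node(replicas, slots, rack_of)
        assignment[block] = node
        if node is not None:
            slots[node] -= 1
    return assignment


if __name__ == "__main__":
    # Illustrative cluster: two racks, four nodes, three blocks.
    blocks = {"b1": ["n1", "n2"], "b2": ["n1"], "b3": ["n3"]}
    slots = {"n1": 1, "n2": 1, "n3": 0, "n4": 1}
    racks = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2"}
    print(schedule(blocks, slots, racks))
```

In this toy cluster only the first block gets a node-local slot. The other two fall back to a node in the same rack as a replica, which is what "best effort" means here: locality is a preference, not a guarantee.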

In this chapter, we will discuss the opposite paradigm, that is, moving data to the compute, also known as the streaming
