Establishing the architecture

We touched on Hadoop in the previous chapter, but we focused mainly on its MapReduce mechanism. In this chapter, we will do the opposite and focus on the Hadoop Distributed File System (HDFS) and Yet Another Resource Negotiator (YARN). We will use HDFS to stage the data and YARN to deploy the Storm framework that will host the topology.

The recent componentization within Hadoop allows any distributed system to use it for resource management. In Hadoop 1.0, resource management was embedded into the MapReduce framework as shown in the following diagram:

[Diagram: Hadoop 1.0, with resource management embedded in the MapReduce framework]

Hadoop 2.0 separates out resource management into ...
