Hadoop

Before we jump to loading data, a quick overview of MapReduce is warranted. Although Druid comes prepackaged with a convenient MapReduce job to accommodate historical data, large distributed systems generally need custom jobs to perform analyses over the entire data set.

An overview of MapReduce

MapReduce is a framework that breaks processing into two phases: a map phase and a reduce phase. In the map phase, a function is applied to the entire set of input data, one element at a time. Each application of the map function produces a set of tuples, each containing a key and a value. Tuples that share the same key are then grouped and combined via the reduce function. The reduce function emits another set of tuples, typically by combining the ...
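The two phases can be illustrated with a minimal in-memory sketch. This is plain Python rather than the Hadoop API, and the names map_fn, reduce_fn, and map_reduce are hypothetical, chosen only for illustration. It assumes a word-count job: the map function emits a (word, 1) tuple for each word, and the reduce function sums the values that share a key.

```python
from collections import defaultdict


def map_fn(line):
    """Map: emit one (key, value) tuple per word in a single input element."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_fn(key, values):
    """Reduce: combine all values that share a key into one output tuple."""
    return (key, sum(values))


def map_reduce(records, map_fn, reduce_fn):
    """Run both phases in memory (illustrative only, not distributed)."""
    groups = defaultdict(list)
    # Map phase: apply map_fn to the input one element at a time.
    for record in records:
        for key, value in map_fn(record):
            # Group the emitted values by key so reduce sees them together.
            groups[key].append(value)
    # Reduce phase: call reduce_fn once per distinct key.
    return [reduce_fn(key, values) for key, values in sorted(groups.items())]


lines = ["storm and hadoop", "hadoop and druid"]
result = map_reduce(lines, map_fn, reduce_fn)
print(result)
```

In a real cluster the grouping step happens across machines between the two phases; here a dictionary stands in for it so the flow of tuples is easy to follow.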

Storm Blueprints: Patterns for Distributed Real-time Computation by P. Taylor Goetz, Brian O'Neill
