About this Book

Hadoop is an open source framework that implements MapReduce, the programming model Google published for processing very large data sets distributed across clusters of machines. This definition naturally leads to an obvious question: What are maps and why do they need to be reduced? Massive data sets can be extremely difficult to analyze and query using traditional mechanisms, especially when the queries themselves are quite complicated. In effect, MapReduce splits the data set into pieces and applies the same function to each piece independently, usually in parallel, emitting intermediate key/value pairs; that’s the mapping. The framework then groups the intermediate values by key and combines each group into a final result; that’s the reducing. Because both steps can be spread across many machines, results come back far sooner than a single machine could produce them.
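To make the two phases concrete, here is a minimal single-process sketch of the classic word-count job. It is plain Python, not the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are illustrative assumptions, and a real Hadoop job would run the map and reduce steps on different machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group intermediate values by key (done by the framework between phases)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    intermediate = []
    # Each map call depends only on its own split, so a cluster could
    # run these calls in parallel; here they simply run one after another.
    for document in documents:
        intermediate.extend(map_phase(document))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: three small "splits" standing in for a large data set.
splits = ["the quick brown fox", "the lazy dog", "the fox"]
counts = word_count(splits)
print(counts)
```

The point of the design is that `map_phase` never looks at more than one split and `reduce_phase` never looks at more than one key, which is what lets a framework distribute the work.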

This book teaches readers how to use Hadoop and write MapReduce programs. The intended ...

