Running MapReduce jobs on Hadoop using streaming

In the previous recipe, we implemented a simple MapReduce job using Hadoop's Java API. The use case was the same as the one in the recipes in Chapter 3, Programming Language Drivers, where we implemented MapReduce using the Mongo client APIs in Python and Java. In this recipe, we will use Hadoop streaming to implement MapReduce jobs.

Hadoop streaming lets the map and reduce logic be written in any language: the framework launches the mapper and reducer as external processes and communicates with them over stdin and stdout. You can get more information on Hadoop streaming and how it works at http://hadoop.apache.org/docs/r1.2.1/streaming.html.
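To make the stdin/stdout idea concrete, here is a minimal, hypothetical pair of streaming scripts, a word-count mapper and reducer over plain tab-delimited text. The file names mapper.py and reducer.py and the word-count logic are illustrative assumptions only; they are not the Mongo-backed job built later in this recipe.

    #!/usr/bin/env python
    # mapper.py (illustrative): read raw input lines from stdin and
    # emit one tab-separated <word, 1> pair per word on stdout.
    import sys

    for line in sys.stdin:
        for word in line.strip().split():
            print("%s\t%d" % (word, 1))

    #!/usr/bin/env python
    # reducer.py (illustrative): Hadoop streaming delivers the mapper
    # output sorted by key, so equal keys arrive on consecutive lines
    # and can be summed with a simple running total.
    import sys

    current_word, current_count = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t", 1)
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                print("%s\t%d" % (current_word, current_count))
            current_word, current_count = word, int(count)
    if current_word is not None:
        print("%s\t%d" % (current_word, current_count))

Scripts like these are wired together with the hadoop-streaming jar that ships with Hadoop, along the lines of hadoop jar <path-to>/hadoop-streaming-*.jar -input <in> -output <out> -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py; the exact jar location depends on your Hadoop distribution.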

Getting ready

Refer to the Executing our first sample MapReduce job using the mongo-hadoop connector recipe in this chapter to see how to set up Hadoop for development ...
