MongoDB Cookbook by Amol Nayak

Running MapReduce jobs on Hadoop using streaming

In the previous recipe, we implemented a simple MapReduce job using Hadoop's Java API. The use case was the same as in the recipes of Chapter 3, Programming Language Drivers, where we implemented MapReduce using the Mongo client APIs in Python and Java. In this recipe, we will use Hadoop streaming to implement MapReduce jobs.

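Hadoop streaming works by communicating over stdin and stdout: the mapper and the reducer are ordinary executables that read their input from stdin and write their output to stdout. For more information on what Hadoop streaming is and how it works, see http://hadoop.apache.org/docs/r1.2.1/streaming.html.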
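To make the stdin/stdout contract concrete, here is a minimal sketch of a generic streaming mapper and reducer in Python. It is only an illustration under stated assumptions: it uses a word count rather than this recipe's use case, it assumes plain-text, tab-separated key/value lines (the data format used with the mongo-hadoop connector may differ), and the names map_lines and reduce_lines are hypothetical.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts for each key.

    Assumes the input is sorted by key, so that all records for one key
    are adjacent (Hadoop sorts mapper output before the reduce phase).
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"


# Run as "python script.py map" or "python script.py reduce";
# the role argument is an assumption of this sketch.
if __name__ == "__main__" and len(sys.argv) > 1:
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

Because both stages only read stdin and write stdout, the same script can be tried locally without Hadoop by piping the mapper output through sort into the reducer.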

Getting ready

Refer to the recipe "Executing our first sample MapReduce job using the mongo-hadoop connector" to see how to set up Hadoop for development purposes and build ...
