Running a MapReduce streaming job

In this section, we will learn how to run Hadoop Streaming jobs using Oozie. Hadoop Streaming lets you write MapReduce code in languages other than Java, such as Python, C++, and Ruby.

Note

Read the Oozie documentation at https://oozie.apache.org/docs/4.2.0/WorkflowFunctionalSpec.html#a3.2.2_Map-Reduce_Action, then write a Workflow that runs a Streaming job and schedule it using a Coordinator. Sample Python mapper and reducer code is available at http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/.

Save the Python code from the preceding tutorial link as mapper.py and reducer.py in the streaming folder.
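For orientation, here is a minimal word-count sketch in the Hadoop Streaming style. It is not the code from the linked tutorial; the function names map_lines and reduce_lines are hypothetical and chosen only for illustration. It assumes the usual streaming convention of tab-separated key/value records, with the reducer receiving its input sorted by key.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper step: emit one 'word<TAB>1' record per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_records):
    """Reducer step: sum the counts for each word.

    Assumes the records arrive sorted by key, which is what the shuffle
    phase provides between the map and reduce steps.
    """
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Simulate map -> sort -> reduce in one process, for local testing only."""
    return list(reduce_lines(sorted(map_lines(lines))))
```

In an actual streaming job, mapper.py would apply the mapper logic to sys.stdin and print each record, and reducer.py would do the same with the reducer logic. The run_locally helper only imitates the sort that Hadoop performs between the two steps, so the logic can be checked before the job is submitted through Oozie.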

The <mapper> tag makes our mapper and reducer file available ...
