Adding dependencies between MapReduce jobs

Often we need to execute multiple MapReduce applications in a workflow-like manner to achieve our objective. Hadoop's ControlledJob and JobControl classes provide a mechanism to execute a simple workflow graph of MapReduce jobs by specifying the dependencies between them.

In this recipe, we execute the log-grep MapReduce computation followed by the log-analysis MapReduce computation on an HTTP server log dataset. The log-grep computation filters the input data based on a regular expression. The log-analysis computation analyses the filtered data. Hence, the log-analysis computation is dependent on the log-grep computation. We use the ControlledJob class to express this dependency and use the JobControl ...
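The dependency described above can be sketched as follows. This is a minimal illustration and not the recipe's own listing: the class name LogWorkflowSketch, the helper methods buildWorkflow and runWorkflow, and the group name "log-workflow" are hypothetical, and both Job objects are assumed to be fully configured by the caller (mapper, reducer, input and output paths). It assumes the Hadoop MapReduce client libraries are on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

// Hypothetical helper class; the names here are illustrative only.
public class LogWorkflowSketch {

    // Wraps two already-configured jobs and declares that the analysis job
    // must wait for the grep job to complete successfully.
    static JobControl buildWorkflow(Job grepJob, Job analysisJob) throws IOException {
        ControlledJob grep = new ControlledJob(grepJob, null);
        ControlledJob analysis = new ControlledJob(analysisJob, null);
        analysis.addDependingJob(grep);

        JobControl control = new JobControl("log-workflow");
        control.addJob(grep);
        control.addJob(analysis);
        return control;
    }

    // JobControl implements Runnable, so it is driven from its own thread
    // while the caller polls until no job is left in progress.
    static boolean runWorkflow(JobControl control) throws InterruptedException {
        Thread runner = new Thread(control, "job-control");
        runner.setDaemon(true);
        runner.start();
        while (!control.allFinished()) {
            Thread.sleep(500);
        }
        control.stop();
        // True only if every job in the workflow succeeded.
        return control.getFailedJobList().isEmpty();
    }
}
```

A driver would typically build the two jobs, call buildWorkflow, and then use the result of runWorkflow to choose its exit code. If the log-grep job fails, JobControl marks the dependent log-analysis job as failed without submitting it, so it also appears in getFailedJobList().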


Hadoop MapReduce v2 Cookbook - Second Edition by Thilina Gunarathne
