Adding dependencies between MapReduce jobs
Often we require multiple MapReduce applications to be executed in a workflow-like manner to achieve our objective. Hadoop ControlledJob and JobControl classes provide a mechanism to execute a simple workflow graph of MapReduce jobs by specifying the dependencies between them.
In this recipe, we execute the log-grep MapReduce computation followed by the log-analysis MapReduce computation on an HTTP server log dataset. The log-grep computation filters the input data based on a regular expression. The log-analysis computation analyses the filtered data. Hence, the log-analysis computation is dependent on the log-grep computation. We use the ControlledJob class to express this dependency and use the JobControl ...
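The ordering guarantee described above can be illustrated with a small, dependency-free model. This is only a sketch and not Hadoop's API: `WorkflowSketch`, `SketchJob`, and `runAll` are hypothetical names invented for illustration, and no MapReduce job is actually launched. The model's `addDependingJob` method is named after `ControlledJob.addDependingJob`, which is how a dependency is declared in Hadoop itself, and the list passed to `runAll` plays the role of the jobs registered through `JobControl.addJob`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Dependency-free model of a job workflow. NOT Hadoop's API. */
public class WorkflowSketch {

    /** Hypothetical stand-in for a job that may depend on other jobs. */
    static final class SketchJob {
        final String name;
        final List<SketchJob> dependsOn = new ArrayList<>();

        SketchJob(String name) {
            this.name = name;
        }

        // Declare that this job must wait until 'other' has finished.
        void addDependingJob(SketchJob other) {
            dependsOn.add(other);
        }
    }

    /** Returns job names in an order where every job follows its dependencies. */
    static List<String> runAll(List<SketchJob> jobs) {
        List<String> order = new ArrayList<>();
        Set<SketchJob> done = new HashSet<>();
        while (done.size() < jobs.size()) {
            boolean progressed = false;
            for (SketchJob job : jobs) {
                // A job is ready once all of its dependencies are done.
                if (!done.contains(job) && done.containsAll(job.dependsOn)) {
                    order.add(job.name); // a real workflow would run the job here
                    done.add(job);
                    progressed = true;
                }
            }
            if (!progressed) {
                // No job became ready: a cycle, or a dependency was never registered.
                throw new IllegalStateException("cyclic or unregistered dependency");
            }
        }
        return order;
    }

    public static void main(String[] args) {
        SketchJob logGrep = new SketchJob("log-grep");
        SketchJob logAnalysis = new SketchJob("log-analysis");
        logAnalysis.addDependingJob(logGrep); // log-analysis waits for log-grep

        // Registration order is deliberately reversed to show that the
        // declared dependency, not the registration order, decides what runs first.
        List<SketchJob> jobs = new ArrayList<>();
        jobs.add(logAnalysis);
        jobs.add(logGrep);

        System.out.println(runAll(jobs)); // prints [log-grep, log-analysis]
    }
}
```

Even though log-analysis is registered first, it is held back until log-grep has completed, which is the behaviour the recipe relies on when the filtered output of one computation is the input of the next.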