Executing parallel jobs using Oozie (fork)

In this recipe, we are going to take a look at how to execute parallel jobs using the Oozie fork node. Here, we will execute one Hive job and one Pig job in parallel.

Getting ready

To perform this recipe, you should have a running Hadoop cluster as well as the latest versions of Oozie, Hive, and Pig installed on it.

How to do it...

For parallel execution, we need to use the fork node provided by Oozie. The following is a sample workflow that executes Hive and Pig jobs in parallel:

```xml
<workflow-app xmlns="uri:oozie:workflow:0.2" name="demo-wf">
    <start to="fork-node"/>
    <fork name="fork-node">
        <path start="pig-node"/>
        <path start="hive-node"/>
    </fork>
    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            ...
```
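The sample above is truncated after the start of the Pig action. To give a sense of the overall shape, here is a minimal, hedged sketch of how such a fork/join workflow typically fits together: the fork starts both actions, each action routes its `ok` transition to a common join node, and the join waits for both paths before continuing. The script names `id.pig` and `script.q` are placeholders for illustration, not part of the original recipe:

```xml
<workflow-app xmlns="uri:oozie:workflow:0.2" name="demo-wf">
    <start to="fork-node"/>

    <!-- The fork launches both paths concurrently -->
    <fork name="fork-node">
        <path start="pig-node"/>
        <path start="hive-node"/>
    </fork>

    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>id.pig</script> <!-- placeholder Pig script -->
        </pig>
        <ok to="join-node"/>
        <error to="fail"/>
    </action>

    <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>script.q</script> <!-- placeholder Hive script -->
        </hive>
        <ok to="join-node"/>
        <error to="fail"/>
    </action>

    <!-- The join blocks until every forked path has reached it -->
    <join name="join-node" to="end"/>

    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Note that every path started by a fork must end at the same join; if either action fails, the workflow transitions to the kill node instead. Once the workflow XML and scripts are in HDFS alongside a `job.properties` defining `jobTracker` and `nameNode`, the job can be submitted with the standard Oozie CLI, for example `oozie job -oozie http://localhost:11000/oozie -config job.properties -run` (the host and port here are assumed defaults and will vary by cluster).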
