Chapter 5. Running Pig Jobs

In this chapter, we will see how to run Pig jobs from Oozie. Pig is a general-purpose data flow language that makes writing and running ETL jobs on Hadoop much simpler. If you are new to Pig, I suggest you check out the tutorial on the Pig website (http://pig.apache.org/docs).

In this chapter, we will:

  • Create Oozie Workflows for Pig actions
  • Run Pig jobs from Coordinators
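As a preview of what a Pig action looks like, here is a minimal sketch that generates a one-action workflow.xml. It is not the chapter's own workflow. It assumes the `uri:oozie:workflow:0.5` schema. The application name `rainfall-pig-wf`, the script `rainfall.pig` and the `inputDir`/`outputDir` parameters are hypothetical. `${jobTracker}` and `${nameNode}` are left as EL variables for Oozie to resolve when the job is submitted.

```python
import xml.etree.ElementTree as ET

# Assumed workflow schema version; use one your Oozie server supports.
WF_NS = "uri:oozie:workflow:0.5"


def build_pig_workflow(app_name, script, params):
    """Return workflow.xml text containing a single Pig action.

    app_name, script and params are illustrative (hypothetical) inputs.
    """
    root = ET.Element("workflow-app", {"xmlns": WF_NS, "name": app_name})
    ET.SubElement(root, "start", {"to": "pig-node"})

    action = ET.SubElement(root, "action", {"name": "pig-node"})
    pig = ET.SubElement(action, "pig")
    ET.SubElement(pig, "job-tracker").text = "${jobTracker}"
    ET.SubElement(pig, "name-node").text = "${nameNode}"
    ET.SubElement(pig, "script").text = script
    # Each <param> is passed to the Pig script as a KEY=VALUE substitution.
    for key, value in params.items():
        ET.SubElement(pig, "param").text = f"{key}={value}"
    # Transitions: success goes to the end node, failure to the kill node.
    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "fail"})

    kill = ET.SubElement(root, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = (
        "Pig action failed: [${wf:errorMessage(wf:lastErrorNode())}]"
    )
    ET.SubElement(root, "end", {"name": "end"})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_pig_workflow(
        "rainfall-pig-wf",
        "rainfall.pig",
        {"INPUT": "${inputDir}", "OUTPUT": "${outputDir}"},
    ))
```

Generating the XML from a script is only a convenience for the sketch. In practice you would write workflow.xml by hand and upload it, together with the Pig script, to the workflow application directory on HDFS.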

From a conceptual point of view, we will:

  • Understand the concept of parameterization of Dataset instances
  • Understand the concept of Coordinator controls
  • Understand the concept of config-default.xml
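To make the role of config-default.xml concrete, here is a simplified model of how a parameter value is chosen. config-default.xml sits in the workflow application directory and supplies default values. Values submitted with the job, for example from job.properties, override those defaults. The property names and paths below are hypothetical values for the case study. The model deliberately ignores EL evaluation and `${var}` substitution, which Oozie performs on top of this.

```python
import xml.etree.ElementTree as ET

# Hypothetical config-default.xml content for the rainfall case study.
CONFIG_DEFAULT = """<configuration>
  <property>
    <name>queueName</name>
    <value>default</value>
  </property>
  <property>
    <name>inputDir</name>
    <value>/user/hadoop/rainfall/input</value>
  </property>
</configuration>"""


def parse_hadoop_config(xml_text):
    """Read <property><name>/<value> pairs from a Hadoop-style configuration file."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value", default="")
        for prop in root.findall("property")
    }


def effective_parameters(defaults_xml, job_properties):
    """Simplified model: job properties override config-default.xml defaults."""
    merged = parse_hadoop_config(defaults_xml)
    merged.update(job_properties)  # submitted values win on name clashes
    return merged


if __name__ == "__main__":
    params = effective_parameters(
        CONFIG_DEFAULT, {"inputDir": "/user/hadoop/rainfall/2015"}
    )
    for name, value in sorted(params.items()):
        print(f"{name}={value}")
```

With this setup, a job that does not set `queueName` still gets `default`, while the submitted `inputDir` replaces the default path.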

Chapter case study

We are working on a climate research project, and we want to know the rainfall pattern near Melbourne Airport. We want to ...
