Chapter 5. Running Pig Jobs
In this chapter, we will see how to run Pig jobs from Oozie. Pig is a data flow language that makes writing and running ETL jobs on Hadoop easy. If you are new to Pig, I suggest you check out the tutorial on the Pig website (http://pig.apache.org/docs).
In this chapter, we will:
- Create Oozie Workflows for Pig actions
- Run Pig jobs from Coordinators
From a conceptual point of view, we will:
- Understand the concept of parameterization of Dataset instances
- Understand the concept of Coordinator controls
- Understand the concept of config-default.xml
Chapter case study
We are working on a climate-related research project, and we want to know the rainfall pattern near the Melbourne airport. We want to ...