Pig action

Let's see the Pig script that will help us calculate the maximum rainfall in each month.

I have saved the input data for this chapter in the input folder placed at BOOK_CODE_HOME/learn_oozie/ch05.

If you already copied this folder's source code to HDFS at the start of the chapter, the input data is already in the right place inside HDFS. If not, you can copy the code to HDFS now.

The input data is comma separated and the columns in the data are as follows:

  • Product code
  • Bureau of Meteorology station number
  • Year, Month, Day
  • Rainfall amount (millimeters)
  • Period over which rainfall was measured (days)
  • Quality
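
As a rough illustration of how this column layout could be declared in Pig, the sketch below loads the comma-separated file with an explicit schema. The field names, the types, and the `$INPUT` parameter are assumptions made for this example, and it treats year, month, and day as three separate columns.

```pig
-- Sketch only: '$INPUT' is a hypothetical parameter (pass it with -param INPUT=...).
-- Field names and types are assumptions based on the column list above.
rainfall = LOAD '$INPUT' USING PigStorage(',')
    AS (product_code:chararray,
        station_number:chararray,
        year:int,
        month:int,
        day:int,
        rainfall_mm:double,
        period_days:int,
        quality:chararray);

-- Print the schema Pig has inferred, to confirm the column mapping.
DESCRIBE rainfall;
```

Declaring the types at load time means a value that cannot be cast (for example, an empty rainfall field) becomes null rather than failing the job.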

We will write a Pig script that loads the raw input data and groups it by year and month. Then, we will calculate maximum rainfall ...
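
The load, group, and maximum steps might look like the following minimal sketch. It is not the book's script: the relation names, field names, and the `$INPUT` and `$OUTPUT` parameters are all hypothetical.

```pig
-- Sketch only: '$INPUT' and '$OUTPUT' are hypothetical parameters supplied with -param.
-- Field names and types are assumptions, not taken from the book's script.
rainfall = LOAD '$INPUT' USING PigStorage(',')
    AS (product_code:chararray, station_number:chararray,
        year:int, month:int, day:int,
        rainfall_mm:double, period_days:int, quality:chararray);

-- Collect all readings that share the same year and month.
by_month = GROUP rainfall BY (year, month);

-- Emit one row per (year, month) with the largest reading in that group.
max_rainfall = FOREACH by_month GENERATE
    FLATTEN(group) AS (year, month),
    MAX(rainfall.rainfall_mm) AS max_rainfall_mm;

STORE max_rainfall INTO '$OUTPUT' USING PigStorage(',');
```

Grouping on the `(year, month)` tuple keeps the same month in different years apart. Pig's built-in `MAX` ignores null values, so readings with a missing rainfall amount do not affect the result.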
