Performing Order By queries in Pig

In this recipe, we will use the Order By operator in Pig scripts to get the desired output.

Getting ready

To perform this recipe, you need a running Hadoop cluster with the latest version of Pig installed on it.

How to do it...

Order By is a useful operator for data analysis: it sorts the records of a relation by the values of one or more attributes, in ascending or descending order.
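
As a minimal sketch of the operator, an Order By query could look like the following. It assumes a comma-delimited employee file with the hypothetical fields id, name, salary, and dept; the real schema comes from the previous recipe and may differ. It also assumes the data sits under /pig/emps_data.

```pig
-- Hypothetical schema and delimiter; adjust both to match your employee dataset.
emps = LOAD '/pig/emps_data' USING PigStorage(',')
       AS (id:int, name:chararray, salary:int, dept:chararray);

-- Sort by salary, highest first; break ties alphabetically by name.
sorted_emps = ORDER emps BY salary DESC, name ASC;

-- Print the sorted records to the console.
DUMP sorted_emps;
```

Each sort key takes its own ASC or DESC keyword, and ASC is the default when neither is given.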

To learn its usage, we will use the employee dataset from the previous recipe. If you don't have it, perform the following steps.

First, load the data into HDFS:

hadoop fs -mkdir /pig/emps_data
hadoop fs -put emps.txt ...

