Development and debugging aids

Three commands are particularly helpful when you develop, debug, and optimize Pig scripts.

The DESCRIBE command

The DESCRIBE command prints the schema of a relation. It is especially useful when you are new to Pig Latin and want to see how each operator transforms the data. For the groupByCountry relation in the previous script, which finds the population of the country, DESCRIBE produces the following output:

groupByCountry: {group: chararray,generateRecords: {(cc::cname: chararray,ccity::cityName: chararray,ccity::population: long)}} 

The DESCRIBE output uses Pig Latin schema syntax: braces enclose a bag and parentheses enclose a tuple. In the preceding example, groupByCountry is a bag whose tuples each contain a group field of type chararray and a nested bag, generateRecords.
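To see where a schema of this shape comes from, the following minimal sketch groups a relation and describes the result. It is not the book's original script: the input file cities.tsv, its column layout, and the aliases cities, byCountry, and countryPopulation are all assumptions made for illustration.

```pig
-- Hypothetical input: tab-separated lines of country name, city name, population.
cities = LOAD 'cities.tsv' USING PigStorage('\t')
         AS (cname:chararray, cityName:chararray, population:long);

-- GROUP yields one tuple per key: the key in a field named "group",
-- plus a bag (named after the grouped relation) holding the matching tuples.
byCountry = GROUP cities BY cname;

-- Prints a schema of this shape:
-- byCountry: {group: chararray,cities: {(cname: chararray,cityName: chararray,population: long)}}
DESCRIBE byCountry;

-- Collapse each nested bag to a single total per country.
countryPopulation = FOREACH byCountry
                    GENERATE group AS cname, SUM(cities.population) AS population;

-- Describing the result shows that the nested bag has been replaced by flat fields.
DESCRIBE countryPopulation;
```

Running DESCRIBE after each statement, as in this sketch, is a quick way to confirm field names and types before referring to them in later operators.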

The EXPLAIN ...
