Pig is another high-level platform; it compiles scripts into MapReduce jobs at run time. Scripts are written in Pig Latin, a procedural data-flow language in which each statement transforms the output of an earlier one. Its key features are as follows:
- Rapid prototyping of algorithms
- Iterative processing of data (chaining)
- Easy joins for correlating datasets
- Results can be inspected on screen (DUMP) or saved back to HDFS (STORE)
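The chaining and join features listed above can be pictured with a small Python analogy of a Pig data flow: filter one relation, join it with another, then group and count. This is only a sketch under stated assumptions. The relations, field names, and helper functions (`pig_filter`, `pig_join`, `pig_group_count`) are hypothetical, and everything runs in memory, whereas Pig would compile the equivalent FILTER, JOIN, GROUP, and FOREACH ... GENERATE COUNT statements into MapReduce jobs over HDFS data.

```python
from collections import defaultdict

# Hypothetical in-memory "relations" standing in for files loaded from HDFS.
users = [
    {"user_id": 1, "name": "alice", "country": "US"},
    {"user_id": 2, "name": "bob", "country": "UK"},
    {"user_id": 3, "name": "carol", "country": "US"},
]
clicks = [
    {"user_id": 1, "url": "/home"},
    {"user_id": 1, "url": "/cart"},
    {"user_id": 2, "url": "/home"},
    {"user_id": 3, "url": "/home"},
]


def pig_filter(relation, predicate):
    """Analogue of FILTER ... BY: keep only tuples matching the predicate."""
    return [t for t in relation if predicate(t)]


def pig_join(left, right, key):
    """Analogue of an inner JOIN ... BY key, done as an in-memory hash join."""
    index = defaultdict(list)
    for rt in right:
        index[rt[key]].append(rt)
    return [{**lt, **rt} for lt in left for rt in index.get(lt[key], [])]


def pig_group_count(relation, key):
    """Analogue of GROUP ... BY key followed by FOREACH ... GENERATE COUNT."""
    counts = defaultdict(int)
    for t in relation:
        counts[t[key]] += 1
    return dict(sorted(counts.items()))


# Chain the steps as a Pig script would: each result feeds the next step.
us_users = pig_filter(users, lambda t: t["country"] == "US")
joined = pig_join(us_users, clicks, "user_id")
clicks_per_user = pig_group_count(joined, "name")
print(clicks_per_user)  # analogue of DUMP; prints {'alice': 2, 'carol': 1}
```

Each intermediate name (`us_users`, `joined`) plays the role of a Pig relation alias, which is what makes it easy to inspect or reuse any step while prototyping.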
The following figure shows the Pig architecture:
The preceding figure shows the following three steps:
- Users start with a Pig script or the Pig command line (called Grunt).
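- Pig parses, compiles, and optimizes the script, then submits the resulting MapReduce jobs to the cluster.
- The MapReduce jobs read their input from HDFS, process it, and return the results to the user or write them back to HDFS.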
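To make the last two steps more concrete, the following Python sketch models roughly what a single GROUP ... COUNT step becomes once compiled: a map function that emits `(key, 1)` pairs, a shuffle that collects values by key, and a reduce function that sums them. It is a simplified illustration, not Pig's implementation. Pig actually builds logical, physical, and MapReduce plans and may merge several operators into one job; here a Python list of tab-separated lines stands in for an input file on HDFS, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical stand-in for tab-separated records stored in HDFS.
hdfs_input = [
    "alice\t/home",
    "bob\t/home",
    "alice\t/cart",
    "carol\t/home",
]


def map_phase(line):
    """Map: parse one record and emit (group_key, 1)."""
    name, _url = line.split("\t")
    yield (name, 1)


def shuffle(pairs):
    """Shuffle/sort: collect all values emitted for the same key, sorted by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reduce: aggregate the grouped values (here, a COUNT)."""
    return (key, sum(values))


def run_job(records):
    """Run map -> shuffle -> reduce over the input records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return [reduce_phase(k, vs) for k, vs in grouped.items()]


results = run_job(hdfs_input)
print(results)  # [('alice', 2), ('bob', 1), ('carol', 1)]
```

On a real cluster the map and reduce phases run as distributed tasks, and the shuffle moves data between machines; the sketch only shows the shape of the computation that Pig generates on the user's behalf.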