O'Reilly logo

HDInsight Essentials by Rajesh Nadipalli

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Pig solution

Pig is another high-level platform that generates MapReduce code dynamically. It is a scripting language similar to Python. The following are its key features:

  • Rapid prototyping of algorithms
  • Iterative processing of data (chaining)
  • Joins are easy using Pig to correlate datasets
  • Data can be verified onscreen or saved back to HDFS

Pig architecture

The following figure shows the Pig architecture:

Pig architecture

The preceding figure shows the following three steps:

  1. Users start with a Pig script or the Pig command line (called Grunt).
  2. Pig parses, compiles, optimizes, and fires MapReduce statements.
  3. MapReduce accesses HDFS and returns the results.

Pig or Hive?

If you ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required