Running Pig

Pig is a tool that translates statements written in Pig Latin and executes them either on a single machine in standalone mode or on a full Hadoop cluster when in distributed mode. Even in the latter, Pig's role is to translate Pig Latin statements into MapReduce jobs and therefore it doesn't require the installation of additional services or daemons. It is used as a command-line tool with its associated libraries.

Cloudera CDH ships with Apache Pig version 0.12. Alternatively, the Pig source code and binary distributions can be obtained at https://pig.apache.org/releases.html.

As can be expected, the MapReduce mode requires access to a Hadoop cluster and HDFS installation. MapReduce mode is the default mode executed when running the ...

Get Learning Hadoop 2 now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.