In this chapter, we show you how to download the Pig binary and start the Pig command line.
Before you can run Pig on your machine or your Hadoop cluster, you will need to download and install it. If someone else has taken care of this, you can skip ahead to “Running Pig”.
You can download Pig as a complete package or as source code that you build. You can also get it as part of a Hadoop distribution.
You can download the official version of Apache Pig, which comes packaged with all of the required JAR files, from Pig’s release page.
Pig does not need to be installed on your Hadoop cluster. It runs on the machine from which you launch Hadoop jobs. Though you can run Pig from your laptop or desktop, in practice, most cluster owners set up one or more machines that have access to their Hadoop cluster but are not part of the cluster (that is, they are not DataNodes or TaskTrackers/NodeManagers). This makes it easier for administrators to update Pig and associated tools, as well as to secure access to the clusters. These machines are called gateway machines or edge machines. In this book we use the term gateway machine.
You will need to install Pig on these gateway machines. If your Hadoop cluster is accessible from your desktop or laptop, you can install Pig there as well. Also, you can install Pig on your local machine if you plan to use Pig in local mode (see “Running Pig Locally ...