Summary

In this chapter, we introduced Apache Pig, a platform for large-scale data analysis on Hadoop. In particular, we covered the following topics:

The goals of Pig as a way of providing a dataflow-like abstraction that does not require hands-on MapReduce development
How Pig's approach to processing data compares to SQL, where Pig is procedural while SQL is declarative
Getting started with Pig, which is an easy task because it is a library that generates custom code and doesn't require additional services
An overview of the data types, core functions, and extension mechanisms provided by Pig
Detailed examples of applying Pig to analyze the Twitter dataset, which demonstrated its ability to express complex concepts very concisely
How libraries such ...
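
As a reminder of the procedural versus declarative distinction noted above, the following is a minimal sketch in Python rather than Pig Latin. It computes the same result twice: once as a single SQL statement (using the standard library's sqlite3 module) and once as a sequence of named steps that loosely mirrors how a Pig script chains GROUP, FOREACH ... GENERATE, FILTER, and ORDER. The sample data, the function names, and the threshold of two tweets are assumptions made for illustration and do not come from the chapter's Twitter dataset.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (user, tweet_text) pairs invented for illustration.
tweets = [
    ("alice", "hello hadoop"),
    ("bob", "pig latin"),
    ("alice", "dataflow"),
    ("carol", "mapreduce"),
    ("alice", "yarn"),
    ("bob", "hive"),
]


def declarative_counts(rows):
    """Declarative style: describe the result and let the engine plan the steps."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tweets (user TEXT, text TEXT)")
    conn.executemany("INSERT INTO tweets VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT user, COUNT(*) AS n FROM tweets "
        "GROUP BY user HAVING COUNT(*) >= 2 "
        "ORDER BY n DESC, user"
    ).fetchall()
    conn.close()
    return result


def procedural_counts(rows):
    """Procedural (dataflow) style: every step yields a named intermediate relation."""
    # Step 1: group tweets by user (similar in spirit to Pig's GROUP ... BY).
    grouped = defaultdict(list)
    for user, text in rows:
        grouped[user].append(text)
    # Step 2: project each group to (user, count), like FOREACH ... GENERATE.
    counted = [(user, len(texts)) for user, texts in grouped.items()]
    # Step 3: keep users with at least two tweets, like FILTER ... BY.
    filtered = [(user, n) for user, n in counted if n >= 2]
    # Step 4: sort by count descending, then by user name, like ORDER ... BY.
    ordered = sorted(filtered, key=lambda row: (-row[1], row[0]))
    return ordered


if __name__ == "__main__":
    print(declarative_counts(tweets))
    print(procedural_counts(tweets))
```

Both functions return the same rows. The difference is that the procedural version exposes each intermediate result by name, so any step can be inspected or reused on its own.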

Learning Hadoop 2 by Garry Turkington, Gabriele Modena