Summary
In this chapter, we introduced Apache Pig, a platform for large-scale data analysis on Hadoop. In particular, we covered the following topics:
- The goals of Pig as a way of providing a dataflow-like abstraction that does not require hands-on MapReduce development
- How Pig's approach to processing data compares to SQL, where Pig is procedural while SQL is declarative
- Getting started with Pig — an easy task, as it is a library that generates custom code and doesn't require additional services
- An overview of the data types, core functions, and extension mechanisms provided by Pig
- Examples of applying Pig to analyze the Twitter dataset in detail, which demonstrated its ability to express complex concepts in a very concise fashion
- How libraries such ...
Get Learning Hadoop 2 now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.