Summary

In this chapter, we introduced Apache Pig, a platform for large-scale data analysis on Hadoop. In particular, we covered the following topics:

  • The goals of Pig as a way of providing a dataflow-like abstraction that does not require hands-on MapReduce development
  • How Pig's approach to processing data compares to SQL, where Pig is procedural while SQL is declarative
  • Getting started with Pig — an easy task, as it is a library that generates custom code and doesn't require additional services
  • An overview of the data types, core functions, and extension mechanisms provided by Pig
  • Examples of applying Pig to analyze the Twitter dataset in detail, which demonstrated its ability to express complex concepts in a very concise fashion
  • How libraries such ...

Get Learning Hadoop 2 now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.