O'Reilly logo

Hadoop in Practice by Alex Holmes

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 11. Programming pipelines with Pig

 

This chapter covers
  • Customizing data loading in Pig
  • Data analysis with log data
  • Storing data in a compact format with SequenceFiles
  • Effective workflow and performance techniques

 

Pig is a platform that offers a high-level language with rich data analysis capabilities, making it easy to harness the power of MapReduce in a simplified manner.

Pig started life in Yahoo! as a research project to aid in working rapidly with MapReduce for prototyping purposes, and a year later was externalized into an Apache project. It uses its own language called PigLatin to model and operate on data. It’s extensible with its user-defined functions (UDFs), which allow users to bump down to Java when needed for fine-grained ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required