Programming Pig, 2nd Edition by Daniel Dai, Alan Gates

Chapter 8. Embedding Pig

In addition to running Pig on the command line, you can also invoke Pig programmatically. In this chapter, we will explore two options: embedding Pig inside a scripting language and using the Pig Java APIs.

Embedding Pig Latin in Scripting Languages

As we’ve said previously, Pig Latin is a data flow language. Unlike general-purpose programming languages, it does not include control flow constructs such as if and for. For many data-processing applications, the operators Pig provides are sufficient. But some classes of problems require the data flow to be repeated an indefinite number of times, or need to branch based on the results of an operator. Iterative processing, where a calculation must be repeated until the margin of error falls within an acceptable limit, is one example: before processing begins, there is no way to know how many times the data flow will need to run.
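Pig Latin alone cannot express the loop in that iterative example, but a host language can. The following sketch, in Python, shows the general shape of such a program: a loop that reruns a data flow and stops once the error drops below a tolerance. It does not use Pig at all. The function `run_data_flow` is a hypothetical pure-Python stand-in (a single PageRank step over a tiny hardcoded graph) for what, in an embedded Pig program, would be a Pig Latin script launched from the host language. The names `run_data_flow`, `iterate_until_converged`, `EDGES`, and `DAMPING` are illustrative assumptions, not part of Pig.

```python
# Hypothetical stand-in for one run of a Pig Latin data flow. In embedded Pig,
# each call would instead launch a Pig job; here it is plain Python so the
# sketch runs on its own.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]  # illustrative graph
DAMPING = 0.85  # illustrative PageRank damping factor


def run_data_flow(ranks):
    """One PageRank step: each node passes its rank evenly to its out-links."""
    out_degree = {}
    for src, _ in EDGES:
        out_degree[src] = out_degree.get(src, 0) + 1
    n = len(ranks)
    new_ranks = {node: (1 - DAMPING) / n for node in ranks}
    for src, dst in EDGES:
        new_ranks[dst] += DAMPING * ranks[src] / out_degree[src]
    return new_ranks


def iterate_until_converged(tolerance=1e-6, max_iterations=100):
    """Control flow that Pig Latin itself lacks: repeat the data flow until
    the largest per-node change falls below `tolerance`."""
    nodes = {node for edge in EDGES for node in edge}
    ranks = {node: 1.0 / len(nodes) for node in nodes}
    for iteration in range(1, max_iterations + 1):
        new_ranks = run_data_flow(ranks)
        error = max(abs(new_ranks[node] - ranks[node]) for node in nodes)
        ranks = new_ranks
        if error < tolerance:  # margin of error is now acceptable
            return ranks, iteration
    return ranks, max_iterations  # safety cap for inputs that never converge


ranks, iterations = iterate_until_converged()
print("converged after", iterations, "iterations")
for node in sorted(ranks):
    print(node, round(ranks[node], 4))
```

The number of passes is discovered only at run time, which is exactly what a fixed Pig Latin script cannot express on its own; `max_iterations` guards against a calculation that never converges.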

Blending data flow and control flow in one language in a way that is both useful and intuitive is difficult. Building a general-purpose language, along with all the associated tools such as IDEs and debuggers, is a considerable undertaking. Moreover, there is no shortage of control flow languages already, and turning Pig Latin into a general-purpose language would require users to learn a much bigger language just to process their data. For these reasons, the decision was made to embed Pig in existing scripting languages instead. This avoids the need to invent a new language ...