User-defined functions

User-defined functions, or UDFs, are functions that developers implement to extend the functionality of Pig with custom processing. They can be called from almost all Pig operators. UDFs were originally written only in Java; from Pig 0.8 onwards, Python UDFs are also supported. In the latest version of Pig, UDFs can be written in Jython, JavaScript, Ruby, and Groovy in addition to Python and Java.

Apart from Java, the language bindings do not support all of Pig's interfaces. For example, the load and store interfaces are available only in Java. In this book, we will use Java to build UDFs and illustrate their power.
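As a minimal sketch of what a Java UDF looks like, the class below implements an eval function that upper-cases its first argument, modelled on the UPPER example in the Pig documentation. In real Pig code the UDF extends `org.apache.pig.EvalFunc<T>`, overrides `exec`, and receives an `org.apache.pig.data.Tuple`. So that this sketch compiles without the Pig JAR on the classpath, `Tuple` and `EvalFunc` here are simplified local stand-ins (assumptions, not Pig's actual classes), and `ListTuple`, `Upper`, and `UpperUdfSketch` are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class UpperUdfSketch {

    // Hypothetical stand-in for org.apache.pig.data.Tuple, reduced to the two
    // methods this sketch needs. The real interface has many more methods.
    public interface Tuple {
        int size();
        Object get(int fieldNum) throws IOException;
    }

    // Hypothetical stand-in for org.apache.pig.EvalFunc<T>: an eval UDF
    // implements exec(), which Pig calls once per input tuple.
    public abstract static class EvalFunc<T> {
        public abstract T exec(Tuple input) throws IOException;
    }

    // Simple list-backed tuple, used only to drive the example locally.
    public static final class ListTuple implements Tuple {
        private final List<Object> fields;
        public ListTuple(Object... fields) { this.fields = Arrays.asList(fields); }
        @Override public int size() { return fields.size(); }
        @Override public Object get(int fieldNum) { return fields.get(fieldNum); }
    }

    // The UDF itself: returns the first field of the tuple in upper case.
    public static final class Upper extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) throws IOException {
            // Return null (a missing value) for absent or empty input instead of failing
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;
            }
            try {
                String str = (String) input.get(0);
                // Locale.ROOT avoids locale-specific casing surprises
                return str.toUpperCase(Locale.ROOT);
            } catch (ClassCastException e) {
                // Wrap type errors in IOException, the exception exec() declares
                throw new IOException("Expected a string in field 0", e);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Upper upper = new Upper();
        System.out.println(upper.exec(new ListTuple("alice", 20))); // prints ALICE
        System.out.println(upper.exec(new ListTuple()));            // prints null
    }
}
```

In a real deployment, the compiled class would be packaged in a JAR, made available to the script with `REGISTER`, and invoked by its fully qualified name inside an operator such as `FOREACH ... GENERATE`. Returning null for empty input, rather than throwing, lets the rest of the script treat that row as a missing value.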

There is a repository of Java UDFs called Piggy Bank. This is a public ...
