Extending HiveQL

The HiveQL language can be extended by means of plugins and third-party functions. In Hive, there are three types of functions characterized by the number of rows they take as input and produce as output:

  • User Defined Functions (UDFs): the simplest kind; they take a single row as input and produce a single value as output.
  • User Defined Aggregate Functions (UDAFs): take multiple rows as input and generate a single row as output. These are aggregate functions, typically used in conjunction with a GROUP BY clause (similar to COUNT(), AVG(), MIN(), MAX(), and so on).
  • User Defined Table Functions (UDTFs): take a single row as input and generate a logical table composed of multiple rows that can be used in join expressions (typically through LATERAL VIEW).
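
The difference between the three types comes down to row cardinality, which can be sketched in plain Java. This is only an illustrative model, not the Hive plugin API: the class FunctionShapes and its methods toUpper, average, and explode are hypothetical names chosen for this sketch, and a real Hive function would have to be written against Hive's own Java base classes instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustrative model only: hypothetical names, not the Hive plugin API.
public class FunctionShapes {

    // UDF shape: one input row -> one output value.
    static String toUpper(String value) {
        return value == null ? null : value.toUpperCase(Locale.ROOT);
    }

    // UDAF shape: many input rows (for example, one GROUP BY group) -> one output value.
    static Double average(List<Integer> values) {
        if (values.isEmpty()) {
            return null; // no rows, so there is no average to report
        }
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return (double) sum / values.size();
    }

    // UDTF shape: one input row -> zero or more output rows.
    static List<String> explode(String csv) {
        List<String> rows = new ArrayList<>();
        if (csv == null || csv.isEmpty()) {
            return rows;
        }
        for (String part : csv.split(",")) {
            rows.add(part.trim());
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(toUpper("hive"));
        System.out.println(average(Arrays.asList(1, 2, 3, 4)));
        System.out.println(explode("a, b,c"));
    }
}
```

In this model, toUpper is called once per row, average is called once per group of rows, and explode turns each row into a small result set of its own.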

Tip

These APIs are provided only in Java. For other ...
