ORDER and SORT

Another aspect to manipulate data in Hive is to properly order or sort the data or result sets to clearly identify the important facts, such as top N values, maximum, minimum, and so on.

There are the following keywords used in Hive to order and sort data:

  • ORDER BY (ASC|DESC): This is similar to the RDBMS ORDER BY statement. A sorted order is maintained across all of the output from every reducer. It performs the global sort using only one reducer, so it takes a longer time to return the result. Usage with LIMIT is strongly recommended for ORDER BY. When hive.mapred.mode = strict (by default, hive.mapred.mode = nonstrict) is set and we do not specify LIMIT, there are exceptions. This can be used as follows:
    jdbc:hive2://> SELECT name ...

Get Apache Hive Essentials now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.