Chapter 6. Data Aggregation and Sampling

This chapter explains how to aggregate and sample data in Hive. It first covers several aggregate functions, the analytic functions that work with GROUP BY and PARTITION BY, and windowing clauses. It then introduces different ways of sampling data in Hive.

In this chapter, we will cover the following topics:

  • Basic aggregation
  • Advanced aggregation
  • Aggregation condition
  • Analytic functions
  • Sampling

Basic aggregation – GROUP BY

Data aggregation is the process of gathering data and expressing it in summary form, so that more can be learned about particular groups under specific conditions. Hive offers several built-in aggregate functions, such as MAX, MIN, and AVG. Hive also supports advanced aggregation by ...
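
As a minimal sketch of the built-in aggregate functions named above, the following HiveQL groups rows by department and summarizes salaries. The table employee_demo, its columns, and the sample rows are hypothetical and exist only for illustration; the INSERT ... VALUES statement assumes Hive 0.14 or later.

```sql
-- Hypothetical demo table; names and data are assumptions for illustration.
CREATE TABLE IF NOT EXISTS employee_demo (
  name   STRING,
  dept   STRING,
  salary INT
);

-- INSERT ... VALUES assumes Hive 0.14 or later.
INSERT INTO TABLE employee_demo VALUES
  ('Ann',  'IT', 100),
  ('Bob',  'IT', 200),
  ('Cara', 'HR', 150);

-- One output row per department, each summarizing that group's salaries.
SELECT
  dept,
  MAX(salary) AS max_salary,
  MIN(salary) AS min_salary,
  AVG(salary) AS avg_salary,
  COUNT(*)    AS headcount
FROM employee_demo
GROUP BY dept;
```

Every column in the SELECT list that is not wrapped in an aggregate function (here, dept) also appears in the GROUP BY clause, which is what lets Hive return a single row per group.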
