Index design

Using indexes is a very common best practice for performance tuning in relational databases. Hive has supported index creation on tables/partitions since Hive v0.7.0. An index in Hive provides a key-based data view and better data access for certain operations, such as WHERE, GROUP BY, and JOIN. Using an index is always a cheaper alternative than full-table scans. The command to create an index in HQL is straightforward, as follows:

> CREATE INDEX idx_id_employee_id
> ON TABLE employee_id (employee_id)
> AS 'COMPACT'
> WITH DEFERRED REBUILD;
No rows affected (1.149 seconds)

In addition to this COMPACT index, which stores the pair of the indexed column's value and its block ID, HQL has also supported BITMAP indexes since v0.8.0 ...

Get Apache Hive Essentials now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.