Exploring indexes

Indexes can improve the performance of frequent queries that filter on particular columns. However, Hive's indexing capability is limited, because indexing large datasets requires substantial additional storage space and processing overhead. Hive can index columns to speed up some operations, and it stores the indexed data in a separate table.
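
To make this trade-off concrete, here is a minimal Python sketch, not Hive code. It models an index as a separate structure that maps each column value to the positions of the rows holding it. The sample rows and the function names full_scan, build_index and indexed_lookup are hypothetical. A lookup reads only the matching rows, but the index itself adds one entry per base row, which is the extra storage cost mentioned above.

```python
from collections import defaultdict

# Hypothetical sample rows: (id, state)
rows = [(1, "CA"), (2, "NY"), (3, "CA"), (4, "TX"), (5, "NY"), (6, "CA")]

def full_scan(rows, state):
    """Examine every row; return the matching ids and the number of rows read."""
    matches = [rid for rid, st in rows if st == state]
    return matches, len(rows)

def build_index(rows):
    """Build a separate 'index table': state value -> list of row positions."""
    index = defaultdict(list)
    for pos, (_, st) in enumerate(rows):
        index[st].append(pos)
    return dict(index)

def indexed_lookup(rows, index, state):
    """Read only the row positions the index points to."""
    positions = index.get(state, [])
    return [rows[p][0] for p in positions], len(positions)

index = build_index(rows)
print(full_scan(rows, "NY"))              # ([2, 5], 6)
print(indexed_lookup(rows, index, "NY"))  # ([2, 5], 2)
# Extra storage: one index entry per base row
print(sum(len(v) for v in index.values()))  # 6
```

Both approaches return the same ids. The indexed lookup reads two rows instead of six, at the price of maintaining the index alongside the data.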

How to do it…

Indexes can be created on Hive tables. Let us create a sales table in Hive on which we will create indexes:

CREATE TABLE sales (id INT, fname STRING, state STRING, zip STRING, ip STRING, pid STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
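
The ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' clause means that each line of the table's data files holds one row, with the six columns separated by tab characters. The Python sketch below illustrates that file layout only; it is not Hive's SerDe. It splits a line into the sales columns and converts id to an integer. The sample values are made up. The handling of short rows, extra fields and a non-numeric id (each yielding None) is an assumption, chosen to loosely mirror Hive reading such values as NULL.

```python
COLUMNS = ["id", "fname", "state", "zip", "ip", "pid"]

def parse_sales_line(line):
    """Split one tab-delimited line into a dict keyed by the sales columns.

    Assumption for illustration: missing trailing fields and a non-numeric id
    become None, loosely mirroring how Hive reads them as NULL.
    """
    fields = line.rstrip("\n").split("\t")
    fields += [None] * (len(COLUMNS) - len(fields))  # pad short rows
    row = dict(zip(COLUMNS, fields[:len(COLUMNS)]))  # ignore extra fields
    try:
        row["id"] = int(row["id"]) if row["id"] is not None else None
    except ValueError:
        row["id"] = None
    return row

print(parse_sales_line("1\tAlice\tCA\t94105\t10.0.0.1\tP100\n"))
```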

Let us create an index on the ip column of this table:

CREATE INDEX index_ip ON TABLE sales(ip) AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' ...
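
The CompactIndexHandler named in this statement keeps the index in its own table. As we understand it, each row of that table holds an indexed value, the name of a data file that contains it (a column called _bucketname), and the offsets of the matching rows within that file (_offsets). Treat these details as assumptions. The Python sketch below models that layout for the ip column over two hypothetical data files (the file names and offsets are invented). It then shows how a query filtering on a single IP could skip files that never contain that value. This is a simulation, not Hive's implementation.

```python
from collections import defaultdict

# Hypothetical data files of the sales table: file name -> list of (offset, ip)
files = {
    "sales/000000_0": [(0, "10.0.0.1"), (40, "10.0.0.2"), (81, "10.0.0.1")],
    "sales/000001_0": [(0, "10.0.0.3"), (42, "10.0.0.2")],
}

def build_compact_index(files):
    """Return index rows (ip, _bucketname, _offsets), one per (value, file) pair."""
    grouped = defaultdict(list)
    for bucket, records in files.items():
        for offset, ip in records:
            grouped[(ip, bucket)].append(offset)
    # (ip, bucket) pairs are unique, so sorting never compares the offset lists
    return sorted((ip, bucket, offsets) for (ip, bucket), offsets in grouped.items())

def files_to_scan(index_rows, ip):
    """Files that a query filtering on ip = <value> would still need to read."""
    return sorted({bucket for value, bucket, _ in index_rows if value == ip})

index_rows = build_compact_index(files)
for row in index_rows:
    print(row)
print(files_to_scan(index_rows, "10.0.0.1"))
```

Under these assumptions, a filter on 10.0.0.1 touches only one of the two files. A value that appears everywhere, such as 10.0.0.2, gains nothing from the index. This is one reason indexes help only some operations.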
