Hive indexes

The goal of Hive indexing is to speed up query lookups on certain columns of a table. Without an index, a query with a predicate such as WHERE tab1.col1 = 10 loads the entire table or partition and processes every row. If an index exists on col1, only a portion of the file needs to be loaded and processed. This gain in query speed comes at a cost: additional processing to create the index and disk space to store it. There are two types of indexes:

  • Compact index
  • Bitmap index

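The main difference between them is how each stores the mapping from an indexed value to the rows that hold it in the different blocks. A compact index records, for each value, the blocks in which it occurs, whereas a bitmap index also keeps a bitmap marking the matching rows.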
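The difference between the two index types, and the saving over a full scan, can be illustrated with a small model. The following is a minimal Python sketch, not Hive's actual on-disk format or API: the table BLOCKS, its values, and all function names are hypothetical and chosen only for illustration. It assumes a table split into three blocks and an index on a single column.

```python
from collections import defaultdict

# Hypothetical toy table: three "blocks", each holding values of col1.
BLOCKS = [
    [10, 20, 30],
    [40, 50, 60],
    [10, 70, 10],
]


def build_compact_index(blocks):
    """Map each value to the sorted list of block ids that contain it."""
    index = defaultdict(set)
    for block_id, rows in enumerate(blocks):
        for value in rows:
            index[value].add(block_id)
    return {value: sorted(ids) for value, ids in index.items()}


def build_bitmap_index(blocks):
    """Map each value to {block id: bitmap}; bit i is set if row i matches."""
    index = defaultdict(dict)
    for block_id, rows in enumerate(blocks):
        for pos, value in enumerate(rows):
            previous = index[value].get(block_id, 0)
            index[value][block_id] = previous | (1 << pos)
    return {value: dict(bitmaps) for value, bitmaps in index.items()}


def full_scan(blocks, target):
    """No index: read every block and test every row."""
    rows_read = sum(len(rows) for rows in blocks)
    matches = sum(1 for rows in blocks for v in rows if v == target)
    return matches, rows_read


def compact_lookup(blocks, index, target):
    """With a compact index: read only the blocks listed for the value."""
    matches = 0
    rows_read = 0
    for block_id in index.get(target, []):
        rows = blocks[block_id]
        rows_read += len(rows)
        matches += sum(1 for v in rows if v == target)
    return matches, rows_read


compact = build_compact_index(BLOCKS)
bitmap = build_bitmap_index(BLOCKS)

print(compact[10])                                  # [0, 2]
print({b: bin(m) for b, m in bitmap[10].items()})   # {0: '0b1', 2: '0b101'}
print(full_scan(BLOCKS, 10))                        # (3, 9)
print(compact_lookup(BLOCKS, compact, 10))          # (3, 6)
```

In this model, the predicate col1 = 10 reads all 9 rows without an index but only the 6 rows in blocks 0 and 2 with the compact index. The bitmap index goes one step further and identifies the matching row positions inside each block.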
