Hive partitions

By default, a simple query in Hive scans the whole table, which degrades performance on large tables. Hive partitions address this problem and are very similar to partitions in an RDBMS. Each partition corresponds to a particular value (or combination of values) of the predefined partition column(s) and is stored as a subdirectory inside the table's directory in HDFS. When a query filters on the partition columns, only the required partitions (subdirectories) are read, so the I/O and query time are greatly reduced. It is easy to define partitions when the table is created and then check the partitions created, as follows:

--Create partitions when creating tables
jdbc:hive2://> CREATE TABLE employee_partitioned
. . . . ...
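
As a separate, minimal sketch of the same workflow (not the book's listing), the HiveQL below defines a partitioned table, registers two partitions, lists them, and runs a query that reads only one of them. The table `page_views_demo` and its columns `user_id`, `url`, `view_year`, and `view_month` are hypothetical names chosen for illustration.

```sql
-- Hypothetical table for illustration only.
-- Partition columns are declared in PARTITIONED BY, not in the column list.
CREATE TABLE IF NOT EXISTS page_views_demo (
  user_id STRING,
  url     STRING
)
PARTITIONED BY (view_year INT, view_month INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- A newly created partitioned table has no partitions yet.
-- Each ADD PARTITION registers one and creates its subdirectory.
ALTER TABLE page_views_demo ADD IF NOT EXISTS
  PARTITION (view_year = 2024, view_month = 1);
ALTER TABLE page_views_demo ADD IF NOT EXISTS
  PARTITION (view_year = 2024, view_month = 2);

-- Check which partitions exist.
SHOW PARTITIONS page_views_demo;

-- Filtering on the partition columns lets Hive read only the
-- matching subdirectory instead of scanning the whole table.
SELECT user_id, url
FROM page_views_demo
WHERE view_year = 2024 AND view_month = 1;
```

Each partition registered this way is stored under the table's directory in a path of the form `view_year=2024/view_month=1`, which is what allows the final query to skip the other partition entirely.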

