Top K statistics in Hive
Top K statistics is a mechanism for collecting the top K values of a column in a Hive table. With it, the top K values of the most skewed column are stored in the partition. It applies to both existing and newly created tables.
How to do it…
Top K statistics computation is disabled by default. The following are some of the properties that can be set to compute and store top K statistics:
hive.stats.topk.collect
This enables computing the top K values and storing them in the skewed information:
- Default Value: false
- Valid Values: true, false
hive.stats.topk.num
- Using this property, you can specify the value of K for your top K result
hive.stats.topk.minpercent
- It is the minimum percentage of rows that a value must account for to be included in the top K result
- It could be any float value between ...
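As a rough sketch, the properties above might be set in a Hive session as follows. This assumes a Hive build that actually recognizes the hive.stats.topk.* properties. The value 10 and the table name page_views are illustrative assumptions, not values taken from this recipe, and the use of ANALYZE TABLE to refresh statistics on an existing table is likewise an assumption:

```sql
-- Enable top K collection (it is disabled by default).
SET hive.stats.topk.collect=true;

-- Illustrative value of K; the recipe does not prescribe one.
SET hive.stats.topk.num=10;

-- hive.stats.topk.minpercent is left at its default here,
-- because its valid range is not stated in the text above.

-- Assumption: statistics for an existing table are gathered with the
-- standard ANALYZE command. page_views is a hypothetical table name.
ANALYZE TABLE page_views COMPUTE STATISTICS;
```

The SET statements only affect the current session, so they would need to be issued before the statement that gathers statistics.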