Storing and processing Hive data in the Parquet file format

Most of the time, you have probably created Hive tables that store data in a text format; in this recipe, we are going to learn how to store data in Parquet files.

Getting ready

To perform this recipe, you should have a running Hadoop cluster as well as the latest version of Hive installed on it. Here, I am going to use Hive 1.2.1.

How to do it...

Hive 1.2.1 supports various file formats that help process data more efficiently. Parquet is a columnar format accepted across the Hadoop ecosystem and can be used with Hive, MapReduce, Pig, Impala, and so on. To store the data in Parquet files, we first need to create a Hive table that stores the data in a textual format. We will use the ...
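The flow described above can be sketched in HiveQL roughly as follows. The table and column names, the delimiter, and the input path are hypothetical placeholders, not taken from the recipe:

```sql
-- Staging table: plain text, tab-delimited (hypothetical schema).
CREATE TABLE sales_text (
  id     INT,
  item   STRING,
  amount DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Load the raw text data into the staging table (hypothetical path).
LOAD DATA INPATH '/data/sales.tsv' INTO TABLE sales_text;

-- Parquet-backed table; STORED AS PARQUET is supported natively
-- in Hive 0.13 and later, so it works in Hive 1.2.1.
CREATE TABLE sales_parquet (
  id     INT,
  item   STRING,
  amount DOUBLE
)
STORED AS PARQUET;

-- Copy the data across; Hive rewrites it in the Parquet format.
INSERT OVERWRITE TABLE sales_parquet
SELECT * FROM sales_text;
```

After the `INSERT`, queries against `sales_parquet` read columnar Parquet files instead of row-oriented text, which typically reduces I/O for queries that touch only a subset of columns.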
