File formats

In most of our examples, we have used files in plain text format, but Hive provides a set of file formats that optimize storage, processing, or in some cases both. Hive supports the following file formats:

  • TEXTFILE
  • SEQUENCEFILE
  • RCFILE
  • ORC
  • PARQUET
  • AVRO
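As a minimal sketch of how a format from this list is selected, the file format is named in the STORED AS clause when the table is created. The table and column names below are hypothetical, and the example assumes a Hive version recent enough to recognize the ORC keyword.

```sql
-- Hypothetical table; the name and columns are for illustration only.
-- STORED AS selects the on-disk file format for the table's data.
CREATE TABLE IF NOT EXISTS page_views_orc (
  user_id   INT,
  page_url  STRING,
  view_time STRING
)
STORED AS ORC;
```

Replacing ORC with another keyword from the list, such as SEQUENCEFILE or RCFILE, should change only the storage format and not the table's schema.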

Each of these formats has a defined structure for storing data on disk. You can also define your own file format and have data stored in that format by using the INPUTFORMAT (and matching OUTPUTFORMAT) class specification supported by Hadoop/Hive.
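The class-based specification can be sketched as follows. To avoid inventing class names, this example spells out the two classes that Hive itself uses for plain text tables; a custom format would substitute its own class names, which are not shown here. The table name is hypothetical.

```sql
-- Hypothetical table; the name and column are for illustration only.
-- The two classes below are the ones behind STORED AS TEXTFILE.
-- A custom file format would name its own classes in their place.
CREATE TABLE IF NOT EXISTS raw_lines (
  line STRING
)
STORED AS
  INPUTFORMAT  'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
```

Any custom classes named this way would need to be on Hive's classpath (for example, added with ADD JAR) before the table is queried.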

How to do it…

In all file formats other than text, the table only accepts data in that particular format, such as Row Columnar or Optimized Row Columnar (RC or ORC). If the source data is in that format, it could be easily ...
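Because a non-text table only accepts data in its own format, one common approach for plain text source data is to land it in a text staging table and let Hive rewrite it with an INSERT ... SELECT. The following is a sketch under assumptions: the table names, columns, delimiter, and file path are all hypothetical.

```sql
-- Staging table in plain text; the comma delimiter is an assumption.
CREATE TABLE IF NOT EXISTS orders_text (
  order_id INT,
  amount   DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Hypothetical local path; LOAD DATA moves the file as-is, without conversion.
LOAD DATA LOCAL INPATH '/tmp/orders.csv' INTO TABLE orders_text;

-- Target table stored as ORC.
CREATE TABLE IF NOT EXISTS orders_orc (
  order_id INT,
  amount   DOUBLE
)
STORED AS ORC;

-- Hive reads the text rows and writes them out in ORC format.
INSERT OVERWRITE TABLE orders_orc
SELECT order_id, amount
FROM orders_text;
```

The conversion happens in the INSERT step, since LOAD DATA does not transform file contents.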
