Parquet stores nested data structures in a flat columnar format. For analytical workloads, it is generally more efficient in storage and query performance than row-oriented file formats. Data is stored in binary form, column by column. In the Parquet format, new columns are added at the end of the structure. Cloudera mainly supports this format for its Impala implementation, but it has recently been gaining popularity quickly. The format suits SQL queries that read a few particular columns from a wide table, because only the selected columns are read, which reduces I/O cost.
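The I/O argument can be illustrated with a toy sketch in plain Python. It is not Parquet itself, and it ignores row groups, encodings, and compression; the table contents and the helper names (`to_columns`, `scan_row_layout`, `scan_column_layout`) are hypothetical, chosen only to show why reading one column from a columnar layout touches fewer values than scanning full rows.

```python
# Toy illustration only: a made-up table, not a real Parquet file.
rows = [
    {"id": 1, "name": "a", "price": 10.0, "qty": 3},
    {"id": 2, "name": "b", "price": 20.0, "qty": 1},
    {"id": 3, "name": "c", "price": 30.0, "qty": 2},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


def scan_row_layout(records, wanted):
    """Row layout: each whole row is read, so every field counts as touched."""
    values_read = 0
    result = {name: [] for name in wanted}
    for record in records:
        values_read += len(record)
        for name in wanted:
            result[name].append(record[name])
    return result, values_read


def scan_column_layout(columns, wanted):
    """Columnar layout: only the requested columns are touched."""
    result = {name: list(columns[name]) for name in wanted}
    values_read = sum(len(values) for values in result.values())
    return result, values_read


columns = to_columns(rows)
row_result, row_cost = scan_row_layout(rows, ["price"])
col_result, col_cost = scan_column_layout(columns, ["price"])
print(row_cost, col_cost)
```

Both scans return the same `price` values, but the row scan touches every field of every row while the columnar scan touches only the one requested column. The gap grows with the number of columns in the table.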
Parquet