Working with Parquet

In this section, we will discuss the various operations provided by Spark SQL for working with the Parquet data format, along with appropriate examples.

Parquet is a popular columnar storage format for structured data. Parquet leverages the record shredding and assembly algorithm (http://tinyurl.com/p8kaawg) described in the Dremel paper (http://research.google.com/pubs/pub36632.html). Parquet supports efficient compression and encoding schemes, which perform better than simply flattening structured tables. Refer to https://parquet.apache.org/ for more information on the Parquet data format.

The DataFrame API of Spark SQL provides convenient operations for writing and reading data in the Parquet format.
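
The following is a minimal sketch of these operations, assuming Spark 2.x's SparkSession and hypothetical file paths; on Spark 1.x releases, the same reads and writes are exposed through SQLContext, and temporary tables are registered with registerTempTable instead:

import org.apache.spark.sql.SparkSession

object ParquetExample {
  def main(args: Array[String]): Unit = {
    // Entry point for the DataFrame API
    val spark = SparkSession.builder()
      .appName("ParquetExample")
      .master("local[*]")
      .getOrCreate()

    // Load a JSON dataset into a DataFrame (hypothetical input path)
    val df = spark.read.json("data/company.json")

    // Write the DataFrame out as Parquet; the Parquet data source handles
    // the columnar layout, compression, and encoding
    df.write.parquet("data/company.parquet")

    // Read the Parquet files back into a DataFrame and query them with Spark SQL
    val parquetDF = spark.read.parquet("data/company.parquet")
    parquetDF.createOrReplaceTempView("company")
    spark.sql("SELECT * FROM company").show()

    spark.stop()
  }
}

Because Parquet stores the schema alongside the data, the read back into parquetDF requires no schema definition, and only the columns referenced by a query need to be scanned.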
