References

DataFrames are a relatively recent addition to Spark, so literature and documentation on them are still scarce. The first port of call should be the Scaladocs, available at: http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrame.

The Scaladocs for operations available on the DataFrame Column type can be found at: http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.sql.Column.

There is also extensive documentation on the Parquet file format: https://parquet.apache.org.
