There's more...

The CSV file format has many variations. The basic idea is to separate fields with a comma, but the delimiter can also be a tab or another special character, and the header row is optional.
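To make these variations concrete, here is a minimal sketch that parses the same two records from a comma-separated text with a header row and from a tab-separated text without one. It uses Python's standard `csv` module rather than Spark, purely to illustrate the format differences. The helper name `read_delimited` and the sample data are hypothetical and are not part of the recipe.

```python
import csv
import io


def read_delimited(text, delimiter=",", has_header=True):
    """Parse delimited text and return (header, rows).

    Hypothetical helper for illustration only: header is None
    when the input has no header row.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = [row for row in reader if row]  # drop empty lines
    if has_header and rows:
        return rows[0], rows[1:]
    return None, rows


# Made-up sample data: the same records in two CSV variants.
comma_text = "id,name\n1,alice\n2,bob\n"   # comma-delimited, with header
tab_text = "1\talice\n2\tbob\n"            # tab-delimited, no header

header, rows = read_delimited(comma_text)
no_header, tab_rows = read_delimited(tab_text, delimiter="\t", has_header=False)
```

Both calls yield the same data rows; only the delimiter and the header flag differ. Any CSV loader needs these same two settings to match the file it reads.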

CSV files are widely used to store raw data because the format is simple and portable across different applications. We will introduce two simple, typical ways to load a sample CSV file into Spark; both can be easily modified to fit your use case.
