How it works...

Reading a JavaScript Object Notation (JSON) data file and transforming it into a Dataset is straightforward with Spark. JSON has become a widely used data format over the past several years, and Spark supports it directly through its built-in JSON data source.

In the first part, we loaded JSON into a Dataset using the JSON parsing functionality built into the SparkSession. Take note of how Spark's built-in functionality maps the JSON data onto the car case class.
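A minimal sketch of the first part follows. It assumes a hypothetical Car case class with make, model, and year fields (the recipe's actual fields are not shown in this section) and a few made-up JSON records held in memory instead of a file. Reading from a Dataset[String] this way requires Spark 2.2 or later; by default Spark expects one JSON object per line.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical schema: the recipe's actual car fields are not listed in this section.
// Spark infers whole-number JSON values as bigint, so year is a Long rather than an Int.
case class Car(make: String, model: String, year: Long)

object LoadCarsSketch {
  // Parses JSON Lines text (one object per line) into a typed Dataset[Car].
  def loadCars(spark: SparkSession, jsonLines: Seq[String]): Dataset[Car] = {
    import spark.implicits._
    // read.json infers the schema (make: string, model: string, year: bigint);
    // as[Car] then maps each row onto the case class by column name.
    spark.read.json(jsonLines.toDS()).as[Car]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("LoadCarsSketch")
      .getOrCreate()

    // Made-up sample records. With a real file you would instead call
    // spark.read.json("path/to/cars.json") (hypothetical path) and apply .as[Car].
    val cars = loadCars(spark, Seq(
      """{"make": "Toyota", "model": "Corolla", "year": 2015}""",
      """{"make": "Ford", "model": "Focus", "year": 2016}"""
    ))

    cars.printSchema()
    cars.show()
    spark.stop()
  }
}
```

Declaring Car at the top level, outside the object, lets spark.implicits._ derive an encoder for it; a case class nested inside a method cannot be encoded.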

In the second part, we applied Spark SQL to the Dataset to wrangle the data into the desired shape. We used the Dataset's select method to retrieve the make column and applied the distinct method for the removal ...
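The select-then-distinct step can be sketched as below. This assumes the same hypothetical Car fields and a few made-up rows built in memory rather than the recipe's data file. For comparison it also runs the equivalent SQL query against a temporary view; the view name cars is an assumption for illustration.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical schema; the recipe's actual car fields are not listed in this section.
case class Car(make: String, model: String, year: Long)

object DistinctMakesSketch {
  // Keeps only the make column, then removes duplicate values with distinct().
  def distinctMakes(spark: SparkSession, cars: Dataset[Car]): List[String] = {
    import spark.implicits._
    cars.select("make").distinct().as[String].collect().sorted.toList
  }

  // The same query written in SQL against a temporary view (view name is hypothetical).
  def distinctMakesSql(spark: SparkSession, cars: Dataset[Car]): List[String] = {
    import spark.implicits._
    cars.createOrReplaceTempView("cars")
    spark.sql("SELECT DISTINCT make FROM cars").as[String].collect().sorted.toList
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("DistinctMakesSketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up rows: two Toyotas and one Ford.
    val cars = Seq(
      Car("Toyota", "Corolla", 2015L),
      Car("Ford", "Focus", 2016L),
      Car("Toyota", "Camry", 2017L)
    ).toDS()

    println(distinctMakes(spark, cars))    // List(Ford, Toyota)
    println(distinctMakesSql(spark, cars)) // List(Ford, Toyota)
    spark.stop()
  }
}
```

Both versions return the same list. The results are sorted after collect() only because distinct() makes no guarantee about row order.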
