How it works...

In this section, we converted an RDD into a Dataset and then converted it back into an RDD. We began with a Scala sequence, which we turned into an RDD. We then passed that RDD to the Spark session's createDataset() method, which returned a Dataset.
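The first half of that flow can be sketched as follows. This is a minimal illustration rather than the recipe's actual listing: the `Car` case class, its fields other than `make`, the sample values, the helper name `toDataset`, and the use of `parallelize()` to build the RDD are all assumptions made for the example.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type; only the "make" field is implied by the text.
case class Car(make: String, model: String, price: Double)

object RddToDatasetSketch {
  // Illustrative sample data, not the recipe's actual values.
  val carData: Seq[Car] = Seq(
    Car("Tesla", "Model S", 71000.0),
    Car("Audi", "A3 E-Tron", 37900.0),
    Car("BMW", "330e", 43700.0),
    Car("Tesla", "Model X", 83000.0)
  )

  // Scala sequence -> RDD -> Dataset.
  def toDataset(spark: SparkSession, cars: Seq[Car]): Dataset[Car] = {
    import spark.implicits._ // brings the implicit Encoder[Car] into scope
    // Assumption: the sequence is distributed with parallelize().
    val carRDD: RDD[Car] = spark.sparkContext.parallelize(cars)
    // createDataset() accepts an RDD and returns a typed Dataset.
    spark.createDataset(carRDD)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("RddToDatasetSketch")
      .getOrCreate()

    toDataset(spark, carData).show()
    spark.stop()
  }
}
```

The case class is declared at the top level so that Spark can derive an encoder for it; `createDataset()` requires that encoder implicitly, which is why `spark.implicits._` is imported before the call.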

Next, we grouped the Dataset by the make column and counted the cars for each make. We then filtered the Dataset for rows whose make is Tesla and converted the result back to an RDD. Finally, we displayed the contents of the resulting RDD by way of the RDD foreach() method.
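The grouping, filtering, and conversion steps might look like the sketch below. The `Car` case class, the sample rows, and the helper names `countByMake` and `teslaRdd` are hypothetical; the case class is repeated here only so that the snippet stands alone.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record type; only the "make" field is implied by the text.
case class Car(make: String, model: String, price: Double)

object DatasetToRddSketch {
  // Group by the make column and count the rows in each group.
  def countByMake(cars: Dataset[Car]): DataFrame =
    cars.groupBy("make").count()

  // Keep only the Tesla rows, then expose them as an RDD via .rdd.
  def teslaRdd(cars: Dataset[Car]): RDD[Car] =
    cars.filter(cars("make") === "Tesla").rdd

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("DatasetToRddSketch")
      .getOrCreate()
    import spark.implicits._ // supplies the Encoder[Car] for createDataset()

    // Illustrative sample data, not the recipe's actual values.
    val cars = spark.createDataset(spark.sparkContext.parallelize(Seq(
      Car("Tesla", "Model S", 71000.0),
      Car("Audi", "A3 E-Tron", 37900.0),
      Car("BMW", "330e", 43700.0),
      Car("Tesla", "Model X", 83000.0)
    )))

    countByMake(cars).show()

    // foreach() runs on the executors; in local mode the output
    // appears in the driver console.
    teslaRdd(cars).foreach(println)

    spark.stop()
  }
}
```

Filtering with a column expression keeps the result a `Dataset[Car]`, so calling `.rdd` on it yields an `RDD[Car]` rather than an RDD of generic rows.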
