How it works...

In this example, we use a Scala sequence to hold the original data: a series of cars and their attributes. We call createDataset() to create a Dataset and populate it from that sequence. We then key the records on the 'make' attribute with groupByKey() and apply mapGroups() to list the car models for each make, working with the Dataset in a functional style. This kind of functional programming with domain objects was possible before Dataset (for example, with a case class in an RDD or a UDF on a DataFrame), but the Dataset construct makes it easy and intrinsic.
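
The flow described above can be sketched roughly as follows. This is a minimal illustration rather than the recipe's actual listing: the `Car` case class, its fields, the sample rows, and the `modelsByMake` helper are all hypothetical names chosen for the example. It assumes Spark 2.x, where the typed `mapGroups()` is reached through `groupByKey()`.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical domain object; the field names are assumptions for illustration.
// It is declared at the top level so Spark can derive an encoder for it.
case class Car(make: String, model: String, year: Int)

object CarsByMake {

  // Hypothetical sample data held in a plain Scala sequence.
  val carData: Seq[Car] = Seq(
    Car("Honda", "Civic", 2016),
    Car("Honda", "Accord", 2017),
    Car("Toyota", "Camry", 2016),
    Car("Toyota", "Corolla", 2015)
  )

  // Builds a Dataset[Car] from the sequence, groups it by make, and
  // collects the sorted model names for each make back to the driver.
  def modelsByMake(spark: SparkSession, data: Seq[Car]): Map[String, Seq[String]] = {
    import spark.implicits._ // brings the encoders for Car and for tuples into scope

    val cars = spark.createDataset(data)

    cars
      .groupByKey(_.make) // typed grouping on the domain object
      .mapGroups { case (make, group) =>
        // The iterator must be consumed inside this function, so materialize it.
        val models: Seq[String] = group.map(_.model).toList.sorted
        (make, models)
      }
      .collect()
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cars-by-make")
      .getOrCreate()
    try {
      modelsByMake(spark, carData).foreach { case (make, models) =>
        println(s"$make: ${models.mkString(", ")}")
      }
    } finally {
      spark.stop()
    }
  }
}
```

Because the grouping function receives `Car` objects rather than untyped rows, field access such as `_.make` and `_.model` is checked at compile time, which is the convenience the paragraph above attributes to Dataset.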
