There's more...

A DataFrame is an untyped view of a Dataset. In Scala, DataFrame is simply a type alias for Dataset[Row], meaning a Dataset whose elements are generic Row objects rather than instances of a specific class. A Dataset keeps the transformation API of an RDD, including filter(), map(), flatMap(), and so on. This is one reason Datasets feel easy to use if we have already programmed in Spark using RDDs.
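The following is a minimal sketch that shows a typed Dataset, its untyped DataFrame view, and the RDD-style transformations side by side. It assumes spark-sql 2.x is on the classpath and runs in local mode. The Person case class, the DatasetVsDataFrame object, and its summarize method are hypothetical names chosen for illustration only.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Hypothetical record type, defined only for this illustration.
case class Person(name: String, age: Int)

object DatasetVsDataFrame {

  // Builds a small typed Dataset, views it as a DataFrame, and applies
  // RDD-style transformations. Returns the results so they can be inspected.
  def summarize(spark: SparkSession): (Long, Seq[String], Seq[String], Long) = {
    import spark.implicits._

    // Typed Dataset: the compiler knows each element is a Person.
    val people: Dataset[Person] = Seq(Person("Alice", 29), Person("Bob", 41)).toDS()

    // Untyped view: DataFrame is a type alias for Dataset[Row].
    val df: DataFrame = people.toDF()
    val rows: Dataset[Row] = df // no conversion needed, the types are identical

    // filter(), map() and flatMap() work much as they do on an RDD.
    val adultCount = people.filter(_.age > 30).count()
    val names = people.map(_.name).collect().toSeq
    val tokenCount = people.flatMap(p => Seq(p.name, p.age.toString)).count()

    // On the untyped view, fields are looked up by column name at run time.
    val rowNames = rows.map(row => row.getAs[String]("name")).collect().toSeq

    (adultCount, names, rowNames, tokenCount)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("DatasetVsDataFrame")
      .getOrCreate()
    try {
      val (adults, names, rowNames, tokens) = summarize(spark)
      println(s"adults=$adults names=$names rowNames=$rowNames tokens=$tokens")
    } finally {
      spark.stop()
    }
  }
}
```

The typed path (people.map(_.name)) is checked at compile time, while the Row path (row.getAs[String]("name")) only fails at run time if the column name or type is wrong. That trade-off is the practical difference between the two views.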
