Creating and using RDD versus DataFrame versus Dataset from a text file in Spark 2.0

In this recipe, we explore the subtle differences in creating an RDD, a DataFrame, and a Dataset from a text file, and how the three relate to each other, using a short code sample:

Dataset: spark.read.textFile()
RDD: spark.sparkContext.textFile()
DataFrame: spark.read.text()
Here, spark is assumed to be the name of the SparkSession.
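
The three calls listed above can be put side by side in a minimal Scala sketch. This is not the recipe's own listing: the object name, the application name, the local master setting, and the input path data/sample.txt are all hypothetical and chosen only for illustration. The explicit type annotations show what each call returns, and the conversions at the end show one way the three abstractions relate to each other.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

object TextFileThreeWays {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session for illustration only.
    val spark = SparkSession.builder()
      .appName("TextFileThreeWays")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical path; point this at any plain text file.
    val path = "data/sample.txt"

    // RDD: one String element per line, created through the SparkContext.
    val rdd: RDD[String] = spark.sparkContext.textFile(path)

    // DataFrame: one Row per line, with a single string column named "value".
    val df: DataFrame = spark.read.text(path)

    // Dataset: one typed String per line.
    val ds: Dataset[String] = spark.read.textFile(path)

    // Encoders and the toDS() helper come from the session's implicits.
    import spark.implicits._

    // Moving between the three abstractions.
    val dsFromDf: Dataset[String] = df.as[String] // DataFrame -> Dataset
    val dfFromDs: DataFrame = ds.toDF()           // Dataset -> DataFrame
    val rddFromDs: RDD[String] = ds.rdd           // Dataset -> RDD
    val dsFromRdd: Dataset[String] = rdd.toDS()   // RDD -> Dataset

    // All of these views report the same number of lines.
    println(s"RDD lines:       ${rdd.count()}")
    println(s"DataFrame lines: ${df.count()}")
    println(s"Dataset lines:   ${ds.count()}")
    println(s"Round trips:     ${dsFromDf.count()}, ${dfFromDs.count()}, " +
      s"${rddFromDs.count()}, ${dsFromRdd.count()}")

    spark.stop()
  }
}
```

In Spark 2.x, DataFrame is a type alias for Dataset[Row], which is why spark.read.text() and spark.read.textFile() differ only in the element type of the result.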
