How it works...

In older versions of Spark, we needed a special package to read CSV files, but we can now take advantage of spark.sparkContext.textFile(dataFile) to ingest the file. The spark that begins the statement is the Spark session (the handle to the cluster), and it can be given any name you like during the creation phase, as shown here:

val spark = SparkSession
  .builder
  .master("local[*]")
  .appName("MyCSV")
  .config("spark.sql.warehouse.dir", ".")
  .getOrCreate()

spark.sparkContext.textFile(dataFile)
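Note that textFile() returns an RDD[String] of raw lines, so the CSV content still has to be parsed by hand. The following is a minimal sketch of that idea, assuming a hypothetical comma-separated file; the dataFile path and the field handling here are illustrative and not taken from the recipe:

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder
  .master("local[*]")
  .appName("MyCSV")
  .config("spark.sql.warehouse.dir", ".")
  .getOrCreate()

// Hypothetical path; substitute the CSV file used in the recipe.
val dataFile = "myNumbers.csv"

// textFile() yields an RDD[String], one element per line of the file.
val lines = spark.sparkContext.textFile(dataFile)

// Split each comma-separated line into an array of fields.
val fields = lines.map(_.split(","))

// Peek at the first few parsed rows.
fields.take(3).foreach(row => println(row.mkString(" | ")))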

Spark 2.0+ uses spark.sql.warehouse.dir, rather than hive.metastore.warehouse.dir, to set the warehouse location where tables are stored. The default value of spark.sql.warehouse.dir is System.getProperty("user.dir") ...
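To confirm which warehouse location is actually in effect, the setting can be read back from the session's runtime configuration. This is a small sketch, where spark is the session created earlier:

// Prints the warehouse directory in effect for this session.
println(spark.conf.get("spark.sql.warehouse.dir"))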
