Shared Java and Scala APIs

Once you have created a SparkSession object, it serves as your main entry point. In the next chapter, you will learn how to use the SparkSession object to load and save data. You can also use SparkSession.sparkContext to launch more Spark jobs and manage dependencies. Some of the non-data-driven methods available on the SparkSession.sparkContext object are shown here:

| Method | Use |
| --- | --- |
| addJar(path) | Adds the JAR file as a dependency for all future jobs run through the SparkContext object. |
| addFile(path) | Downloads the file to all the nodes on the cluster. |
| listFiles/listJars | Show the list of all the currently added files/JARs. |
| stop() | Shuts down the SparkContext. |
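
The methods listed above can be tried out with a short program. The following is a minimal sketch, not a definitive implementation: it assumes Spark 2.x is on the classpath and runs in local mode, and the object name, application name, temporary file, and JAR path are all hypothetical values chosen for illustration.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object SparkContextUtilities {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SparkContextUtilities")
      .getOrCreate()

    // The underlying SparkContext is exposed through the sparkContext member.
    val sc = spark.sparkContext

    // Create a throwaway file so the sketch does not depend on an existing path.
    val lookup = Files.createTempFile("lookup", ".txt")
    Files.write(lookup, "example".getBytes("UTF-8"))

    // Register the file so that it is shipped to every node for future jobs.
    sc.addFile(lookup.toString)

    // addJar registers a JAR dependency in the same way; this path is hypothetical.
    // sc.addJar("/path/to/extra-dependency.jar")

    // Inspect what has been registered so far.
    println(s"Files: ${sc.listFiles().mkString(", ")}")
    println(s"JARs:  ${sc.listJars().mkString(", ")}")

    // Shut down the SparkContext once the application is finished.
    sc.stop()
  }
}
```

On a real cluster you would typically drop the master("local[*]") call and let spark-submit supply the master URL.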
