PySpark and Jupyter Notebook

Let's now integrate Jupyter Notebook with PySpark so that we can write our first Spark applications in Python! In our local development environment, the easiest way to integrate Jupyter Notebook with PySpark is to set a global SPARK_HOME environment variable that points to the directory containing the Spark binaries. We can then use the findspark Python package, installed earlier, which appends the location of SPARK_HOME, and hence the PySpark API, to sys.path at runtime. Note that findspark should not be used for production-grade code development; instead, Spark applications should be packaged as code artifacts and submitted via spark-submit.
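To illustrate the mechanism, the following is a minimal sketch of a Jupyter cell, assuming Spark is unpacked under /opt/spark (an illustrative path, not necessarily the one used in this book) and that the findspark and pyspark packages are available to the notebook kernel:

import os
import findspark

# Point SPARK_HOME at the Spark installation directory before calling
# findspark; /opt/spark is an assumed location used here for illustration.
os.environ.setdefault("SPARK_HOME", "/opt/spark")

# findspark appends $SPARK_HOME/python (and the bundled py4j library) to
# sys.path, making the PySpark API importable inside the notebook.
findspark.init()

from pyspark.sql import SparkSession

# Start a local SparkSession to confirm that the integration works.
spark = (SparkSession.builder
         .master("local[*]")
         .appName("jupyter-pyspark-check")
         .getOrCreate())
print(spark.version)
spark.stop()

In practice, SPARK_HOME would normally be exported once in the shell profile rather than set from inside the notebook; the os.environ line above simply makes the sketch self-contained.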

Please execute the following shell commands ...
