Let's now integrate Jupyter Notebook with PySpark so that we can write our first Spark applications in Python! In a local development environment, the easiest way to integrate Jupyter Notebook with PySpark is to set a global SPARK_HOME environment variable pointing to the directory containing the Spark binaries. Thereafter, we can use the findspark Python package, installed earlier, which appends the location of SPARK_HOME, and hence the PySpark API, to sys.path at runtime. Note that findspark should not be used for production-grade code; instead, Spark applications should be deployed as code artifacts submitted via spark-submit.
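To see why this works, it helps to know that findspark essentially locates SPARK_HOME and adds Spark's bundled Python bindings to sys.path. The following is a minimal sketch of that mechanism using only the standard library; the /opt/spark path is a hypothetical install location, so adjust it to wherever your Spark binaries actually live:

```python
import glob
import os
import sys

# Hypothetical install location -- point this at your actual Spark directory.
os.environ.setdefault("SPARK_HOME", "/opt/spark")
spark_home = os.environ["SPARK_HOME"]

# findspark.init() performs essentially these steps: prepend Spark's Python
# bindings and the bundled py4j bridge to sys.path, so that a subsequent
# `import pyspark` resolves without any permanent installation.
sys.path.insert(0, os.path.join(spark_home, "python"))
for py4j_zip in glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*.zip")):
    sys.path.insert(0, py4j_zip)
```

In practice, inside a notebook you would simply call findspark.init() at the top of the first cell (it reads SPARK_HOME for you) and then import pyspark as usual; the sketch above only illustrates what that call does on your behalf.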
Please execute the following shell commands ...