Working with Spark's Python and Scala shells

This recipe introduces spark-shell and PySpark, the command-line interface tools that ship with Apache Spark. spark-shell is the Scala-based interactive shell and PySpark is its Python counterpart; both are used to develop Spark applications interactively, and each starts with SparkContext, SQLContext, and HiveContext already initialized.

How to do it…

Both spark-shell and PySpark are available in the bin directory of your Spark installation, that is, $SPARK_HOME/bin:

  1. Invoke spark-shell as follows:
     $SPARK_HOME/bin/spark-shell [options]
     $SPARK_HOME/bin/spark-shell --master <master type>   # i.e., local, spark, yarn, or mesos
     $SPARK_HOME/bin/spark-shell --master spark://<sparkmasterHostName>:7077
     Welcome to
           ____              __
          / __/__  ...
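  2. PySpark is launched the same way; a minimal sketch, assuming the standard --master option (local[*] runs locally with one worker thread per CPU core):
     $SPARK_HOME/bin/pyspark --master local[*]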

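Once either shell is up, the pre-initialized SparkContext is bound to the variable sc and the SQLContext to sqlContext. As a quick sanity check in spark-shell (a minimal sketch; the input range and the res0 echo shown here are illustrative), parallelize a small collection and run an action:

     scala> // keep only the even numbers from a small parallelized range
     scala> sc.parallelize(1 to 10).filter(_ % 2 == 0).collect()
     res0: Array[Int] = Array(2, 4, 6, 8, 10)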