Spark expresses all computations as a sequence of transformations and actions on distributed collections, called Resilient Distributed Datasets (RDDs). Let's explore how RDDs work with the Spark shell. Navigate to the examples directory and open a Spark shell as follows:
$ spark-shell
scala>
Let's start by loading an email into an RDD:
scala> val email = sc.textFile("ham/9-463msg1.txt")
email: rdd.RDD[String] = MapPartitionsRDD at textFile
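As a quick, illustrative aside (the operations below are not part of the original example, and their outputs depend on the file's contents, so they are omitted): transformations such as filter lazily describe a new RDD, while actions such as first or count trigger the actual computation:

scala> email.first                          // action: returns the first line of the file
scala> val lines = email.filter(_.nonEmpty) // transformation: lazily defines a new RDD of non-empty lines
scala> lines.count                          // action: forces evaluation and returns the number of non-empty lines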
We created this RDD by calling the textFile method on an object called sc:
scala> sc
spark.SparkContext = org.apache.spark.SparkContext@459bf87c
sc is a SparkContext instance, an object representing the connection to the Spark cluster and serving as the entry point for creating and manipulating RDDs.
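For instance (this snippet is illustrative and not part of the original example; the name numbers is arbitrary), the SparkContext can also create RDDs from in-memory collections with parallelize:

scala> val numbers = sc.parallelize(1 to 100) // distribute a local collection as an RDD
scala> numbers.map(_ * 2).reduce(_ + _)       // transformation (map) followed by an action (reduce)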