Interactive analysis with the SparkR shell

The entry point into SparkR is the SparkContext, which connects the R program to a Spark cluster. When working with the SparkR shell, the SQLContext and SparkContext are already created for you, available as the sqlContext and sc variables. SparkR's shell provides a simple way to learn the API, as well as a powerful tool to analyze data interactively.
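If you are working from a standalone R session rather than the SparkR shell, you create these objects yourself. The following is a minimal sketch using the Spark 1.6 SparkR API; the library path and master URL are placeholders to adjust for your installation:

```r
# Load the SparkR package shipped with the Spark distribution
# (the lib.loc path is a placeholder; point it at your installation)
library(SparkR, lib.loc = "/bigdata/spark-1.6.0-bin-hadoop2.6/R/lib")

# Create the SparkContext, which connects the R session to the cluster
sc <- sparkR.init(master = "local[2]", appName = "SparkR-example")

# Create the SQLContext used to work with DataFrames
sqlContext <- sparkRSQL.init(sc)
```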

Getting ready

To step through this recipe, you will need a running Spark cluster, either in pseudo-distributed mode or in one of the distributed modes, that is, standalone, YARN, or Mesos.

How to do it…

In this recipe, we’ll see how to start the SparkR interactive shell using Spark 1.6.0:

  1. Start the SparkR shell by running the following from the Spark installation directory; once the shell is up, you can analyze data interactively, as shown in the sketch after this step:
     /bigdata/spark-1.6.0-bin-hadoop2.6$ ./bin/sparkR --master ...
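Since the shell pre-creates sc and sqlContext, you can immediately turn a local R data frame into a distributed DataFrame and query it. Here is a minimal sketch using R's built-in faithful dataset and the Spark 1.6 SparkR API:

```r
# The SparkR shell pre-creates sc (SparkContext) and sqlContext (SQLContext).
# Convert R's built-in faithful data frame into a distributed DataFrame
df <- createDataFrame(sqlContext, faithful)

# Inspect the first rows, just as with a local R data frame
head(df)

# Group eruptions by waiting time and count each group,
# then show the most frequent waiting times first
counts <- summarize(groupBy(df, df$waiting), count = n(df$waiting))
head(arrange(counts, desc(counts$count)))
```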
