Running Spark standalone

Spark can run in several deployment modes. To get started, we are going to look at how to install Apache Spark on a single, standalone machine.

Getting ready

To perform this recipe, download a Spark binary release from the download page at http://spark.apache.org/downloads.html. This recipe uses Apache Spark 1.6.0.

How to do it...

Apache Spark is a computation engine that ships with a built-in cluster manager; it can also run on external cluster managers such as YARN or Mesos. In this recipe, we are going to use the built-in standalone cluster manager provided by Spark:

  1. Copy the downloaded Spark binary to a desired location.
  2. Extract the tar ball:
    $ sudo tar -xzf spark-1.6.0-bin-hadoop2.6.tgz
    
  3. Rename the extracted directory to a version-independent name for easier access:
    $ sudo mv spark-1.6.0-bin-hadoop2.6 spark
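The extract-and-rename steps above can be sketched as a runnable script. To keep it self-contained, a local placeholder tarball with the same name stands in for the downloaded Spark binary (an assumption for illustration); with the real download, skip the placeholder block and run the `tar` and `mv` commands against the actual `.tgz` file:

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"

# --- Placeholder standing in for the downloaded Spark binary (assumption,
# --- so the sketch runs anywhere without the ~300 MB download) -----------
mkdir spark-1.6.0-bin-hadoop2.6
echo "stub release marker" > spark-1.6.0-bin-hadoop2.6/RELEASE
tar -czf spark-1.6.0-bin-hadoop2.6.tgz spark-1.6.0-bin-hadoop2.6
rm -r spark-1.6.0-bin-hadoop2.6
# -------------------------------------------------------------------------

# Step 2: extract the tarball (on a real system you may need sudo,
# depending on the target directory's permissions)
tar -xzf spark-1.6.0-bin-hadoop2.6.tgz

# Step 3: rename the versioned directory to a stable, version-independent path
mv spark-1.6.0-bin-hadoop2.6 spark

# The Spark files now live under ./spark
ls spark
```

Renaming to a fixed path such as `spark` means scripts and environment variables (for example, `SPARK_HOME`) do not need updating when you upgrade to a newer Spark version.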
