Running Spark standalone
Spark can run in several modes. To get started, let's look at how to install Apache Spark on a standalone machine.
Getting ready
To follow this recipe, you need to download Spark. I am using Apache Spark 1.6.0; you can get it from the download page at http://spark.apache.org/downloads.html.
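If you prefer the command line, a direct download can be scripted as below. This is a sketch: the archive.apache.org URL is an assumption about where past releases are mirrored, and the fallback message is only illustrative.

```shell
# Sketch: fetch the Spark 1.6.0 binary built for Hadoop 2.6.
# The archive.apache.org mirror URL is an assumption; past releases live there.
SPARK_VERSION=1.6.0
TARBALL="spark-${SPARK_VERSION}-bin-hadoop2.6.tgz"
URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${TARBALL}"
# Fall back to a hint if wget is unavailable or the network is down.
wget -q "$URL" || echo "Download failed; fetch ${TARBALL} from the downloads page instead."
```

The version and Hadoop profile are encoded in the tarball name, so changing `SPARK_VERSION` is enough to grab a different release from the same mirror layout.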
How to do it...
Apache Spark is a computation engine that ships with a built-in cluster manager; it can also run under external cluster managers such as YARN or Mesos. In this recipe, we are going to use Spark's built-in resource manager:
- Copy the downloaded Spark binary to a desired location.
- Extract the tar ball:
$ sudo tar -xzf spark-1.6.0-bin-hadoop2.6.tgz
- Rename the extracted directory to something convenient, for example spark.
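The steps above can be sketched end to end, and the standalone cluster then brought up with the scripts Spark ships in its sbin directory. The `~/spark` install path below is an assumption; adjust `SPARK_HOME` to wherever you renamed the extracted directory.

```shell
# Sketch, assuming the extracted directory was renamed to ~/spark (path is an assumption).
SPARK_HOME="${SPARK_HOME:-$HOME/spark}"
if [ -x "$SPARK_HOME/sbin/start-master.sh" ]; then
  # Start the built-in cluster manager; its web UI listens on http://localhost:8080.
  "$SPARK_HOME/sbin/start-master.sh"
  # Register one worker with the master at its default port, 7077.
  "$SPARK_HOME/sbin/start-slave.sh" spark://localhost:7077
else
  echo "Spark not found at $SPARK_HOME; set SPARK_HOME to your install directory."
fi
```

Once both daemons are up, the master's web UI should list the worker, and applications can target the cluster with `--master spark://localhost:7077`.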