Running Spark on YARN

In the previous recipe, we took a look at how to use Spark's built-in cluster manager; in this recipe, we are going to explore how to use YARN as a cluster manager to execute the Spark application.

Getting ready

To perform this recipe, you should have a running Hadoop cluster. You should also have performed the previous recipe.

How to do it...

As mentioned in the previous recipe, we can either use Spark's built-in cluster manager, or we can use an external cluster manager such as YARN. In order to execute the Spark application on YARN, we need to edit SPARK_HOME/conf/spark-env.sh, and add the following properties to it:

export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export YARN_CONF_DIR=/usr/local/hadoop/etc/hadoop

Here, HADOOP_CONF_DIR and YARN_CONF_DIR point Spark at the directory containing the Hadoop client-side configuration files (such as core-site.xml and yarn-site.xml), which Spark reads to locate the YARN ResourceManager and HDFS.
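Once these variables are set, an application can be submitted to YARN with spark-submit by passing --master yarn. The following is a sketch using the SparkPi example that ships with Spark; the jar filename (Scala/Spark versions) is illustrative and will differ depending on your installation:

$SPARK_HOME/bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.12-3.5.0.jar 10

With --deploy-mode cluster, the Spark driver runs inside a YARN container on the cluster; with --deploy-mode client, the driver runs on the machine from which you submitted the job, which is convenient for interactive use and debugging.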
