Chapter 6. Scaling Up

In this chapter, we will cover the following recipes:

  • Building the Uber JAR
  • Submitting jobs to the Spark cluster (local)
  • Running the Spark standalone cluster on EC2
  • Running the Spark job on Mesos (local)
  • Running the Spark job on YARN (local)
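As a preview of the first two recipes, the typical workflow is a minimal sketch along these lines, assuming an sbt project with the sbt-assembly plugin configured and `spark-submit` on the PATH; the main class and JAR names below are placeholders, not part of this book's project:

```shell
# Build the uber (assembly) JAR -- assumes the sbt-assembly plugin is configured
sbt assembly

# Submit the assembled JAR to a local Spark "cluster" using all available cores.
# The --class value and the JAR path are placeholders for your own project.
spark-submit \
  --class com.example.MyApp \
  --master "local[*]" \
  target/scala-2.11/myapp-assembly-1.0.jar
```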

Introduction

In this chapter, we'll be looking at how to bundle our Spark application and deploy it on various distributed environments.

As we discussed earlier in Chapter 3, Loading and Preparing Data – DataFrame, the foundation of Spark is the RDD. From a programmer's perspective, the composability of RDDs, much like that of a regular Scala collection, is a huge advantage. An RDD wraps three vital (and two subsidiary) pieces of information that help in reconstructing its data, which is what enables fault tolerance. ...
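To see that composability in action, here is a minimal sketch, assuming Spark is on the classpath and a local master; the object and application names are illustrative only. Transformations chain just like they do on a Scala collection, while each step records lineage that Spark can replay to rebuild lost partitions:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative sketch: RDDs compose with map/filter just like a regular
// Scala collection, and each transformation adds a step to the lineage graph.
object RDDComposability {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("composability").setMaster("local[2]"))

    val numbers     = sc.parallelize(1 to 10)
    // Lazily composed pipeline -- nothing runs until an action is called
    val evenSquares = numbers.map(n => n * n).filter(_ % 2 == 0)

    // toDebugString prints the recorded lineage used for fault recovery
    println(evenSquares.toDebugString)
    println(evenSquares.sum()) // 220.0

    sc.stop()
  }
}
```

Compare this with `(1 to 10).map(n => n * n).filter(_ % 2 == 0).sum` on a plain Scala collection: the code shape is identical, which is exactly the composability advantage described above.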
