Chapter 1. Getting Started with Apache Spark

In this chapter, we will set up and configure Spark. The chapter is divided into the following recipes:

Installing Spark from binaries
Building the Spark source code with Maven
Launching Spark on Amazon EC2
Deploying Spark on a cluster in standalone mode
Deploying Spark on a cluster with Mesos
Deploying Spark on a cluster with YARN
Using Tachyon as an off-heap storage layer

Introduction

Apache Spark is a general-purpose cluster computing system for processing big data workloads. What sets Spark apart from its predecessors, such as MapReduce, is its speed, ease of use, and sophisticated analytics.

Apache Spark was originally developed at AMPLab, UC Berkeley, in 2009. It was made open source in 2010 under the BSD ...

Spark Cookbook by Rishi Yadav
