Chapter 1. Getting Started with Apache Spark

In this chapter, we will set up and configure Spark. The chapter is divided into the following recipes:

  • Installing Spark from binaries
  • Building the Spark source code with Maven
  • Launching Spark on Amazon EC2
  • Deploying Spark on a cluster in standalone mode
  • Deploying Spark on a cluster with Mesos
  • Deploying Spark on a cluster with YARN
  • Using Tachyon as an off-heap storage layer

Introduction

Apache Spark is a general-purpose cluster computing system for processing big data workloads. What sets Spark apart from its predecessors, such as MapReduce, is its speed, ease of use, and sophisticated analytics.

Apache Spark was originally developed at AMPLab, UC Berkeley, in 2009. It was made open source in 2010 under the BSD ...
