Part 1. First steps

We begin this book with an introduction to Apache Spark and its rich API. Understanding the information in part 1 is important for writing high-quality Spark programs and is an excellent foundation for the rest of the book.

Chapter 1 roughly describes Spark’s main features and compares them with Hadoop’s MapReduce and other tools from the Hadoop ecosystem. It also includes a description of the spark-in-action virtual machine we’ve prepared for you, which you can use to run the examples in the book.

Chapter 2 further explores the VM, teaches you how to use Spark’s command-line interface (spark-shell), and uses several examples to explain resilient distributed datasets (RDDs)—the central abstraction in Spark.

In chapter ...
