Introduction

Spark is a first-class data processing platform and programming interface for Big Data, and it is inextricably linked to the Big Data technology wave. At the time of this writing, Spark is one of the most active open source projects under the Apache Software Foundation (ASF), and it’s one of the most active open source Big Data projects ever.

With so much interest in Spark from the analytics, data processing, and data science communities, it’s important to understand what Spark is, what purpose it serves, what advantages it provides, and how to leverage Spark for Big Data analytics. This book covers all that.

Unlike many other publications dedicated to Spark, which almost exclusively use the Scala API, this book focuses on ...

