Chapter 5. Loading and Saving Data in Spark

So far, you have experimented with the Spark shell, figured out how to create a connection with the Spark cluster, and built jobs for deployment. To make these jobs useful, you now need to learn how to load and save data in Spark, which is the subject of this chapter.

Before we dive into data, we have a couple of background tasks to do. First, we need to get an overview of Spark abstractions; second, we need to have a quick discussion about the different modalities of data.

Spark abstractions

The goal of this book is to give you a good understanding of Spark through hands-on programming. The best way to understand Spark is to work through operations iteratively. As we are still in the initial chapters, some of the things ...
