A brief overview

How can we use Cassandra and Spark together for data analysis? How can we apply Map/Reduce-like processing in Spark? What are the common data transformations that can be performed with Spark on data stored in Cassandra? This is a very brief overview of these capabilities. All Spark-related discussions center on the programming aspects; clustering, deployment, methods of running jobs, and so on are beyond the scope of this chapter.

The most important data abstraction in Spark is the Resilient Distributed Dataset (RDD). For all practical purposes, an RDD can be considered an in-memory table of data drawn from its data source, which can be text files, files stored in HDFS, Cassandra column families, and so on.
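As a minimal sketch of these ideas, the following Scala snippet reads a Cassandra column family into an RDD and applies a Map/Reduce-like transformation to it. It assumes the DataStax spark-cassandra-connector is on the classpath and a Cassandra node is reachable; the keyspace `sales`, table `orders`, and column names are hypothetical, used only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._  // adds cassandraTable to SparkContext

object CassandraRddSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("cassandra-rdd-sketch")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed contact point
    val sc = new SparkContext(conf)

    // Each row of the column family becomes one element of the RDD.
    val orders = sc.cassandraTable("sales", "orders")

    // Map/Reduce-like processing: map each row to (customer, amount),
    // then reduce by key to total the amounts per customer.
    val totals = orders
      .map(row => (row.getString("customer_id"), row.getDouble("amount")))
      .reduceByKey(_ + _)

    totals.collect().foreach(println)
    sc.stop()
  }
}
```

Because the RDD is distributed across the Spark workers, the `map` and `reduceByKey` steps run in parallel close to the data, which is what makes this style of processing attractive for Cassandra tables.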
