O'Reilly logo

Real-Time Big Data Analytics by Shilpi Saxena, Sumit Gupta

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Resilient distributed datasets (RDD)

In this section, we will talk about the architecture, motivation, features, and other important concepts related to RDD. We will also briefly talk about the implementation methodology adopted by Spark and various APIs/functions exposed by RDDs.

Frameworks such as Hadoop and MapReduce are widely adopted for parallel and distributed data processing. There is no doubt that these frameworks introduce a new paradigm for distributed data processing and that too in a fault-tolerant manner (without losing a single byte). However, these frameworks do have some limitations; for example, Hadoop is not suited for the problem statements where we need iterative data processing as in recursive functions or machine learning ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required