Spark GraphX

GraphX is a layer on top of Spark, so it inherits Spark's strengths: distributed processing, the algorithm libraries, the versioned computation graph, and so forth. Interestingly, a couple of ML algorithms are written using the GraphX APIs. Now refer to the following figure:

Spark GraphX

This diagram shows the layers and the relationships between GraphX, Spark, and the algorithms. GraphX is a distributed graph-processing component built for scale, with powerful partitioning mechanisms and, of course, an in-memory representation that speeds up iterative processing. Programs written against it are succinct yet powerful. I went ...
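To give a feel for how succinct the programming model can be, here is a minimal sketch in Scala. It assumes Spark 2.x with the GraphX module on the classpath and a local master. The object name GraphXSketch, the helper buildGraph, and the tiny "follows" graph are all made up for illustration; they are not taken from the book.

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object GraphXSketch {
  // Builds a tiny, hypothetical "follows" graph.
  // Vertex attributes are user names; edge attributes are relationship labels.
  def buildGraph(spark: SparkSession): Graph[String, String] = {
    val sc = spark.sparkContext
    val vertices: RDD[(VertexId, String)] = sc.parallelize(Seq(
      (1L, "alice"), (2L, "bob"), (3L, "carol")))
    val edges: RDD[Edge[String]] = sc.parallelize(Seq(
      Edge(1L, 2L, "follows"),
      Edge(3L, 2L, "follows"),
      Edge(2L, 1L, "follows")))
    Graph(vertices, edges)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("GraphXSketch")
      .master("local[*]") // assumption: running locally for experimentation
      .getOrCreate()

    val graph = buildGraph(spark)

    // In-degree of each vertex; vertices with no incoming edges are omitted.
    graph.inDegrees.collect().sortBy(_._1).foreach { case (id, deg) =>
      println(s"vertex $id has in-degree $deg")
    }

    // PageRank, iterated until the ranks change by less than the tolerance.
    val ranks = graph.pageRank(0.0001).vertices
    ranks.collect().sortBy(_._1).foreach { case (id, rank) =>
      println(s"vertex $id has rank $rank")
    }

    spark.stop()
  }
}
```

The graph is assembled from two ordinary RDDs, one of vertices and one of edges, so Spark's partitioning and in-memory caching apply to graph data as they would to any other RDD. An iterative algorithm such as PageRank is then a single method call on the graph.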
