Performing neighborhood aggregation

GraphX does most of its computation by isolating each vertex and its neighbors, which makes it easier to process massive graph data on distributed systems. Neighborhood operations are therefore very important. GraphX provides a mechanism for aggregation at the neighborhood level in the form of the aggregateMessages method, which works in two steps:

  1. In the first step (first function of the method), messages are sent to the destination vertex or the source vertex (similar to the Map function in MapReduce).
  2. In the second step (second function of the method), aggregation is done on these messages (similar to the Reduce function in MapReduce).
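The two steps above can be sketched as follows. This is a minimal illustration, not the recipe's own code: the vertex names "A", "B", and "C", the "follows" edge label, the FollowerCount object, and the followerCounts helper are all hypothetical, and the example assumes a local Spark installation with GraphX on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}

object FollowerCount {

  // Counts, for each vertex, how many incoming "follows" edges it has.
  def followerCounts(sc: SparkContext): Map[VertexId, Int] = {
    // Hypothetical data: A follows B, C follows B, A follows C.
    val vertices = sc.parallelize(Seq[(VertexId, String)](
      (1L, "A"), (2L, "B"), (3L, "C")))
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, "follows"),
      Edge(3L, 2L, "follows"),
      Edge(1L, 3L, "follows")))
    val graph = Graph(vertices, edges)

    val counts = graph.aggregateMessages[Int](
      // Step 1 (map-like): each edge sends the message 1 to its destination.
      ctx => ctx.sendToDst(1),
      // Step 2 (reduce-like): messages arriving at the same vertex are summed.
      _ + _
    )
    counts.collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FollowerCount").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      followerCounts(sc).foreach { case (id, n) =>
        println(s"vertex $id has $n follower(s)")
      }
    } finally {
      sc.stop()
    }
  }
}
```

In this sketch, vertex 2 receives two messages and vertex 3 receives one. Vertex 1 receives none, so it does not appear in the result at all: aggregateMessages returns entries only for vertices that received at least one message.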

Getting ready

Let's build a small dataset of the followers:

Follower    Followee

John
