Classifying the vertices of a graph using Power Iteration Clustering (PIC) in Spark 2.0

This method classifies (clusters) the vertices of a graph based on their pairwise similarities, which are defined by the graph's edges. The implementation uses the GraphX library, which ships out of the box with Spark. Power Iteration Clustering is similar to other eigenvector/eigenvalue decomposition algorithms, but it avoids the overhead of a full matrix decomposition. It is suitable when you have a large sparse matrix (for example, a graph represented as a sparse matrix).
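To make the idea concrete, here is a minimal pure-Python sketch of power iteration clustering on a toy graph. It is not Spark's implementation: the function names, the seven-vertex graph, the fixed iteration count, and the two-cluster 1-D k-means step are all assumptions chosen for illustration. The edge list uses (source, destination, similarity) triples, the same shape of input that Spark's PIC expects.

```python
# Toy power iteration clustering (PIC) in pure Python.
# Illustrative sketch only; all names and the sample graph are hypothetical.

def build_affinity(n, edges):
    """Build a dense symmetric affinity matrix from (src, dst, similarity) triples."""
    a = [[0.0] * n for _ in range(n)]
    for i, j, s in edges:
        a[i][j] = s
        a[j][i] = s
    return a

def power_iteration_embedding(affinity, iterations=10):
    """Run a truncated power iteration on W = D^-1 A and return a 1-D embedding.

    Assumes every vertex has at least one edge (no zero row sums).
    """
    n = len(affinity)
    degrees = [sum(row) for row in affinity]
    w = [[affinity[i][j] / degrees[i] for j in range(n)] for i in range(n)]
    # Degree-based starting vector, normalized to sum to 1.
    total = sum(degrees)
    v = [d / total for d in degrees]
    for _ in range(iterations):
        v = [sum(w[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(abs(x) for x in v)
        v = [x / norm for x in v]
    return v

def two_means_1d(values, rounds=20):
    """Split 1-D values into two groups with Lloyd's k-means (k = 2)."""
    lo, hi = min(values), max(values)
    labels = [0] * len(values)
    if lo == hi:
        return labels  # nothing to separate
    for _ in range(rounds):
        labels = [0 if abs(x - lo) <= abs(x - hi) else 1 for x in values]
        group0 = [x for x, lab in zip(values, labels) if lab == 0]
        group1 = [x for x, lab in zip(values, labels) if lab == 1]
        lo = sum(group0) / len(group0)
        hi = sum(group1) / len(group1)
    return labels

# Hypothetical graph: a triangle (0, 1, 2) and a 4-clique (3, 4, 5, 6),
# joined by one weak edge between vertices 2 and 3.
edges = [
    (0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
    (3, 4, 1.0), (3, 5, 1.0), (3, 6, 1.0),
    (4, 5, 1.0), (4, 6, 1.0), (5, 6, 1.0),
    (2, 3, 0.1),
]
affinity = build_affinity(7, edges)
embedding = power_iteration_embedding(affinity, iterations=10)
labels = two_means_1d(embedding)
print(labels)
```

The iteration is stopped early on purpose: values within a tightly connected group converge toward each other quickly, while the gap between groups shrinks slowly, so the intermediate vector already separates the triangle from the 4-clique. No eigendecomposition is computed at any point, only repeated matrix-vector products.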

Going forward, GraphFrames is intended to be the replacement for, and the proper interface to, the GraphX library (see https://databricks.com/blog/2016/03/03/introducing-graphframes.html).
