Summary

In this chapter, we learned several ways to visualize and analyze graphs in Spark. We studied the connectedness of different networks by examining their degree distributions, finding their connected components, and calculating their clustering coefficients. We also learned how to visualize graph data using GraphStream, and then showed how the PageRank algorithm can be used to rank the importance of nodes in different networks. Finally, this chapter showed how to use SBT to build a Spark program that relies on third-party libraries.
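As a compact recap, the analyses listed above can be sketched with GraphX on a toy graph. This is a minimal sketch, not the chapter's own code: the object name `SummarySketch`, the five-vertex edge list, the local master setting, and the PageRank tolerance of 0.001 are all assumptions chosen for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy}

// Hypothetical recap program; names and data are illustrative assumptions.
object SummarySketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SummarySketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Toy graph (assumed data): a triangle (1, 2, 3) plus a separate edge (4, 5).
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(1L, 3L, 1), Edge(4L, 5L, 1)))

    // Older GraphX releases expect a partitioned graph before triangleCount().
    val graph = Graph.fromEdges(edges, defaultValue = 0)
      .partitionBy(PartitionStrategy.RandomVertexCut)
      .cache()

    // Degree distribution: how many vertices have each degree value.
    val degreeDistribution = graph.degrees
      .map { case (_, degree) => (degree, 1L) }
      .reduceByKey(_ + _)
      .collect()
      .sorted

    // Connected components: each vertex is labeled with the smallest
    // vertex ID in its component, so distinct labels count the components.
    val componentCount = graph.connectedComponents().vertices
      .map { case (_, componentId) => componentId }
      .distinct()
      .count()

    // Local clustering coefficient: triangles through a vertex divided by
    // the number of possible neighbor pairs, d * (d - 1) / 2.
    val clustering = graph.triangleCount().vertices
      .join(graph.degrees)
      .mapValues { case (triangles, degree) =>
        if (degree < 2) 0.0
        else 2.0 * triangles / (degree * (degree - 1))
      }

    // PageRank, iterated until the ranks change by less than the tolerance.
    val ranks = graph.pageRank(0.001).vertices

    println("Degree distribution: " + degreeDistribution.mkString(", "))
    println("Connected components: " + componentCount)
    clustering.collect().sortBy(_._1).foreach { case (id, c) =>
      println("Clustering coefficient of vertex " + id + ": " + c)
    }
    ranks.collect().sortBy(-_._2).foreach { case (id, rank) =>
      println("PageRank of vertex " + id + ": " + rank)
    }

    sc.stop()
  }
}
```

Building this with SBT would require the Spark core and GraphX artifacts as library dependencies. The exact versions depend on your Spark installation, so they are left unspecified here.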

Throughout this chapter, we also studied how basic Spark RDD operations can be used to transform, join, and filter collections of graph vertices and edges. In the next chapter, ...
