Chapter 8. The missing algorithms

This chapter covers

  • Reading RDF files
  • Merging graphs
  • Filtering out isolated vertices
  • Using IndexedRDD for performance gains
  • Taking a simplistic approach to finding graph isomorphisms
  • Computing the global clustering coefficient

You’ve seen examples of reading graph data from edge list files in earlier chapters. RDF (Resource Description Framework) is another important format, used by many existing datasets. This chapter shows you how to read RDF files and apply that knowledge to the YAGO3 dataset.
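To make the idea concrete before the details arrive, here is a minimal sketch of how RDF-style triples map onto a graph: each subject and object becomes a vertex, and each predicate becomes a labeled edge. It is written in plain Python rather than GraphX, the function name `parse_triples` is hypothetical, the sample facts are made up for illustration rather than taken from YAGO3, and the parser assumes a simplified one-triple-per-line layout with whitespace-separated fields. Real RDF serializations have more cases than this handles.

```python
def parse_triples(lines):
    """Convert simplified 'subject predicate object .' lines into a graph.

    Returns (vertices, edges), where vertices maps a numeric id to a name
    and each edge is a (source_id, destination_id, predicate) tuple.
    Assumption: one triple per line, fields separated by whitespace.
    """
    ids = {}      # vertex name -> numeric id, assigned in order of first appearance
    edges = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith("."):
            line = line[:-1].rstrip()  # drop the trailing statement terminator
        parts = line.split(None, 2)    # at most 3 fields; the object may contain spaces
        if len(parts) != 3:
            continue  # not a triple in this simplified layout
        subject, predicate, obj = parts
        src = ids.setdefault(subject, len(ids))
        dst = ids.setdefault(obj, len(ids))
        edges.append((src, dst, predicate))
    vertices = {vertex_id: name for name, vertex_id in ids.items()}
    return vertices, edges


if __name__ == "__main__":
    # Made-up sample data, not actual YAGO3 content.
    sample = [
        "<Albert_Einstein> <wasBornIn> <Ulm> .",
        "<Ulm> <isLocatedIn> <Germany> .",
    ]
    vertices, edges = parse_triples(sample)
    print(vertices)
    print(edges)
```

The resulting vertex map and edge list have the same shape as the edge list data from earlier chapters, with the predicate carried along as the edge attribute.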

Aside from the classic graph algorithms from chapter 6, there are other slightly more modern algorithms that one comes to expect in a graph database or graph processing system. Some of these are missing—not implemented ...
