Chapter 30. Graph Analytics

The previous chapter covered some conventional unsupervised techniques. This chapter is going to dive into a more specialized toolset: graph processing. Graphs are data structures composed of nodes, or vertices, which are arbitrary objects, and edges that define relationships between these nodes. Graph analytics is the process of analyzing these relationships. An example graph might be your friend group. In the context of graph analytics, each vertex or node would represent a person, and each edge would represent a relationship. Figure 30-1 shows a sample graph.

image
Figure 30-1. A sample graph with seven nodes and seven edges

This particular graph is undirected, in that the edges do not have a specified “start” and “end” vertex. There are also directed graphs that specify a start and end. Figure 30-2 shows a directed graph where the edges are directional.

image
Figure 30-2. A directed graph

Edges and vertices in graphs can also have data associated with them. In our friend example, the weight of the edge might represent the intimacy between different friends; acquaintances would have low-weight edges between them, while married individuals would have edges with large weights. We could set this value by looking at communication frequency between nodes and weighting ...

Get Spark: The Definitive Guide now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.