Chapter 16. Finding, Counting, and Listing All Triangles in Large Graphs

Graphs and matrices are used by social network analysts to represent and analyze information about patterns of ties among social actors (users, friends, and “friends of friends”). In network analysis, data is usually modeled as a graph or set of graphs. A graph is a data structure that has a finite set of nodes, called vertices, together with a finite set of lines, called edges, that join some or all of these nodes. Before we define some metrics using the count of triangles, we need to define a triad and a triangle. Let T = (a,b,c) be a set of three distinct nodes in a graph (identified by G). T is a triad if two of those nodes are connected ({(a,b), (a,c)}) and it is a triangle if all three nodes are connected ({(a,b), (a,c), (b,c)}).

In graph analysis, there are three important metrics:

  • Global clustering coefficient

  • Transitivity ratio, defined as:

    upper T left-parenthesis upper G right-parenthesis equals StartFraction 3 times left-parenthesis number of triangles on the graph right-parenthesis Over left-parenthesis number of connected triads of vertices right-parenthesis EndFraction
  • Local clustering coefficient

In order to calculate these three metrics in a large graph, we need to know the number of triangles in the graph. For details on these metrics, see [28], [25], and [34].

A graph might have hundreds of millions of nodes (e.g., users in a social network) and edges (the relationships between these users), and counting the number of triangles is very time-consuming. This chapter provides two MapReduce solutions that find, count, ...

Get Data Algorithms now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.