8.7 Graphlet Kernels

Graphlet kernels are based on the idea of randomly sampling small (connected) subgraphs of size k {3, 4, 5}. These samples can then be used to compare frequency distributions (count-based approach [52]) or to construct graph invariants (algebraic approaches, e.g., skew spectrum [51], graphlet spectrum [53]). This type of kernel is suited for the comparison of larger graphs with hundreds of vertices and thousands of edges.

8.7.1 Definition

Let G1, . . ., GN denote a set of graphs of size k {3, 4, 5}, the graphlets. We explicitly construct a feature space of dimension N via the k-spectrum

(8.19) equation

where ≠(H img G) denotes the number of embeddings of H in G. The (count-based) graphlet kernel is the inner product in this space,

(8.20) equation

To remove the dependency on the size of G, we normalize the count vector ϕ to a probability vector, yielding the normalized graphlet kernel

(8.21) equation

Isomorphic graphs have the same graphlet k-spectrum. Whether the reverse holds is currently not known.4

8.7.2 Computation

Computing Equation 8.20 or 8.21 requires counting all |V|k = O(|V|k) graphlets. ...

Get Statistical and Machine Learning Approaches for Network Analysis now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.