You are previewing Graph Mining.
O'Reilly logo
Graph Mining

Book Description

What does the Web look like? How can we find patterns, communities, outliers, in a social network? Which are the most central nodes in a network? These are the questions that motivate this work. Networks and graphs appear in many diverse settings, for example in social networks, computer-communication networks (intrusion detection, traffic management), protein-protein interaction networks in biology, document-text bipartite graphs in text retrieval, person-account graphs in financial fraud detection, and others. In this work, first we list several surprising patterns that real graphs tend to follow. Then we give a detailed list of generators that try to mirror these patterns. Generators are important, because they can help with "what if" scenarios, extrapolations, and anonymization. Then we provide a list of powerful tools for graph analysis, and specifically spectral methods (Singular Value Decomposition (SVD)), tensors, and case studies like the famous "pageRank" algorithm and the "HITS" algorithm for ranking web search results. Finally, we conclude with a survey of tools and observations from related fields like sociology, which provide complementary viewpoints. Table of Contents: Introduction / Patterns in Static Graphs / Patterns in Evolving Graphs / Patterns in Weighted Graphs / Discussion: The Structure of Specific Graphs / Discussion: Power Laws and Deviations / Summary of Patterns / Graph Generators / Preferential Attachment and Variants / Incorporating Geographical Information / The RMat / Graph Generation by Kronecker Multiplication / Summary and Practitioner's Guide / SVD, Random Walks, and Tensors / Tensors / Community Detection / Influence/Virus Propagation and Immunization / Case Studies / Social Networks / Other Related Work / Conclusions

Table of Contents

  1. Cover
  2. Title
  3. Copyright
  4. Dedication
  5. Synthesis Lectures on Data Mining and Knowledge Discovery
  6. Abstract
  7. Contents
  8. Acknowledgments
  9. 1 Introduction
  10. Part I Patterns and Laws
    1. 2 Patterns in Static Graphs
      1. 2.1 S-1: Heavy-tailed Degree Distribution
      2. 2.2 S-2: Eigenvalue Power Law (EPL)
      3. 2.3 S-3 Small Diameter
      4. 2.4 S-4, S-5: Triangle Power Laws (TPL, DTPL)
    2. 3 Patterns in Evolving Graphs
      1. 3.1 D-1: Shrinking Diameters
      2. 3.2 D-2: Densification Power Law (DPL)
      3. 3.3 D-3: Diameter-plot and Gelling Point
      4. 3.4 D-4: Oscillating NLCCs Sizes
      5. 3.5 D-5: LPL: Principal Eigenvalue Over Time
    3. 4 Patterns in Weighted Graphs
      1. 4.1 W-1: Snapshot Power Laws (SPL)—“Fortification”
      2. 4.2 W-1: DW-1: Weight Power Law (WPL)
      3. 4.3 DW-2: LWPL: Weighted Principal Eigenvalue Over Time
    4. 5 Discussion—The Structure of Specific Graphs
      1. 5.1 The Internet
      2. 5.2 The World Wide Web (WWW)
    5. 6 Discussion—Power Laws and Deviations
      1. 6.1 Power Laws—Slope Estimation
      2. 6.2 Deviations from Power Laws
        1. 6.2.1 Exponential Cutoffs
        2. 6.2.2 Lognormals or the “DGX” Distribution
        3. 6.2.3 Doubly-Pareto Lognormal (dPln)
    6. 7 Summary of Patterns
  11. Part II Graph Generators
    1. 8 Graph Generators
      1. 8.1 Random Graph Models
        1. 8.1.1 The Erdös-Rényi Random Graph Model
        2. 8.1.2 Generalized Random Graph Models
    2. 9 Preferential Attachment and Variants
      1. 9.1 Main Ideas
        1. 9.1.1 Basic Preferential Attachment
        2. 9.1.2 Initial Attractiveness
        3. 9.1.3 Internal Edges and Rewiring
      2. 9.2 Related Methods
        1. 9.2.1 Edge Copying Models
        2. 9.2.2 Modifying the Preferential Attachment Equation
        3. 9.2.3 Modeling Increasing Average Degree
        4. 9.2.4 Node Fitness Measures
        5. 9.2.5 Generalizing Preferential Attachment
        6. 9.2.6 PageRank-based Preferential Attachment
        7. 9.2.7 The Forest Fire Model
      3. 9.3 Summary of Preferential Attachment Models
    3. 10 Incorporating Geographical Information
      1. 10.1 Early Models
        1. 10.1.1 The Small-World Model
        2. 10.1.2 The Waxman Model
        3. 10.1.3 The BRITE Generator
        4. 10.1.4 Other Geographical Constraints
      2. 10.2 Topology from Resource Optimizations
        1. 10.2.1 The Highly Optimized Tolerance Model
        2. 10.2.2 The Heuristically Optimized Tradeoffs Model
      3. 10.3 Generators for the Internet Topology
        1. 10.3.1 Structural Generators
        2. 10.3.2 The Inet Topology Generator
      4. 10.4 Comparison Studies
    4. 11 The RMat (Recursive MATrix) Graph Generator
    5. 12 Graph Generation by Kronecker Multiplication
    6. 13 Summary and Practitioner’s Guide
  12. Part III Tools and Case Studies
    1. 14 SVD, Random Walks, and Tensors
      1. 14.1 Eigenvalues—Definition and Intuition
      2. 14.2 Singular Value Decomposition (SVD)
      3. 14.3 HITS: Hubs and Authorities
      4. 14.4 PageRank
    2. 15 Tensors
      1. 15.1 Introduction
      2. 15.2 Main Ideas
      3. 15.3 An Example: Tensors at Work
      4. 15.4 Conclusions—Practitioner’s Guide
    3. 16 Community Detection
      1. 16.1 Clustering Coefficient
      2. 16.2 Methods for Extracting Graph Communities
      3. 16.3 A Word of Caution—“No Good Cuts”
    4. 17 Influence/Virus Propagation and Immunization
      1. 17.1 Introduction—Terminology
      2. 17.2 Main Result and its Generality
      3. 17.3 Applications
      4. 17.4 Discussion
        1. 17.4.1 Simulation Examples
        2. 17.4.2 λ1: Measure of Connectivity
      5. 17.5 Conclusion
    5. 18 Case Studies
      1. 18.1 Proximity and Random Walks
      2. 18.2 Automatic Captioning—Multi-modal Querying
        1. 18.2.1 The GCap Method
        2. 18.2.2 Performance and Variations
        3. 18.2.3 Discussion
        4. 18.2.4 Conclusions
      3. 18.3 Center-Piece Subgraphs—Who is the Mastermind?
  13. Part IV Outreach—Related Work
    1. 19 Social Networks
      1. 19.1 Mapping Social Networks
      2. 19.2 Dataset Characteristics
      3. 19.3 Structure from Data
      4. 19.4 Social “Roles”
      5. 19.5 Social Capital
      6. 19.6 Recent Research Directions
      7. 19.7 Differences from Graph Mining
    2. 20 Other Related Work
      1. 20.1 Relational Learning
      2. 20.2 Finding Frequent Subgraphs
      3. 20.3 Navigation in Graphs
        1. 20.3.1 Methods of Navigation
        2. 20.3.2 Relationship Between Graph Topology and Ease of Navigation
      4. 20.4 Using Social Networks in Other Fields
    3. 21 Conclusions
      1. 21.1 Future Research Directions
      2. 21.2 Parting Thoughts
  14. Resources
  15. Bibliography
  16. Authors’ Biographies