Mining Heterogeneous Information Networks

Book Description

Real-world physical and abstract data objects are interconnected, forming gigantic networks. By structuring these data objects and the interactions between them into multiple types, such networks become semi-structured heterogeneous information networks. Most real-world applications that handle big data, including interconnected social media and social networks, scientific, engineering, or medical information systems, online e-commerce systems, and most database systems, can be structured into heterogeneous information networks. Effective analysis of large-scale heterogeneous information networks therefore poses an interesting and critical challenge. In this monograph, we investigate the principles and methodologies of mining heterogeneous information networks. Departing from many existing network models that view data as homogeneous graphs or networks, our semi-structured heterogeneous information network model leverages the rich semantics of typed nodes and links in a network and uncovers surprisingly rich knowledge from interconnected data. This semi-structured heterogeneous network modeling leads to a series of new principles and powerful methodologies for mining interconnected data, including (1) rank-based clustering and classification, (2) meta-path-based similarity search and mining, (3) relation strength-aware mining, and many other potential developments. This monograph introduces this new research frontier and points out some promising research directions.

Table of Contents

  1. Cover
  2. Half title
  3. Copyright
  4. Title
  5. Contents
  6. Acknowledgments
  7. 1 Introduction
    1. 1.1 What Are Heterogeneous Information Networks?
    2. 1.2 Why Is Mining Heterogeneous Networks a New Game?
    3. 1.3 Organization of the Book
  8. Part I Ranking-Based Clustering and Classification
    1. 2 Ranking-Based Clustering
      1. 2.1 Overview
      2. 2.2 RankClus
        1. 2.2.1 Ranking Functions
        2. 2.2.2 From Conditional Rank Distributions to New Clustering Measures
        3. 2.2.3 Cluster Centers and Distance Measure
        4. 2.2.4 RankClus: Algorithm Summarization
        5. 2.2.5 Experimental Results
      3. 2.3 NetClus
        1. 2.3.1 Ranking Functions
        2. 2.3.2 Framework of NetClus Algorithm
        3. 2.3.3 Generative Model for Target Objects in a Net-Cluster
        4. 2.3.4 Posterior Probability for Target Objects and Attribute Objects
        5. 2.3.5 Experimental Results
    2. 3 Classification of Heterogeneous Information Networks
      1. 3.1 Overview
      2. 3.2 GNetMine
        1. 3.2.1 The Classification Problem Definition
        2. 3.2.2 Graph-based Regularization Framework
      3. 3.3 RankClass
        1. 3.3.1 The Framework of RankClass
        2. 3.3.2 Graph-based Ranking
        3. 3.3.3 Adjusting the Network
        4. 3.3.4 Posterior Probability Calculation
      4. 3.4 Experimental Results
        1. 3.4.1 Dataset
        2. 3.4.2 Accuracy Study
        3. 3.4.3 Case Study
  9. Part II Meta-Path-Based Similarity Search and Mining
    1. 4 Meta-Path-Based Similarity Search
      1. 4.1 Overview
      2. 4.2 PathSim: A Meta-Path-Based Similarity Measure
        1. 4.2.1 Network Schema and Meta-Path
        2. 4.2.2 Meta-Path-Based Similarity Framework
        3. 4.2.3 PathSim: A Novel Similarity Measure
      3. 4.3 Online Query Processing for Single Meta-Path
        1. 4.3.1 Single Meta-Path Concatenation
        2. 4.3.2 Baseline
        3. 4.3.3 Co-Clustering-Based Pruning
      4. 4.4 Multiple Meta-Paths Combination
      5. 4.5 Experimental Results
        1. 4.5.1 Effectiveness
        2. 4.5.2 Efficiency Comparison
        3. 4.5.3 Case-Study on Flickr Network
    2. 5 Meta-Path-Based Relationship Prediction
      1. 5.1 Overview
      2. 5.2 Meta-Path-Based Relationship Prediction Framework
        1. 5.2.1 Meta-Path-Based Topological Feature Space
        2. 5.2.2 Supervised Relationship Prediction Framework
      3. 5.3 Co-Authorship Prediction
        1. 5.3.1 The Co-Authorship Prediction Model
        2. 5.3.2 Experimental Results
      4. 5.4 Relationship Prediction with Time
        1. 5.4.1 Meta-Path-Based Topological Features for Author Citation Relationship Prediction
        2. 5.4.2 The Relationship Building Time Prediction Model
        3. 5.4.3 Experimental Results
  10. Part III Relation Strength-Aware Mining
    1. 6 Relation Strength-Aware Clustering with Incomplete Attributes
      1. 6.1 Overview
      2. 6.2 The Relation Strength-Aware Clustering Problem Definition
        1. 6.2.1 The Clustering Problem
      3. 6.3 The Clustering Framework
        1. 6.3.1 Model Overview
        2. 6.3.2 Modeling Attribute Generation
        3. 6.3.3 Modeling Structural Consistency
        4. 6.3.4 The Unified Model
      4. 6.4 The Clustering Algorithm
        1. 6.4.1 Cluster Optimization
        2. 6.4.2 Link Type Strength Learning
        3. 6.4.3 Putting together: The GenClus Algorithm
      5. 6.5 Experimental Results
        1. 6.5.1 Datasets
        2. 6.5.2 Effectiveness Study
    2. 7 User-Guided Clustering via Meta-Path Selection
      1. 7.1 Overview
      2. 7.2 The Meta-Path Selection Problem for User-Guided Clustering
        1. 7.2.1 The Meta-Path Selection Problem
        2. 7.2.2 User-Guided Clustering
        3. 7.2.3 The Problem Definition
      3. 7.3 The Probabilistic Model
        1. 7.3.1 Modeling the Relationship Generation
        2. 7.3.2 Modeling the Guidance from Users
        3. 7.3.3 Modeling the Quality Weights for Meta-Path Selection
        4. 7.3.4 The Unified Model
      4. 7.4 The Learning Algorithm
        1. 7.4.1 Optimize Clustering Result Given Meta-Path Weights
        2. 7.4.2 Optimize Meta-Path Weights Given Clustering Result
        3. 7.4.3 The PathSelClus Algorithm
      5. 7.5 Experimental Results
        1. 7.5.1 Datasets
        2. 7.5.2 Effectiveness Study
        3. 7.5.3 Case Study on Meta-Path Weights
      6. 7.6 Discussions
    3. 8 Research Frontiers
  11. Bibliography
  12. Authors’ Biographies