Real-world physical and abstract data objects are interconnected, forming gigantic networks. By structuring these data objects and the interactions between them into multiple types, such networks become semi-structured heterogeneous information networks. Most real-world applications that handle big data, including interconnected social media and social networks, scientific, engineering, or medical information systems, online e-commerce systems, and most database systems, can be structured into heterogeneous information networks. Therefore, effective analysis of large-scale heterogeneous information networks poses an interesting and critical challenge.
In this monograph, we investigate the principles and methodologies of mining heterogeneous information networks. Departing from many existing network models that view data as homogeneous graphs or networks, our semi-structured heterogeneous information network model leverages the rich semantics of typed nodes and links in a network and uncovers surprisingly rich knowledge from interconnected data. This modeling approach leads to a series of new principles and powerful methodologies for mining interconnected data, including (1) rank-based clustering and classification, (2) meta-path-based similarity search and mining, (3) relation strength-aware mining, and many other potential developments. This monograph introduces this new research frontier and points out some promising research directions.