Data Mining Patterns: New Methods and Applications

Book Description

"Since the introduction of the Apriori algorithm a decade ago, the problem of mining patterns has become a very active research area, and efficient techniques have been widely applied to problems in both industry and science. Currently, the data mining community is focusing on new problems such as mining new kinds of patterns, mining patterns under constraints, considering new kinds of complex data, and real-world applications of these concepts.

Data Mining Patterns: New Methods and Applications provides an overall view of the recent solutions for mining, and also explores new kinds of patterns. This book offers theoretical frameworks and presents challenges and their possible solutions concerning pattern extractions, emphasizing both research techniques and real-world applications. Data Mining Patterns: New Methods and Applications portrays research applications in data models, techniques and methodologies for mining patterns, multi-relational and multidimensional pattern mining, fuzzy data mining, data streaming, incremental mining, and many other topics."

Table of Contents

  1. Copyright
  2. Preface
  3. Acknowledgment
  4. About the Editors
  5. I. Metric Methods in Data Mining*
    1. ABSTRACT
    2. INTRODUCTION
    3. PARTITIONS, METRICS, ENTROPIES
    4. GEOMETRY OF THE METRIC SPACE OF PARTITIONS OF FINITE SETS
    5. METRIC SPLITTING CRITERIA FOR DECISION TREES
    6. INCREMENTAL CLUSTERING OF CATEGORICAL DATA
    7. CLUSTERING FEATURES AND FEATURE SELECTION
    8. A METRIC APPROACH TO DISCRETIZATION
    9. CONCLUSION AND FUTURE RESEARCH
    10. REFERENCES
    11. NOTE
  6. II. Bi-Directional Constraint Pushing in Frequent Pattern Mining
    1. ABSTRACT
    2. INTRODUCTION
      1. Problem Statement
      2. Chapter Organization
    3. CONSTRAINTS
      1. Categories of Constraints
      2. Bi-Directional Pushing of Constraints
    4. RELATED WORK
    5. LEAP ALGORITHMS: COFI-Leap, HFP-Leap
      1. Important Patterns: Closed and Maximal
      2. COFI-Trees
      3. COFI-Leap
    6. COFI-Leap WITH CONSTRAINTS, BifoldLeap
    7. PARALLEL BifoldLeap: BUILDING THE STRUCTURES IN PARALLEL AND MINING THEM IN PARALLEL
      1. Load Sharing Among Processors
      2. Parallel Leap Traversal Approach: An Example
    8. SEQUENTIAL PERFORMANCE EVALUATION
      1. Impact of P() and Q() Selectivity on BifoldLeap and Dualminer
      2. Scalability Tests
      3. Constraint Checking: Pushing Constraints vs. Postprocessing
      4. Different Distributions
    9. PARALLEL PERFORMANCE EVALUATIONS
      1. Effect of Load Distribution Strategy
      2. Scalability with Respect to Database Size
      3. Scalability with Respect to Number of Processors
    10. CONCLUSION
    11. REFERENCES
  7. III. Mining Hyperclique Patterns: A Summary of Results
    1. ABSTRACT
    2. INTRODUCTION
      1. Related Work
    3. HYPERCLIQUE PATTERN
      1. Hyperclique Pattern Definition
      2. The Equivalence Between All-Confidence Measure and H-Confidence Measure
      3. Anti-Monotone Property of H-Confidence
    4. THE CROSS-SUPPORT PROPERTY
      1. Illustration of the Cross-Support Property
      2. Generalization of the Cross-Support Property
    5. THE H-CONFIDENCE AS A MEASURE OF ASSOCIATION
      1. Relationship Between H-Confidence and Jaccard
      2. Relationship Between H-Confidence and Correlation
      3. H-Confidence for Measuring the Relationship among Several Objects
    6. HYPERCLIQUE MINER ALGORITHM
      1. Explanation of the Detailed Steps of the Algorithm
    7. HYPERCLIQUE-BASED ITEM CLUSTERING APPROACH
    8. EXPERIMENTAL RESULTS
      1. The Experimental Setup
        1. Experimental Data Sets
        2. Experimental Platform
      2. The Pruning Effect of Hyperclique Miner
      3. The Effect of Cross-Support Pruning
      4. Scalability of Hyperclique Miner
      5. Quality of Hyperclique Patterns
      6. Hyperclique-Based Item Clustering
      7. An Application of Hyperclique Patterns for Identifying Protein Functional Modules
    9. CONCLUSION
    10. REFERENCES
    11. ENDNOTES
  8. IV. Pattern Discovery in Biosequences: From Simple to Complex
    1. ABSTRACT
    2. INTRODUCTION
    3. BACKGROUND
      1. String, Suffix, and Don’t Care
      2. Distance Between Strings and Approximate Occurrences
      3. Data Structures: Tries and Suffix Trees
    4. FORMALIZATION AND APPROACHES
      1. A Formalization of the Problem
      2. The Problem of Pattern Discovery
      3. An Overview of the Proposed Approaches and Applications
    5. EMERGENT AND FUTURE TRENDS
    6. CONCLUSION
    7. REFERENCES
  9. V. Finding Patterns in Class-Labeled Data Using Data Visualization
    1. ABSTRACT
    2. INTRODUCTION
    3. VISUALIZATION METHODS
    4. VizRank PROJECTION RANKING
      1. Projection Scoring and Selection of Machine Learning Method
      2. Search Heuristic
      3. Note on Complexity of the Algorithm
    5. EXPERIMENTAL ANALYSIS
      1. Datasets
      2. Top-Ranked Projections
      3. Visualization-Based Classification
      4. Detecting Outliers
      5. Using VizRank with Parallel Coordinates
      6. Related Work
    6. CONCLUSION
    7. REFERENCES
  10. VI. Summarizing Data Cubes Using Blocks
    1. ABSTRACT
    2. INTRODUCTION
    3. MULTIDIMENSIONAL DATABASES AND BLOCKS
      1. Basic Definitions
      2. Blocks
      3. Support and Confidence of a Block
      4. Properties
    4. ALGORITHMS
      1. Block Generation for Single Measure Values
      2. Processing Interval-Based Blocks
      3. Complexity Issues
    5. REFINING THE COMPUTATION OF BLOCKS
      1. Cell Neighborhood
      2. Modified Computation of Blocks
      3. Completeness Properties
      4. Theorem 1
    6. EXPERIMENTS
    7. RELATED WORK
    8. CONCLUSION
    9. NOTE
    10. REFERENCES
  11. VII. Social Network Mining from the Web
    1. ABSTRACT
    2. INTRODUCTION
    3. BACKGROUND
    4. SOCIAL NETWORK MINING FROM THE WEB
      1. Nodes and Edges
      2. Disambiguate a Person Name
    5. ADVANCED MINING METHODS
      1. Class of Relation
      2. Scalability
      3. Name and Word Co-Occurrence
        1. Keyword Extraction
      4. Affiliation Network
    6. IMPORTANT ISSUES
      1. Entity Identification
      2. Integration of Social Networks
    7. SOCIAL NETWORK ANALYSIS
      1. Authoritativeness
      2. Applications
    8. FUTURE TRENDS
      1. Social Network Extraction for General Purpose
      2. Mining Ontology and Structural Knowledge
    9. CONCLUSION
    10. REFERENCES
    11. ENDNOTES
  12. VIII. Discovering Spatio-Textual Association Rules in Document Images
    1. ABSTRACT
    2. INTRODUCTION
    3. BACKGROUND
    4. MAIN THRUST OF THE CHAPTER: ISSUES
    5. OUR APPROACH
    6. DOCUMENT DESCRIPTIONS
    7. MINING SPATIO-TEXTUAL ASSOCIATION RULES WITH SPADA
    8. APPLICATION TO THE TPAMI CORPORA
    9. FUTURE TRENDS
    10. CONCLUSION
    11. REFERENCES
  13. IX. Mining XML Documents
    1. ABSTRACT
    2. INTRODUCTION
    3. TREE-BASED COMPLEX DATA STRUCTURE
    4. DISCOVERING FREQUENT TREE STRUCTURE
      1. Definitions and Examples
      2. Frequent Tree Discovery Algorithms
        1. Basic Definitions
      3. Edge-Centric Approaches
      4. Tile-Centric Approach
    5. CLASSIFICATION AND CLUSTERING
      1. Representation Using Attribute-Value Structure
      2. Representing Documents by a Set of Paths
        1. Basic Definitions
      3. Stochastic Generative Model
      4. Modeling Documents with Bayesian Networks
      5. A Tree-Like Model for Structured Document Classification
      6. Learning
      7. Experiments
      8. Future Trends for the Stochastic Generative Model
    6. CONCLUSION
    7. REFERENCES
  14. X. Topic and Cluster Evolution Over Noisy Document Streams
    1. ABSTRACT
    2. INTRODUCTION
    3. EVOLVING TOPICS IN CLUSTERS
      1. Tasks of Topic Detection and Tracking
      2. Tracing Changes in Summaries
      3. Monitoring Changes in Cluster Labels
      4. Remembering and Forgetting in a Stream of Documents
    4. MONITORING CLUSTER EVOLUTION
      1. Frameworks for the Identification of Changes
      2. Spatiotemporal Clustering for Cluster Evolution
    5. A TOPIC EVOLUTION METHOD FOR A STREAM OF NOISY DOCUMENTS
      1. Application Case: Workshop Narratives in the Automotive Industry
      2. Description and Preparation of the Document Stream
      3. Topic Discovery
      4. Topic Evolution Monitoring
      5. Visualization of Linked Clusters
      6. Experimental Evaluation
    6. CONCLUSION
    7. REFERENCES
    8. ENDNOTES
  15. XI. Discovery of Latent Patterns with Hierarchical Bayesian Mixed-Membership Models and the Issue of Model Choice
    1. ABSTRACT
    2. INTRODUCTION
      1. The Issue of Model Choice
      2. Overview of the Chapter
    3. TWO MOTIVATING CASE STUDIES
      1. PNAS Biological Sciences Collection (1997–2001)
      2. Disability Survey Data (1982–2004)
    4. CHARACTERIZING HBMMMS
      1. Example 1: Latent Dirichlet Allocation
      2. Example 2: Grade of Membership Model
      3. Relationship with Other Data Mining Methods
    5. STRATEGIES FOR MODEL CHOICE
      1. Choice Informed by the Ability to Predict
      2. The Dirichlet Process Prior
      3. Other Criteria for Model Choice
    6. CASE STUDY: PNAS SCIENTIFIC COLLECTION 1997–2001
      1. Modeling Text and References
      2. a) Finite Mixture: The Model
      3. b) Infinite Mixture: The Model
      4. Inference
      5. a) Finite Mixture: Inference
      6. Empirical Results
      7. Evidence from a Simulation Study: A Practice to Avoid
    7. CASE STUDY: DISABILITY PROFILES OF AMERICAN ELDERLY
      1. Modeling Disability
      2. a) Finite Mixture: The Model
      3. b) Infinite Mixture: The Model
      4. a) Finite Mixture: Inference
        1. a) The Variational Approximation for the "Basic" Model
        2. b) The MCMC for the "Fully Bayesian" Model
      5. b) Infinite Mixture: Inference
      6. Empirical Results
    8. CONCLUDING REMARKS
    9. REFERENCES
    10. ENDNOTES
  16. Compilation of References
  17. About the Contributors