Data Mining: Know It All

Book Description

This book brings all of the elements of data mining together in a single volume, saving the reader the time and expense of making multiple purchases. It consolidates both introductory and advanced topics, thereby covering the gamut of data mining and machine learning tactics, from data integration and pre-processing, to fundamental algorithms, to optimization techniques and web mining methodology.

The book combines the finest data mining material from the Morgan Kaufmann portfolio. Individual chapters are derived from a select group of MK books authored by the best and brightest in the field. These chapters are combined into one comprehensive volume that can be used as a reference work by those interested in new and developing aspects of data mining.

This book represents a quick and efficient way to unite valuable content from leading data mining experts, thereby creating a definitive, one-stop-shopping opportunity for customers to receive the information they would otherwise need to round up from separate sources.

  • Chapters contributed by various recognized experts in the field let the reader remain up to date and fully informed from multiple viewpoints.
  • Presents multiple methods of analysis and algorithmic problem-solving techniques, enhancing the reader’s technical expertise and ability to implement practical solutions.
  • Coverage of both theory and practice brings all of the elements of data mining together in a single volume, saving the reader the time and expense of making multiple purchases.

Table of Contents

  1. Cover
  2. Title
  3. Copyright
  4. Brief Table of Contents
  5. Table of Contents
  6. List of Figures
  7. List of Tables
  8. Copyright Page
  9. About This Book
  10. Contributing Authors
  11. Chapter 1. What's It All About?
    1. 1.1. Data Mining and Machine Learning
    2. 1.2. Simple Examples: The Weather Problem and Others
    3. 1.3. Fielded Applications
    4. 1.4. Machine Learning and Statistics
    5. 1.5. Generalization as Search
    6. 1.6. Data Mining and Ethics
    7. 1.7. Resources
  12. Chapter 2. Data Acquisition and Integration
    1. 2.1. Introduction
    2. 2.2. Sources of Data
    3. 2.3. Variable Types
    4. 2.4. Data Rollup
    5. 2.5. Rollup with Sums, Averages, and Counts
    6. 2.6. Calculation of the Mode
    7. 2.7. Data Integration
  13. Chapter 3. Data Preprocessing
    1. 3.1. Why Preprocess the Data?
    2. 3.2. Descriptive Data Summarization
    3. 3.3. Data Cleaning
    4. 3.4. Data Integration and Transformation
    5. 3.5. Data Reduction
    6. 3.6. Data Discretization and Concept Hierarchy Generation
    7. 3.7. Summary
    8. 3.8. Resources
  14. Chapter 4. Physical Design for Decision Support, Warehousing, and OLAP
    1. 4.1. What Is Online Analytical Processing?
    2. 4.2. Dimension Hierarchies
    3. 4.3. Star and Snowflake Schemas
    4. 4.4. Warehouses and Marts
    5. 4.5. Scaling Up the System
    6. 4.6. DSS, Warehousing, and OLAP Design Considerations
    7. 4.7. Usage Syntax and Examples for Major Database Servers
    8. 4.8. Summary
    9. 4.9. Literature Summary
    10. Resources
  15. Chapter 5. Algorithms: The Basic Methods
    1. 5.1. Inferring Rudimentary Rules
    2. 5.2. Statistical Modeling
    3. 5.3. Divide and Conquer: Constructing Decision Trees
    4. 5.4. Covering Algorithms: Constructing Rules
    5. 5.5. Mining Association Rules
    6. 5.6. Linear Models
    7. 5.7. Instance-Based Learning
    8. 5.8. Clustering
    9. 5.9. Resources
  16. Chapter 6. Further Techniques in Decision Analysis
    1. 6.1. Modeling Risk Preferences
    2. 6.2. Analyzing Risk Directly
    3. 6.3. Dominance
    4. 6.4. Sensitivity Analysis
    5. 6.5. Value of Information
    6. 6.6. Normative Decision Analysis
  17. Chapter 7. Fundamental Concepts of Genetic Algorithms
    1. 7.1. The Vocabulary of Genetic Algorithms
    2. 7.2. Overview
    3. 7.3. The Architecture of a Genetic Algorithm
    4. 7.4. Practical Issues in Using a Genetic Algorithm
    5. 7.5. Review
    6. 7.6. Resources
  18. Chapter 8. Data Structures and Algorithms for Moving Objects Types
    1. 8.1. Data Structures
    2. 8.2. Algorithms for Operations on Temporal Data Types
    3. 8.3. Algorithms for Lifted Operations
    4. 8.4. Resources
  19. Chapter 9. Improving the Model
    1. 9.1. Learning from Errors
    2. 9.2. Improving Model Quality, Solving Problems
    3. 9.3. Summary
  20. Chapter 10. Social Network Analysis
    1. 10.1. Social Sciences and Bibliometry
    2. 10.2. PageRank and Hyperlink-Induced Topic Search
    3. 10.3. Shortcomings of the Coarse-Grained Graph Model
    4. 10.4. Enhanced Models and Techniques
    5. 10.5. Evaluation of Topic Distillation
    6. 10.6. Measuring and Modeling the Web
    7. 10.7. Resources
    8. References
  21. Index
    1. SYMBOL
    2. A
    3. B
    4. C
    5. D
    6. E
    7. F
    8. G
    9. H
    10. I
    11. J
    12. K
    13. L
    14. M
    15. N
    16. O
    17. P
    18. Q
    19. R
    20. S
    21. T
    22. U
    23. V
    24. W
    25. X