Data Mining, 3rd Edition

Book description

Data Mining: Practical Machine Learning Tools and Techniques, Third Edition, offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.

Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition. These include new material on data transformations, ensemble learning, massive datasets, and multi-instance learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall cover both tried-and-true techniques of today and methods at the leading edge of contemporary research.

The book is targeted at information systems practitioners, programmers, consultants, developers, information technology managers, specification writers, data analysts, data modelers, database R&D professionals, data warehouse engineers, and data mining professionals. It will also be useful for professors and students in upper-level undergraduate and graduate-level data mining and machine learning courses who want to make data mining part of their data management knowledge and expertise.

  • Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projects
  • Offers concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods
  • Includes the downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks, in an updated, interactive interface. Algorithms in the toolkit cover data preprocessing, classification, regression, clustering, association rules, and visualization

Table of contents

  1. Cover Image
  2. Content
  3. Title
  4. Copyright
  5. List of Figures
  6. List of Tables
  7. Preface
  8. Acknowledgments
  9. About the Authors
  10. PART I. Introduction to Data Mining
    1. Chapter 1. What’s It All About?
      1. 1.1. Data mining and machine learning
      2. 1.2. Simple examples: the weather and other problems
      3. 1.3. Fielded applications
      4. 1.4. Machine learning and statistics
      5. 1.5. Generalization as search
      6. 1.6. Data mining and ethics
      7. 1.7. Further reading
    2. Chapter 2. Input
      1. 2.1. What's a concept?
      2. 2.2. What's in an example?
      3. 2.3. What's in an attribute?
      4. 2.4. Preparing the input
      5. 2.5. Further reading
    3. Chapter 3. Output
      1. 3.1. Tables
      2. 3.2. Linear models
      3. 3.3. Trees
      4. 3.4. Rules
      5. 3.5. Instance-based representation
      6. 3.6. Clusters
      7. 3.7. Further reading
    4. Chapter 4. Algorithms
      1. 4.1. Inferring rudimentary rules
      2. 4.2. Statistical modeling
      3. 4.3. Divide-and-conquer: constructing decision trees
      4. 4.4. Covering algorithms: constructing rules
      5. 4.5. Mining association rules
      6. 4.6. Linear models
      7. 4.7. Instance-based learning
      8. 4.8. Clustering
      9. 4.9. Multi-instance learning
      10. 4.10. Further reading
      11. 4.11. Weka implementations
    5. Chapter 5. Credibility
      1. 5.1. Training and testing
      2. 5.2. Predicting performance
      3. 5.3. Cross-validation
      4. 5.4. Other estimates
      5. 5.5. Comparing data mining schemes
      6. 5.6. Predicting probabilities
      7. 5.7. Counting the cost
      8. 5.8. Evaluating numeric prediction
      9. 5.9. Minimum description length principle
      10. 5.10. Applying the MDL principle to clustering
      11. 5.11. Further reading
  11. PART II. Advanced Data Mining
    1. Chapter 6. Implementations
      1. 6.1. Decision trees
      2. 6.2. Classification rules
      3. 6.3. Association rules
      4. 6.4. Extending linear models
      5. 6.5. Instance-based learning
      6. 6.6. Numeric prediction with local linear models
      7. 6.7. Bayesian networks
      8. 6.8. Clustering
      9. 6.9. Semisupervised learning
      10. 6.10. Multi-instance learning
      11. 6.11. Weka implementations
    2. Chapter 7. Data Transformations
      1. 7.1. Attribute selection
      2. 7.2. Discretizing numeric attributes
      3. 7.3. Projections
      4. 7.4. Sampling
      5. 7.5. Cleansing
      6. 7.6. Transforming multiple classes to binary ones
      7. 7.7. Calibrating class probabilities
      8. 7.8. Further reading
      9. 7.9. Weka implementations
    3. Chapter 8. Ensemble Learning
      1. 8.1. Combining multiple models
      2. 8.2. Bagging
      3. 8.3. Randomization
      4. 8.4. Boosting
      5. 8.5. Additive regression
      6. 8.6. Interpretable ensembles
      7. 8.7. Stacking
      8. 8.8. Further reading
      9. 8.9. Weka implementations
    4. Chapter 9. Moving on
      1. 9.1. Applying data mining
      2. 9.2. Learning from massive datasets
      3. 9.3. Data stream learning
      4. 9.4. Incorporating domain knowledge
      5. 9.5. Text mining
      6. 9.6. Web mining
      7. 9.7. Adversarial situations
      8. 9.8. Ubiquitous data mining
      9. 9.9. Further reading
  12. PART III. The Weka Data Mining Workbench
    1. Chapter 10. Introduction to Weka
      1. 10.1. What's in Weka?
      2. 10.2. How do you use it?
      3. 10.3. What else can you do?
      4. 10.4. How do you get it?
    2. Chapter 11. The Explorer
      1. 11.1. Getting started
      2. 11.2. Exploring the Explorer
      3. 11.3. Filtering algorithms
      4. 11.4. Learning algorithms
      5. 11.5. Metalearning algorithms
      6. 11.6. Clustering algorithms
      7. 11.7. Association-rule learners
      8. 11.8. Attribute selection
    3. Chapter 12. The Knowledge Flow Interface
      1. 12.1. Getting started
      2. 12.2. Components
      3. 12.3. Configuring and connecting the components
      4. 12.4. Incremental learning
    4. Chapter 13. The Experimenter
      1. 13.1. Getting started
      2. 13.2. Simple setup
      3. 13.3. Advanced setup
      4. 13.4. The Analyze panel
      5. 13.5. Distributing processing over several machines
    5. Chapter 14. The Command-Line Interface
      1. 14.1. Getting started
      2. 14.2. The structure of Weka
      3. 14.3. Command-line options
    6. Chapter 15. Embedded Machine Learning
      1. 15.1. A simple data mining application
    7. Chapter 16. Writing New Learning Schemes
      1. 16.1. An example classifier
      2. 16.2. Conventions for implementing classifiers
    8. Chapter 17. Tutorial Exercises for the Weka Explorer
      1. 17.1. Introduction to the Explorer interface
      2. 17.2. Nearest-neighbor learning and decision trees
      3. 17.3. Classification boundaries
      4. 17.4. Preprocessing and parameter tuning
      5. 17.5. Document classification
      6. 17.6. Mining association rules
  13. Index

Product information

  • Title: Data Mining, 3rd Edition
  • Author(s): Ian H. Witten, Eibe Frank, Mark A. Hall
  • Release date: February 2011
  • Publisher(s): Morgan Kaufmann
  • ISBN: 9780080890364