Data Mining Algorithms: Explained Using R

Book Description

Data Mining Algorithms is a practical, technically oriented guide covering the most important algorithms for building classification, regression, and clustering models, along with techniques for attribute selection and transformation, model quality evaluation, and creating model ensembles. The author presents the major topics and methodologies widely used in data mining, while demonstrating the internal operation and usage of the algorithms with examples in R.

Table of Contents

  1. Title Page
  2. Copyright
  3. Dedication
  4. Acknowledgements
  5. Preface
    1. Data mining
    2. Motivation
    3. Organization
    4. Notation
    5. R code examples
    6. Website
    7. Further readings
    8. References
  6. Part I: Preliminaries
  7. Chapter 1: Tasks
    1. 1.1 Introduction
    2. 1.2 Inductive learning tasks
    3. 1.3 Classification
    4. 1.4 Regression
    5. 1.5 Clustering
    6. 1.6 Practical issues
    7. 1.7 Conclusion
    8. 1.8 Further readings
    9. References
  8. Chapter 2: Basic statistics
    1. 2.1 Introduction
    2. 2.2 Notational conventions
    3. 2.3 Basic statistics as modeling
    4. 2.4 Distribution description
    5. 2.5 Relationship detection
    6. 2.6 Visualization
    7. 2.7 Conclusion
    8. 2.8 Further readings
    9. References
  9. Part II: Classification
  10. Chapter 3: Decision trees
    1. 3.1 Introduction
    2. 3.2 Decision tree model
    3. 3.3 Growing
    4. 3.4 Pruning
    5. 3.5 Prediction
    6. 3.6 Weighted instances
    7. 3.7 Missing value handling
    8. 3.8 Conclusion
    9. 3.9 Further readings
    10. References
  11. Chapter 4: Naïve Bayes classifier
    1. 4.1 Introduction
    2. 4.2 Bayes rule
    3. 4.3 Classification by Bayesian inference
    4. 4.4 Practical issues
    5. 4.5 Conclusion
    6. 4.6 Further readings
    7. References
  12. Chapter 5: Linear classification
    1. 5.1 Introduction
    2. 5.2 Linear representation
    3. 5.3 Parameter estimation
    4. 5.4 Discrete attributes
    5. 5.5 Conclusion
    6. 5.6 Further readings
    7. References
  13. Chapter 6: Misclassification costs
    1. 6.1 Introduction
    2. 6.2 Cost representation
    3. 6.3 Incorporating misclassification costs
    4. 6.4 Effects of cost incorporation
    5. 6.5 Experimental procedure
    6. 6.6 Conclusion
    7. 6.7 Further readings
    8. References
  14. Chapter 7: Classification model evaluation
    1. 7.1 Introduction
    2. 7.2 Performance measures
    3. 7.3 Evaluation procedures
    4. 7.4 Conclusion
    5. 7.5 Further readings
    6. References
  15. Part III: Regression
  16. Chapter 8: Linear regression
    1. 8.1 Introduction
    2. 8.2 Linear representation
    3. 8.3 Parameter estimation
    4. 8.4 Discrete attributes
    5. 8.5 Advantages of linear models
    6. 8.6 Beyond linearity
    7. 8.7 Conclusion
    8. 8.8 Further readings
    9. References
  17. Chapter 9: Regression trees
    1. 9.1 Introduction
    2. 9.2 Regression tree model
    3. 9.3 Growing
    4. 9.4 Pruning
    5. 9.5 Prediction
    6. 9.6 Weighted instances
    7. 9.7 Missing value handling
    8. 9.8 Piecewise linear regression
    9. 9.9 Conclusion
    10. 9.10 Further readings
    11. References
  18. Chapter 10: Regression model evaluation
    1. 10.1 Introduction
    2. 10.2 Performance measures
    3. 10.3 Evaluation procedures
    4. 10.4 Conclusion
    5. 10.5 Further readings
    6. References
  19. Part IV: Clustering
  20. Chapter 11: (Dis)similarity measures
    1. 11.1 Introduction
    2. 11.2 Measuring dissimilarity and similarity
    3. 11.3 Difference-based dissimilarity
    4. 11.4 Correlation-based similarity
    5. 11.5 Missing attribute values
    6. 11.6 Conclusion
    7. 11.7 Further readings
    8. References
  21. Chapter 12: k-Centers clustering
    1. 12.1 Introduction
    2. 12.2 Algorithm scheme
    3. 12.3 k-Means
    4. 12.4 Beyond means
    5. 12.5 Beyond (fixed) k
    6. 12.6 Explicit cluster modeling
    7. 12.7 Conclusion
    8. 12.8 Further readings
    9. References
  22. Chapter 13: Hierarchical clustering
    1. 13.1 Introduction
    2. 13.2 Cluster hierarchies
    3. 13.3 Agglomerative clustering
    4. 13.4 Divisive clustering
    5. 13.5 Hierarchical clustering visualization
    6. 13.6 Hierarchical clustering prediction
    7. 13.7 Conclusion
    8. 13.8 Further readings
    9. References
  23. Chapter 14: Clustering model evaluation
    1. 14.1 Introduction
    2. 14.2 Per-cluster quality measures
    3. 14.3 Overall quality measures
    4. 14.4 External quality measures
    5. 14.5 Using quality measures
    6. 14.6 Conclusion
    7. 14.7 Further readings
    8. References
  24. Part V: Getting Better Models
  25. Chapter 15: Model ensembles
    1. 15.1 Introduction
    2. 15.2 Model committees
    3. 15.3 Base models
    4. 15.4 Model aggregation
    5. 15.5 Specific ensemble modeling algorithms
    6. 15.6 Quality of ensemble predictions
    7. 15.7 Conclusion
    8. 15.8 Further readings
    9. References
  26. Chapter 16: Kernel methods
    1. 16.1 Introduction
    2. 16.2 Support vector machines
    3. 16.3 Support vector regression
    4. 16.4 Kernel trick
    5. 16.5 Kernel functions
    6. 16.6 Kernel prediction
    7. 16.7 Kernel-based algorithms
    8. 16.8 Conclusion
    9. 16.9 Further readings
    10. References
  27. Chapter 17: Attribute transformation
    1. 17.1 Introduction
    2. 17.2 Attribute transformation task
    3. 17.3 Simple transformations
    4. 17.4 Multiclass encoding
    5. 17.5 Conclusion
    6. 17.6 Further readings
    7. References
  28. Chapter 18: Discretization
    1. 18.1 Introduction
    2. 18.2 Discretization task
    3. 18.3 Unsupervised discretization
    4. 18.4 Supervised discretization
    5. 18.5 Effects of discretization
    6. 18.6 Conclusion
    7. 18.7 Further readings
    8. References
  29. Chapter 19: Attribute selection
    1. 19.1 Introduction
    2. 19.2 Attribute selection task
    3. 19.3 Attribute subset search
    4. 19.4 Attribute selection filters
    5. 19.5 Attribute selection wrappers
    6. 19.6 Effects of attribute selection
    7. 19.7 Conclusion
    8. 19.8 Further readings
    9. References
  30. Chapter 20: Case studies
    1. 20.1 Introduction
    2. 20.2 Census income
    3. 20.3 Communities and crime
    4. 20.4 Cover type
    5. 20.5 Conclusion
    6. 20.6 Further readings
    7. References
  31. Closing
    1. Retrospecting
    2. Final words
  32. A: Notation
    1. A.1 Attribute values
    2. A.2 Data subsets
    3. A.3 Probabilities
  33. B: R packages
    1. B.1 CRAN packages
    2. B.2 DMR packages
    3. B.3 Installing packages
    4. References
  34. C: Datasets
  35. Index
  36. End User License Agreement