Data Mining Applications with R

Book Description

Data Mining Applications with R shows researchers and professionals how R, a free software environment for statistical computing and graphics, is used to solve problems in industry. R is widely used to apply data mining techniques across many industries, including government, finance, insurance, medicine, and scientific research. This book presents 15 real-world case studies illustrating various techniques in rapidly growing areas. It is an ideal companion for data mining researchers in academia and industry who want to turn this versatile software into a powerful analytic tool.

R code, data, and color figures for the book are provided at the RDataMining.com website.

  • Helps data miners learn to use R in their own area of work and see how R can be applied in different industries
  • Presents case studies of real-world applications that help readers apply the techniques in their own work
  • Provides code examples and sample data so readers can learn the techniques by running the code themselves

Table of Contents

  1. Cover image
  2. Title page
  3. Table of Contents
  4. Copyright
  5. Preface
    1. Background
    2. Objectives and Significance
    3. Target Audience
  6. Acknowledgments
  7. Review Committee
    1. Additional Reviewers
  8. Foreword
    1. References
  9. Chapter 1. Power Grid Data Analysis with R and Hadoop
    1. Abstract
    2. 1.1 Introduction
    3. 1.2 A Brief Overview of the Power Grid
    4. 1.3 Introduction to MapReduce, Hadoop, and RHIPE
    5. 1.4 Power Grid Analytical Approach
    6. 1.5 Discussion and Conclusions
    7. Appendix
    8. References
  10. Chapter 2. Picturing Bayesian Classifiers: A Visual Data Mining Approach to Parameters Optimization
    1. Abstract
    2. Acknowledgments
    3. 2.1 Introduction
    4. 2.2 Related Works
    5. 2.3 Motivations and Requirements
    6. 2.4 Probabilistic Framework of NB Classifiers
    7. 2.5 Two-Dimensional Visualization System
    8. 2.6 A Case Study: Text Classification
    9. 2.7 Conclusions
    10. References
  11. Chapter 3. Discovery of Emergent Issues and Controversies in Anthropology Using Text Mining, Topic Modeling, and Social Network Analysis of Microblog Content
    1. Abstract
    2. 3.1 Introduction
    3. 3.2 How Many Messages and How Many Twitter-Users in the Sample?
    4. 3.3 Who Is Writing All These Twitter Messages?
    5. 3.4 Who Are the Influential Twitter-Users in This Sample?
    6. 3.5 What Is the Community Structure of These Twitter-Users?
    7. 3.6 What Were Twitter-Users Writing About During the Meeting?
    8. 3.7 What Do the Twitter Messages Reveal About the Opinions of Their Authors?
    9. 3.8 What Can Be Discovered in the Less Frequently Used Words in the Sample?
    10. 3.9 What Are the Topics That Can Be Algorithmically Discovered in This Sample?
    11. 3.10 Conclusion
    12. References
  12. Chapter 4. Text Mining and Network Analysis of Digital Libraries in R
    1. Abstract
    2. 4.1 Introduction
    3. 4.2 Dataset Preparation
    4. 4.3 Manipulating the Document-Term Matrix
    5. 4.4 Clustering Content by Topics Using the LDA
    6. 4.5 Using Similarity Between Documents to Explore Document Cohesion
    7. 4.6 Social Network Analysis of Authors
    8. 4.7 Conclusion
    9. References
  13. Chapter 5. Recommender Systems in R
    1. Abstract
    2. 5.1 Introduction
    3. 5.2 Business Case
    4. 5.3 Evaluation
    5. 5.4 Collaborative Filtering Methods
    6. 5.5 Latent Factor Collaborative Filtering
    7. 5.6 Simplified Approach
    8. 5.7 Roll Your Own
    9. 5.8 Final Thoughts
    10. References
  14. Chapter 6. Response Modeling in Direct Marketing: A Data Mining-Based Approach for Target Selection
    1. Abstract
    2. 6.1 Introduction/Background
    3. 6.2 Business Problem
    4. 6.3 Proposed Response Model
    5. 6.4 Modeling Detail
    6. 6.5 Prediction Result
    7. 6.6 Model Evaluation
    8. 6.7 Conclusion
    9. References
  15. Chapter 7. Caravan Insurance Customer Profile Modeling with R
    1. Abstract
    2. 7.1 Introduction
    3. 7.2 Data Description and Initial Exploratory Data Analysis
    4. 7.3 Classifier Models of Caravan Insurance Holders
    5. 7.4 Discussion of Results and Conclusion
    6. Appendix A Details of the Full Data Set Variables
    7. Appendix B Customer Profile Data-Frequency of Binary Values
    8. Appendix C Proportion of Caravan Insurance Holders vis-à-vis other Customer Profile Variables
    9. Appendix D LR Model Details
    10. Appendix E R Commands for Computation of ROC Curves for Each Model Using Validation Dataset
    11. Appendix F Commands for Cross-Validation Analysis of Classifier Models
    12. References
  16. Chapter 8. Selecting Best Features for Predicting Bank Loan Default
    1. Abstract
    2. 8.1 Introduction
    3. 8.2 Business Problem
    4. 8.3 Data Extraction
    5. 8.4 Data Exploration and Preparation
    6. 8.5 Missing Imputation
    7. 8.6 Modeling
    8. 8.7 Model Evaluation
    9. 8.8 Finding and Model Deployment
    10. 8.9 Lessons and Discussions
    11. Appendix Selecting Best Features for Predicting Bank Loan Default
    12. References
  17. Chapter 9. A Choquet Integral Toolbox and Its Application in Customer Preference Analysis
    1. Abstract
    2. 9.1 Introduction
    3. 9.2 Background
    4. 9.3 Rfmtool Package
    5. 9.4 Case Study
    6. 9.5 Conclusions
    7. References
  18. Chapter 10. A Real-Time Property Value Index Based on Web Data
    1. Abstract
    2. Acknowledgments
    3. 10.1 Introduction
    4. 10.2 Housing Prices and Indices
    5. 10.3 A Data Mining Approach
    6. 10.4 Real Estate Pricing Models
    7. 10.5 Conclusion
    8. References
  19. Chapter 11. Predicting Seabed Hardness Using Random Forest in R
    1. Abstract
    2. Acknowledgments
    3. 11.1 Introduction
    4. 11.2 Study Region and Data Processing
    5. 11.3 Dataset Manipulation and Exploratory Analyses
    6. 11.4 Application of RF for Predicting Seabed Hardness
    7. 11.5 Model Validation Using rfcv
    8. 11.6 Optimal Predictive Model
    9. 11.7 Application of the Optimal Predictive Model
    10. 11.8 Discussion and Conclusions
    11. Appendix A A Dataset of Seabed Hardness and 15 Predictors
    12. Appendix B A R Function, rf.cv, Shows the Cross-Validated Prediction Performance of a Predictive Model
    13. References
  20. Chapter 12. Supervised Classification of Images, Applied to Plankton Samples Using R and Zooimage
    1. Abstract
    2. Acknowledgments
    3. 12.1 Background
    4. 12.2 Challenges
    5. 12.3 Data Extraction and Exploration
    6. 12.4 Data Preprocessing
    7. 12.5 Modeling
    8. 12.6 Model Evaluation
    9. 12.7 Model Deployment
    10. 12.8 Lessons, Discussion, and Conclusions
    11. References
  21. Chapter 13. Crime Analyses Using R
    1. Abstract
    2. 13.1 Introduction
    3. 13.2 Problem Definition
    4. 13.3 Data Extraction
    5. 13.4 Data Exploration and Preprocessing
    6. 13.5 Visualizations
    7. 13.6 Modeling
    8. 13.7 Model Evaluation
    9. 13.8 Discussions and Improvements
    10. References
  22. Chapter 14. Football Mining with R
    1. Abstract
    2. Acknowledgments
    3. 14.1 Introduction to the Case Study and Organization of the Analysis
    4. 14.2 Background of the Analysis: The Italian Football Championship
    5. 14.3 Data Extraction and Exploration
    6. 14.4 Data Preprocessing
    7. 14.5 Model Development: Building Classifiers
    8. 14.6 Model Deployment
    9. 14.7 Concluding Remarks
    10. References
  23. Chapter 15. Analyzing Internet DNS(SEC) Traffic with R for Resolving Platform Optimization
    1. Abstract
    2. 15.1 Introduction
    3. 15.2 Data Extraction from PCAP to CSV File
    4. 15.3 Data Importation from CSV File to R
    5. 15.4 Dimension Reduction Via PCA
    6. 15.5 Initial Data Exploration Via Graphs
    7. 15.6 Variables Scaling and Samples Selection
    8. 15.7 Clustering for Segmenting the FQDN
    9. 15.8 Building Routing Table Thanks to Clustering
    10. 15.9 Building Routing Table Thanks to Mixed Integer Linear Programming
    11. 15.10 Building Routing Table Via a Heuristic
    12. 15.11 Final Evaluation
    13. 15.12 Conclusion
    14. References
  24. Index