
Introduction to Machine Learning with R

Book Description

Machine learning is an intimidating subject until you know the fundamentals. If you understand basic coding concepts, this introductory guide will help you gain a solid foundation in machine learning principles. Using the R programming language, you'll start by learning regression modeling and then move on to more advanced topics such as neural networks and tree-based methods.

Finally, you'll delve into the frontier of machine learning using the caret package in R. Once you develop a familiarity with topics such as the difference between regression and classification models, you'll be able to solve an array of machine learning problems. Author Scott V. Burger provides several examples to help you build a working knowledge of machine learning.

  • Explore machine learning models, algorithms, and data training
  • Understand machine learning algorithms for supervised and unsupervised cases
  • Examine statistical concepts for designing data for use in models
  • Dive into linear regression models used in business and science
  • Use single-layer and multilayer neural networks for calculating outcomes
  • Look at how tree-based models work, including popular decision trees
  • Get a comprehensive view of the machine learning ecosystem in R
  • Explore the powerhouse of tools available in R’s caret package

Table of Contents

  1. Preface
    1. Who Should Read This Book?
    2. Scope of the Book
    3. Conventions Used in This Book
    4. O’Reilly Safari
    5. How to Contact Us
    6. Acknowledgments
  2. 1. What Is a Model?
    1. Algorithms Versus Models: What’s the Difference?
    2. A Note on Terminology
    3. Modeling Limitations
    4. Statistics and Computation in Modeling
    5. Data Training
    6. Cross-Validation
    7. Why Use R?
    8. The Good
      1. R and Machine Learning
    9. The Bad
    10. Summary
  3. 2. Supervised and Unsupervised Machine Learning
    1. Supervised Models
    2. Regression
    3. Training and Testing of Data
    4. Classification
      1. Logistic Regression
      2. Supervised Clustering Methods
    5. Mixed Methods
      1. Tree-Based Models
      2. Random Forests
      3. Neural Networks
      4. Support Vector Machines
    6. Unsupervised Learning
    7. Unsupervised Clustering Methods
    8. Summary
  4. 3. Sampling Statistics and Model Training in R
    1. Bias
    2. Sampling in R
    3. Training and Testing
      1. Roles of Training and Test Sets
      2. Why Make a Test Set?
      3. Training and Test Sets: Regression Modeling
      4. Training and Test Sets: Classification Modeling
    4. Cross-Validation
      1. k-Fold Cross-Validation
    5. Summary
  5. 4. Regression in a Nutshell
    1. Linear Regression
      1. Multivariate Regression
      2. Regularization
    2. Polynomial Regression
    3. Goodness of Fit with Data—The Perils of Overfitting
      1. Root-Mean-Square Error
      2. Model Simplicity and Goodness of Fit
    4. Logistic Regression
      1. The Motivation for Classification
      2. The Decision Boundary
      3. The Sigmoid Function
      4. Binary Classification
      5. Multiclass Classification
      6. Logistic Regression with caret
    5. Summary
      1. Linear Regression
      2. Logistic Regression
  6. 5. Neural Networks in a Nutshell
    1. Single-Layer Neural Networks
    2. Building a Simple Neural Network by Using R
      1. Multiple Compute Outputs
      2. Hidden Compute Nodes
    3. Multilayer Neural Networks
    4. Neural Networks for Regression
    5. Neural Networks for Classification
    6. Neural Networks with caret
      1. Regression
      2. Classification
    7. Summary
  7. 6. Tree-Based Methods
    1. A Simple Tree Model
    2. Deciding How to Split Trees
      1. Tree Entropy and Information Gain
    3. Pros and Cons of Decision Trees
      1. Tree Overfitting
      2. Pruning Trees
      3. Decision Trees for Regression
      4. Decision Trees for Classification
    4. Conditional Inference Trees
      1. Conditional Inference Tree Regression
      2. Conditional Inference Tree Classification
    5. Random Forests
      1. Random Forest Regression
      2. Random Forest Classification
    6. Summary
  8. 7. Other Advanced Methods
    1. Naive Bayes Classification
      1. Bayesian Statistics in a Nutshell
      2. Application of Naive Bayes
    2. Principal Component Analysis
      1. Linear Discriminant Analysis
    3. Support Vector Machines
    4. k-Nearest Neighbors
      1. Regression Using kNN
      2. Classification Using kNN
    5. Summary
  9. 8. Machine Learning with the caret Package
    1. The Titanic Dataset
      1. Data Wrangling
    2. caret Unleashed
      1. Imputation
      2. Data Splitting
      3. caret Under the Hood
      4. Model Training
      5. Comparing Multiple caret Models
    3. Summary
  10. A. Encyclopedia of Machine Learning Models in caret
  11. Index