Machine Learning and Data Science: An Introduction to Statistical Learning Methods with R

Book Description

A practitioner's tools have a direct impact on the success of his or her work. This book gives the data scientist the tools and techniques needed to excel with statistical learning methods in data access, data munging, exploratory data analysis, supervised machine learning, unsupervised machine learning, and model evaluation.

Table of Contents

  1. Introduction
    1. How This Book Is Organized
    2. Intended Audience for This Book
    3. What You Will Need
    4. R Code and Figures
    5. Going Beyond This Book
    6. Contacting the Author
  2. Chapter 1 Machine Learning Overview
    1. Types of Machine Learning
    2. Use Case Examples of Machine Learning
      1. Acquire Valued Shoppers Challenge
      2. Netflix
      3. Algorithmic Trading Challenge
      4. Heritage Health Prize
      5. Marketing
      6. Sales
      7. Supply Chain
      8. Risk Management
      9. Customer Support
      10. Human Resources
      11. Google Flu Trends
    3. Process of Machine Learning
    4. Mathematics Behind Machine Learning
    5. Becoming a Data Scientist
    6. R Project for Statistical Computing
    7. RStudio
    8. Using R Packages
    9. Data Sets
    10. Using R in Production
    11. Summary
  3. Chapter 2 Data Access
    1. Managing Your Working Directory
    2. Types of Data Files
    3. Sources of Data
    4. Downloading Data Sets From the Web
    5. Reading CSV Files
    6. Reading Excel Files
    7. Using File Connections
    8. Reading JSON Files
    9. Scraping Data From Websites
    10. SQL Databases
    11. SQL Equivalents in R
    12. Reading Twitter Data
    13. Reading Data From Google Analytics
    14. Writing Data
    15. Summary
  4. Chapter 3 Data Munging
    1. Feature Engineering
    2. Data Pipeline
    3. Data Sampling
    4. Revise Variable Names
    5. Create New Variables
    6. Discretize Numeric Values
    7. Date Handling
    8. Binary Categorical Variables
    9. Merge Data Sets
    10. Ordering Data Sets
    11. Reshape Data Sets
    12. Data Manipulation Using dplyr
    13. Handle Missing Data
    14. Feature Scaling
    15. Dimensionality Reduction
    16. Summary
  5. Chapter 4 Exploratory Data Analysis
    1. Numeric Summaries
    2. Exploratory Visualizations
    3. Histograms
    4. Boxplots
    5. Barplots
    6. Density Plots
    7. Scatterplots
    8. QQ-Plots
    9. Heatmaps
    10. Missing Value Plots
    11. Expository Plots
    12. Summary
  6. Chapter 5 Regression
    1. Simple Linear Regression
    2. Multiple Linear Regression
    3. Polynomial Regression
    4. Summary
  7. Chapter 6 Classification
    1. A Simple Example
    2. Logistic Regression
    3. Classification Trees
    4. Naïve Bayes
    5. K-Nearest Neighbors
    6. Support Vector Machines
    7. Neural Networks
    8. Ensembles
    9. Random Forests
    10. Gradient Boosting Machines
    11. Summary
  8. Chapter 7 Evaluating Model Performance
    1. Overfitting
    2. Bias and Variance
    3. Confounders
    4. Data Leakage
    5. Measuring Regression Performance
    6. Measuring Classification Performance
    7. Cross Validation
    8. Other Machine Learning Diagnostics
      1. Get More Training Observations
      2. Feature Reduction
      3. Feature Addition
      4. Add Polynomial Features
      5. Fine Tuning the Regularization Parameter
    9. Summary
  9. Chapter 8 Unsupervised Learning
    1. Clustering
    2. Simulating Clusters
    3. Hierarchical Clustering
    4. K-Means Clustering
    5. Principal Component Analysis
    6. Summary
  10. Index