R Machine Learning Solutions

Video Description

Build powerful predictive models in R

About This Video

  • Apply R to predictive modeling with short, simple code

  • Use machine learning to solve problems ranging from small to big data

  • Build training and testing datasets from the churn dataset and apply different classification methods to them

In Detail

R is a statistical programming language that provides impressive tools to analyze data and create high-level graphics. This video course takes you from the very basics of R to building insightful machine learning models with R. You will start by setting up the environment and then perform data extraction, transformation, and loading (ETL) in R.

Data exploration examples demonstrate how powerful data visualization and machine learning are in discovering hidden relationships. You will then dive into important machine learning topics, including data classification, regression, clustering, association rule mining, and dimensionality reduction.

Table of Contents

    1. Chapter 1 : Getting Started with R
      1. The Course Overview 00:04:38
      2. Downloading and Installing R 00:06:10
      3. Downloading and Installing RStudio 00:03:10
      4. Installing and Loading Packages 00:05:46
      5. Reading and Writing Data 00:05:54
      6. Using R to Manipulate Data 00:05:47
      7. Applying Basic Statistics 00:04:47
      8. Visualizing Data 00:03:33
      9. Getting a Dataset for Machine Learning 00:02:39
    2. Chapter 2 : Data Exploration with RMS Titanic
      1. Reading a Titanic Dataset from a CSV File 00:08:36
      2. Converting Types on Character Variables 00:03:05
      3. Detecting Missing Values 00:03:19
      4. Imputing Missing Values 00:04:31
      5. Exploring and Visualizing Data 00:04:25
      6. Predicting Passenger Survival with a Decision Tree 00:03:59
      7. Validating the Power of Prediction with a Confusion Matrix 00:02:08
      8. Assessing Performance with the ROC Curve 00:02:33
    3. Chapter 3 : R and Statistics
      1. Understanding Data Sampling in R 00:03:31
      2. Operating a Probability Distribution in R 00:05:42
      3. Working with Univariate Descriptive Statistics in R 00:05:10
      4. Performing Correlations and Multivariate Analysis 00:03:02
      5. Operating Linear Regression and Multivariate Analysis 00:03:25
      6. Conducting an Exact Binomial Test 00:03:48
      7. Performing Student's t-test 00:03:13
      8. Performing the Kolmogorov-Smirnov Test 00:04:43
      9. Understanding the Wilcoxon Rank Sum and Signed Rank Test 00:02:04
      10. Working with Pearson's Chi-Squared Test 00:05:09
      11. Conducting a One-Way ANOVA 00:04:16
      12. Performing a Two-Way ANOVA 00:04:02
    4. Chapter 4 : Understanding Regression Analysis
      1. Fitting a Linear Regression Model with lm 00:04:53
      2. Summarizing Linear Model Fits 00:05:21
      3. Using Linear Regression to Predict Unknown Values 00:02:51
      4. Generating a Diagnostic Plot of a Fitted Model 00:03:58
      5. Fitting a Polynomial Regression Model with lm 00:02:16
      6. Fitting a Robust Linear Regression Model with rlm 00:02:16
      7. Studying a Case of Linear Regression on SLID Data 00:06:39
      8. Reducing Dimensions with SVD 00:02:11
      9. Applying the Poisson Model for Generalized Linear Regression 00:01:34
      10. Applying the Binomial Model for Generalized Linear Regression 00:02:02
      11. Fitting a Generalized Additive Model to Data 00:03:14
      12. Visualizing a Generalized Additive Model 00:01:27
      13. Diagnosing a Generalized Additive Model 00:03:38
    5. Chapter 5 : Classification – Tree, Lazy, and Probabilistic
      1. Preparing the Training and Testing Datasets 00:03:45
      2. Building a Classification Model with Recursive Partitioning Trees 00:06:10
      3. Visualizing a Recursive Partitioning Tree 00:03:04
      4. Measuring the Prediction Performance of a Recursive Partitioning Tree 00:02:49
      5. Pruning a Recursive Partitioning Tree 00:02:38
      6. Building a Classification Model with a Conditional Inference Tree 00:01:56
      7. Visualizing a Conditional Inference Tree 00:02:38
      8. Measuring the Prediction Performance of a Conditional Inference Tree 00:02:10
      9. Classifying Data with the K-Nearest Neighbor Classifier 00:05:31
      10. Classifying Data with Logistic Regression 00:04:38
      11. Classifying Data with the Naïve Bayes Classifier 00:06:16
    6. Chapter 6 : Neural Network and SVM
      1. Classifying Data with a Support Vector Machine 00:05:58
      2. Choosing the Cost of an SVM 00:02:57
      3. Visualizing an SVM Fit 00:03:33
      4. Predicting Labels Based on a Model Trained by an SVM 00:03:49
      5. Tuning an SVM 00:02:48
      6. Training a Neural Network with neuralnet 00:04:08
      7. Visualizing a Neural Network Trained by neuralnet 00:02:22
      8. Predicting Labels Based on a Model Trained by neuralnet 00:03:07
      9. Training a Neural Network with nnet 00:02:46
      10. Predicting Labels Based on a Model Trained by nnet 00:02:49
    7. Chapter 7 : Model Evaluation
      1. Estimating Model Performance with k-fold Cross Validation 00:03:42
      2. Performing Cross Validation with the e1071 Package 00:03:22
      3. Performing Cross Validation with the caret Package 00:02:59
      4. Ranking the Variable Importance with the caret Package 00:02:21
      5. Ranking the Variable Importance with the rminer Package 00:02:30
      6. Finding Highly Correlated Features with the caret Package 00:02:13
      7. Selecting Features Using the caret Package 00:04:59
      8. Measuring the Performance of the Regression Model 00:03:58
      9. Measuring Prediction Performance with a Confusion Matrix 00:02:07
      10. Measuring Prediction Performance Using ROCR 00:02:46
      11. Comparing an ROC Curve Using the Caret Package 00:03:44
      12. Measuring Performance Differences between Models with the caret Package 00:03:41
    8. Chapter 8 : Ensemble Learning
      1. Classifying Data with the Bagging Method 00:07:53
      2. Performing Cross Validation with the Bagging Method 00:01:56
      3. Classifying Data with the Boosting Method 00:06:05
      4. Performing Cross Validation with the Boosting Method 00:02:06
      5. Classifying Data with Gradient Boosting 00:07:10
      6. Calculating the Margins of a Classifier 00:05:30
      7. Calculating the Error Evolution of the Ensemble Method 00:02:19
      8. Classifying Data with Random Forest 00:07:02
      9. Estimating the Prediction Errors of Different Classifiers 00:04:35
    9. Chapter 9 : Clustering
      1. Clustering Data with Hierarchical Clustering 00:08:40
      2. Cutting Trees into Clusters 00:03:30
      3. Clustering Data with the k-Means Method 00:04:10
      4. Drawing a Bivariate Cluster Plot 00:03:32
      5. Comparing Clustering Methods 00:04:16
      6. Extracting Silhouette Information from Clustering 00:02:40
      7. Obtaining the Optimum Number of Clusters for k-Means 00:02:49
      8. Clustering Data with the Density-Based Method 00:06:42
      9. Clustering Data with the Model-Based Method 00:04:38
      10. Visualizing a Dissimilarity Matrix 00:03:24
      11. Validating Clusters Externally 00:04:12
    10. Chapter 10 : Association Analysis and Sequence Mining
      1. Transforming Data into Transactions 00:03:35
      2. Displaying Transactions and Associations 00:02:14
      3. Mining Associations with the Apriori Rule 00:07:24
      4. Pruning Redundant Rules 00:02:26
      5. Visualizing Association Rules 00:05:07
      6. Mining Frequent Itemsets with Eclat 00:03:36
      7. Creating Transactions with Temporal Information 00:02:41
      8. Mining Frequent Sequential Patterns with cSPADE 00:04:16
    11. Chapter 11 : Dimension Reduction
      1. Performing Feature Selection with FSelector 00:07:38
      2. Performing Dimension Reduction with PCA 00:07:19
      3. Determining the Number of Principal Components Using the Scree Test 00:03:34
      4. Determining the Number of Principal Components Using the Kaiser Method 00:02:05
      5. Visualizing Multivariate Data Using biplot 00:03:17
      6. Performing Dimension Reduction with MDS 00:05:38
      7. Reducing Dimensions with SVD 00:03:19
      8. Compressing Images with SVD 00:03:05
      9. Performing Nonlinear Dimension Reduction with ISOMAP 00:04:34
      10. Performing Nonlinear Dimension Reduction with Locally Linear Embedding 00:04:55
    12. Chapter 12 : Big Data Analysis with R and Hadoop
      1. Preparing the RHadoop Environment 00:05:36
      2. Installing rmr2 00:03:53
      3. Installing rhdfs 00:04:15
      4. Operating HDFS with rhdfs 00:05:47
      5. Implementing a Word Count Problem with RHadoop 00:05:27
      6. Comparing the Performance of an R MapReduce Program and a Standard R Program 00:05:03
      7. Testing and Debugging the rmr2 Program 00:03:49
      8. Installing plyrmr 00:03:12
      9. Manipulating Data with plyrmr 00:03:52
      10. Conducting Machine Learning with RHadoop 00:04:39
      11. Configuring RHadoop Clusters on Amazon EMR 00:05:28