Machine Learning with R Cookbook

Book Description

Explore over 110 recipes to analyze data and build predictive models with simple, easy-to-use R code

In Detail

The R language is a powerful open-source functional programming language. At its core, R is a statistical programming language that provides impressive tools to analyze data and create high-level graphics.

This book covers the basics of R by setting up a user-friendly programming environment and performing data ETL in R. Data exploration examples demonstrate how powerful data visualization and machine learning are at uncovering hidden relationships. You will then dive into important machine learning topics, including data classification, regression, clustering, association rule mining, and dimension reduction.

What You Will Learn

  • Create and inspect the transaction dataset, performing association analysis with the Apriori algorithm

  • Visualize patterns and associations using a range of graphs and find frequent itemsets using the Eclat algorithm

  • Compare differences between each regression method to discover how they solve problems

  • Predict possible churn users with the classification approach

  • Implement the clustering method to segment customer data

  • Compress images with the dimension reduction method

  • Incorporate R and Hadoop to solve machine learning problems on big data

Downloading the example code for this book: You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Table of Contents

    1. Machine Learning with R Cookbook
      1. Table of Contents
      2. Machine Learning with R Cookbook
      3. Credits
      4. About the Author
      5. About the Reviewers
      6. www.PacktPub.com
        1. Support files, eBooks, discount offers, and more
          1. Why subscribe?
          2. Free access for Packt account holders
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Sections
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        5. Conventions
        6. Reader feedback
        7. Customer support
          1. Downloading the example code
          2. Errata
          3. Piracy
          4. Questions
      8. 1. Practical Machine Learning with R
        1. Introduction
        2. Downloading and installing R
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        3. Downloading and installing RStudio
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        4. Installing and loading packages
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        5. Reading and writing data
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        6. Using R to manipulate data
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        7. Applying basic statistics
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        8. Visualizing data
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        9. Getting a dataset for machine learning
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
      9. 2. Data Exploration with RMS Titanic
        1. Introduction
        2. Reading a Titanic dataset from a CSV file
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        3. Converting types on character variables
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        4. Detecting missing values
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        5. Imputing missing values
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        6. Exploring and visualizing data
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        7. Predicting passenger survival with a decision tree
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        8. Validating the power of prediction with a confusion matrix
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        9. Assessing performance with the ROC curve
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
      10. 3. R and Statistics
        1. Introduction
        2. Understanding data sampling in R
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        3. Operating a probability distribution in R
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        4. Working with univariate descriptive statistics in R
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        5. Performing correlations and multivariate analysis
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        6. Operating linear regression and multivariate analysis
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        7. Conducting an exact binomial test
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        8. Performing Student's t-test
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        9. Performing the Kolmogorov-Smirnov test
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        10. Understanding the Wilcoxon Rank Sum and Signed Rank test
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        11. Working with Pearson's Chi-squared test
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        12. Conducting a one-way ANOVA
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        13. Performing a two-way ANOVA
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
      11. 4. Understanding Regression Analysis
        1. Introduction
        2. Fitting a linear regression model with lm
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        3. Summarizing linear model fits
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        4. Using linear regression to predict unknown values
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        5. Generating a diagnostic plot of a fitted model
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        6. Fitting a polynomial regression model with lm
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        7. Fitting a robust linear regression model with rlm
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        8. Studying a case of linear regression on SLID data
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        9. Applying the Gaussian model for generalized linear regression
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        10. Applying the Poisson model for generalized linear regression
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        11. Applying the Binomial model for generalized linear regression
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        12. Fitting a generalized additive model to data
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        13. Visualizing a generalized additive model
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        14. Diagnosing a generalized additive model
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
      12. 5. Classification (I) – Tree, Lazy, and Probabilistic
        1. Introduction
        2. Preparing the training and testing datasets
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        3. Building a classification model with recursive partitioning trees
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        4. Visualizing a recursive partitioning tree
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        5. Measuring the prediction performance of a recursive partitioning tree
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        6. Pruning a recursive partitioning tree
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        7. Building a classification model with a conditional inference tree
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        8. Visualizing a conditional inference tree
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        9. Measuring the prediction performance of a conditional inference tree
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        10. Classifying data with the k-nearest neighbor classifier
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        11. Classifying data with logistic regression
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        12. Classifying data with the Naïve Bayes classifier
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
      13. 6. Classification (II) – Neural Network and SVM
        1. Introduction
        2. Classifying data with a support vector machine
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        3. Choosing the cost of a support vector machine
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        4. Visualizing an SVM fit
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        5. Predicting labels based on a model trained by a support vector machine
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        6. Tuning a support vector machine
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        7. Training a neural network with neuralnet
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        8. Visualizing a neural network trained by neuralnet
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        9. Predicting labels based on a model trained by neuralnet
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        10. Training a neural network with nnet
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        11. Predicting labels based on a model trained by nnet
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
      14. 7. Model Evaluation
        1. Introduction
        2. Estimating model performance with k-fold cross-validation
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        3. Performing cross-validation with the e1071 package
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        4. Performing cross-validation with the caret package
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        5. Ranking the variable importance with the caret package
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        6. Ranking the variable importance with the rminer package
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        7. Finding highly correlated features with the caret package
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        8. Selecting features using the caret package
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        9. Measuring the performance of the regression model
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more…
        10. Measuring prediction performance with a confusion matrix
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        11. Measuring prediction performance using ROCR
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        12. Comparing an ROC curve using the caret package
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        13. Measuring performance differences between models with the caret package
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
      15. 8. Ensemble Learning
        1. Introduction
        2. Classifying data with the bagging method
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        3. Performing cross-validation with the bagging method
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        4. Classifying data with the boosting method
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        5. Performing cross-validation with the boosting method
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        6. Classifying data with gradient boosting
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        7. Calculating the margins of a classifier
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        8. Calculating the error evolution of the ensemble method
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        9. Classifying data with random forest
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        10. Estimating the prediction errors of different classifiers
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
      16. 9. Clustering
        1. Introduction
        2. Clustering data with hierarchical clustering
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        3. Cutting trees into clusters
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        4. Clustering data with the k-means method
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        5. Drawing a bivariate cluster plot
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        6. Comparing clustering methods
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        7. Extracting silhouette information from clustering
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        8. Obtaining the optimum number of clusters for k-means
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        9. Clustering data with the density-based method
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        10. Clustering data with the model-based method
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        11. Visualizing a dissimilarity matrix
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        12. Validating clusters externally
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
      17. 10. Association Analysis and Sequence Mining
        1. Introduction
        2. Transforming data into transactions
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        3. Displaying transactions and associations
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        4. Mining associations with the Apriori rule
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        5. Pruning redundant rules
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        6. Visualizing association rules
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        7. Mining frequent itemsets with Eclat
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        8. Creating transactions with temporal information
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        9. Mining frequent sequential patterns with cSPADE
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
      18. 11. Dimension Reduction
        1. Introduction
        2. Performing feature selection with FSelector
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        3. Performing dimension reduction with PCA
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        4. Determining the number of principal components using the scree test
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        5. Determining the number of principal components using the Kaiser method
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        6. Visualizing multivariate data using biplot
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        7. Performing dimension reduction with MDS
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        8. Reducing dimensions with SVD
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        9. Compressing images with SVD
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        10. Performing nonlinear dimension reduction with ISOMAP
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        11. Performing nonlinear dimension reduction with Local Linear Embedding
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
      19. 12. Big Data Analysis (R and Hadoop)
        1. Introduction
        2. Preparing the RHadoop environment
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        3. Installing rmr2
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        4. Installing rhdfs
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        5. Operating HDFS with rhdfs
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        6. Implementing a word count problem with RHadoop
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        7. Comparing the performance between an R MapReduce program and a standard R program
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        8. Testing and debugging the rmr2 program
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        9. Installing plyrmr
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        10. Manipulating data with plyrmr
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        11. Conducting machine learning with RHadoop
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        12. Configuring RHadoop clusters on Amazon EMR
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
      20. A. Resources for R and Machine Learning
      21. B. Dataset – Survival of Passengers on the Titanic
      22. Index