R for Data Science Cookbook

Book Description

Over 100 hands-on recipes to effectively solve real-world data problems using the most popular R packages and techniques

About This Book

  • Gain insight into how data scientists collect, process, analyze, and visualize data using some of the most popular R packages

  • Understand how to apply useful data analysis techniques in R for real-world applications

  • An easy-to-follow guide that makes life easier for data scientists by addressing the problems they face while performing data analysis

Who This Book Is For

This book is for those who are already familiar with the basic operation of R, but want to learn how to efficiently and effectively analyze real-world data problems using practical R packages.

What You Will Learn

  • Get to know the functional characteristics of the R language

  • Extract, transform, and load data from heterogeneous sources

  • Understand how easily R can handle probability and statistics problems

  • Get simple R instructions to quickly organize and manipulate large datasets

  • Create professional data visualizations and interactive reports

  • Predict user purchase behavior by adopting a classification approach

  • Implement data mining techniques to discover items that are frequently purchased together

  • Group similar text documents by using various clustering methods

In Detail

This cookbook offers a range of data analysis samples in simple and straightforward R code, providing step-by-step resources and time-saving methods to help you solve data problems efficiently.

The first section shows how to create R functions to avoid unnecessary duplication of code. You will learn how to prepare, process, and perform sophisticated ETL for heterogeneous data sources with R packages. An example of data manipulation is provided, illustrating how to use the “dplyr” and “data.table” packages to efficiently process larger data structures. We also focus on “ggplot2” and show you how to create advanced figures for data exploration.

In addition, you will learn how to build an interactive report using the “ggvis” package. Later chapters offer insight into time series analysis on financial data and cover machine learning in detail, including data classification, regression, clustering, association rule mining, and dimension reduction.

By the end of this book, you will understand how to resolve issues encountered while performing data analysis and will be able to comfortably offer solutions to them.

Style and approach

This easy-to-follow guide is full of hands-on examples of data analysis with R. Each topic is fully explained, beginning with the core concept, followed by step-by-step practical examples, and concluding with detailed explanations of each concept used.

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to receive the code files.

    Table of Contents

    1. R for Data Science Cookbook
      1. Table of Contents
      2. R for Data Science Cookbook
      3. Credits
      4. About the Author
      5. About the Reviewer
      6. www.PacktPub.com
        1. eBooks, discount offers, and more
          1. Why subscribe?
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Sections
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        5. Conventions
        6. Reader feedback
        7. Customer support
          1. Downloading the example code
          2. Downloading the color images of this book
          3. Errata
          4. Piracy
          5. Questions
      8. 1. Functions in R
        1. Introduction
        2. Creating R functions
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        3. Matching arguments
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        4. Understanding environments
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        5. Working with lexical scoping
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        6. Understanding closure
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        7. Performing lazy evaluation
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        8. Creating infix operators
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        9. Using the replacement function
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        10. Handling errors in a function
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        11. The debugging function
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
      9. 2. Data Extracting, Transforming, and Loading
        1. Introduction
        2. Downloading open data
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        3. Reading and writing CSV files
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        4. Scanning text files
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        5. Working with Excel files
          1. Getting ready
          2. How to do it…
          3. How it works…
        6. Reading data from databases
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        7. Scraping web data
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        8. Accessing Facebook data
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        9. Working with twitteR
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
      10. 3. Data Preprocessing and Preparation
        1. Introduction
        2. Renaming the data variable
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        3. Converting data types
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        4. Working with the date format
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        5. Adding new records
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        6. Filtering data
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        7. Dropping data
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        8. Merging data
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        9. Sorting data
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        10. Reshaping data
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        11. Detecting missing data
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        12. Imputing missing data
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
      11. 4. Data Manipulation
        1. Introduction
        2. Enhancing a data.frame with a data.table
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        3. Managing data with a data.table
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        4. Performing fast aggregation with a data.table
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        5. Merging large datasets with a data.table
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        6. Subsetting and slicing data with dplyr
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        7. Sampling data with dplyr
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        8. Selecting columns with dplyr
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        9. Chaining operations in dplyr
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        10. Arranging rows with dplyr
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        11. Eliminating duplicated rows with dplyr
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        12. Adding new columns with dplyr
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        13. Summarizing data with dplyr
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        14. Merging data with dplyr
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
      12. 5. Visualizing Data with ggplot2
        1. Introduction
        2. Creating basic plots with ggplot2
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        3. Changing aesthetics mapping
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        4. Introducing geometric objects
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        5. Performing transformations
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        6. Adjusting scales
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        7. Faceting
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        8. Adjusting themes
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        9. Combining plots
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        10. Creating maps
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
      13. 6. Making Interactive Reports
        1. Introduction
        2. Creating R Markdown reports
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        3. Learning the markdown syntax
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        4. Embedding R code chunks
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        5. Creating interactive graphics with ggvis
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        6. Understanding basic syntax and grammar
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        7. Controlling axes and legends
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        8. Using scales
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        9. Adding interactivity to a ggvis plot
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        10. Creating an R Shiny document
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        11. Publishing an R Shiny report
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
      14. 7. Simulation from Probability Distributions
        1. Introduction
        2. Generating random samples
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        3. Understanding uniform distributions
          1. Getting ready
          2. How to do it…
          3. How it works…
        4. Generating binomial random variates
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        5. Generating Poisson random variates
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        6. Sampling from a normal distribution
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        7. Sampling from a chi-squared distribution
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        8. Understanding Student's t-distribution
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        9. Sampling from a dataset
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        10. Simulating the stochastic process
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
      15. 8. Statistical Inference in R
        1. Introduction
        2. Getting confidence intervals
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        3. Performing Z-tests
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        4. Performing student's T-tests
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        5. Conducting exact binomial tests
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        6. Performing Kolmogorov-Smirnov tests
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        7. Working with the Pearson's chi-squared tests
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        8. Understanding the Wilcoxon Rank Sum and Signed Rank tests
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        9. Conducting one-way ANOVA
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        10. Performing two-way ANOVA
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
      16. 9. Rule and Pattern Mining with R
        1. Introduction
        2. Transforming data into transactions
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        3. Displaying transactions and associations
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        4. Mining associations with the Apriori rule
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        5. Pruning redundant rules
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        6. Visualizing association rules
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        7. Mining frequent itemsets with Eclat
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        8. Creating transactions with temporal information
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        9. Mining frequent sequential patterns with cSPADE
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
      17. 10. Time Series Mining with R
        1. Introduction
        2. Creating time series data
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        3. Plotting a time series object
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        4. Decomposing time series
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        5. Smoothing time series
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        6. Forecasting time series
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        7. Selecting an ARIMA model
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        8. Creating an ARIMA model
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        9. Forecasting with an ARIMA model
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        10. Predicting stock prices with an ARIMA model
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
      18. 11. Supervised Machine Learning
        1. Introduction
        2. Fitting a linear regression model with lm
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        3. Summarizing linear model fits
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        4. Using linear regression to predict unknown values
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        5. Measuring the performance of the regression model
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        6. Performing a multiple regression analysis
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        7. Selecting the best-fitted regression model with stepwise regression
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        8. Applying the Gaussian model for generalized linear regression
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        9. Performing a logistic regression analysis
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        10. Building a classification model with recursive partitioning trees
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        11. Visualizing a recursive partitioning tree
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        12. Measuring model performance with a confusion matrix
          1. Getting ready
          2. How to do it…
          3. How it works…
        13. Measuring prediction performance using ROCR
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
      19. 12. Unsupervised Machine Learning
        1. Introduction
        2. Clustering data with hierarchical clustering
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        3. Cutting tree into clusters
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        4. Clustering data with the k-means method
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        5. Clustering data with the density-based method
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        6. Extracting silhouette information from clustering
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        7. Comparing clustering methods
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        8. Recognizing digits using the density-based clustering method
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        9. Grouping similar text documents with k-means clustering methods
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        10. Performing dimension reduction with Principal Component Analysis (PCA)
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        11. Determining the number of principal components using a scree plot
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        12. Determining the number of principal components using the Kaiser method
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        13. Visualizing multivariate data using a biplot
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
      20. Index