R Data Analysis Cookbook

Book Description

Over 80 recipes to help you breeze through your data analysis projects using R

In Detail

Data analytics with R has become an important focus for organizations of all kinds. R enables even those who have only an intuitive grasp of the underlying concepts, and no deep mathematical background, to carry out powerful and detailed examinations of their data.

This book shows you how to use R to generate professional analysis reports. It provides examples of important analysis and machine-learning tasks that you can try out with the accompanying, readily available data. It also shows you how to quickly adapt the example code to your own needs, saving you the time needed to write code from scratch.

What You Will Learn

  • Perform advanced analyses and create informative and professional charts

  • Become proficient in acquiring data from many sources

  • Apply supervised and unsupervised data mining techniques

  • Use R's features to present analyses professionally

  • Get data into your R environment and prepare it for analysis

  • Perform exploratory data analyses and generate meaningful visualizations of the data

  • Apply several machine-learning techniques for classification and regression

  • Get a handle on large data sets with the help of data reduction techniques

  • Extract patterns from time-series data and produce forecasts based on them

  • Learn how to extract actionable information from social network data

  • Implement geospatial analysis

  • Present your analysis convincingly through reports and build an infrastructure to enable others to play with your data

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

    Table of Contents

    1. R Data Analysis Cookbook
      1. Table of Contents
      2. R Data Analysis Cookbook
      3. Credits
      4. About the Authors
      5. About the Reviewers
      6. www.PacktPub.com
        1. Support files, eBooks, discount offers, and more
          1. Why Subscribe?
          2. Free Access for Packt account holders
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Sections
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        5. Conventions
        6. Reader feedback
        7. Customer support
          1. Downloading the example code and data
          2. About the data files used in this book
          3. Downloading the color images of this book
          4. Errata
          5. Piracy
          6. Questions
      8. 1. Acquire and Prepare the Ingredients – Your Data
        1. Introduction
        2. Reading data from CSV files
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Handling different column delimiters
            2. Handling column headers/variable names
            3. Handling missing values
            4. Reading strings as characters and not as factors
            5. Reading data directly from a website
        3. Reading XML data
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Extracting HTML table data from a web page
            2. Extracting a single HTML table from a web page
        4. Reading JSON data
          1. Getting ready
          2. How to do it...
          3. How it works...
        5. Reading data from fixed-width formatted files
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Files with headers
            2. Excluding columns from data
        6. Reading data from R files and R libraries
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. To save all objects in a session
            2. To selectively save objects in a session
            3. Attaching/detaching R data files to an environment
            4. Listing all datasets in loaded packages
        7. Removing cases with missing values
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Eliminating cases with NA for selected variables
            2. Finding cases that have no missing values
            3. Converting specific values to NA
            4. Excluding NA values from computations
        8. Replacing missing values with the mean
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Imputing random values sampled from nonmissing values
        9. Removing duplicate cases
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Identifying duplicates (without deleting them)
        10. Rescaling a variable to [0,1]
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Rescaling many variables at once
          5. See also…
        11. Normalizing or standardizing data in a data frame
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Standardizing several variables simultaneously
          5. See also…
        12. Binning numerical data
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Creating a specified number of intervals automatically
        13. Creating dummies for categorical variables
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Choosing which variables to create dummies for
      9. 2. What's in There? – Exploratory Data Analysis
        1. Introduction
        2. Creating standard data summaries
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Using the str() function for an overview of a data frame
            2. Computing the summary for a single variable
            3. Finding the mean and standard deviation
        3. Extracting a subset of a dataset
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Excluding columns
            2. Selecting based on multiple values
            3. Selecting using a logical vector
        4. Splitting a dataset
          1. Getting ready
          2. How to do it...
          3. How it works...
        5. Creating random data partitions
          1. Getting ready
          2. How to do it…
            1. Case 1 – numerical target variable and two partitions
            2. Case 2 – numerical target variable and three partitions
            3. Case 3 – categorical target variable and two partitions
            4. Case 4 – categorical target variable and three partitions
          3. How it works...
          4. There's more...
            1. Using a convenience function for partitioning
            2. Sampling from a set of values
        6. Generating standard plots such as histograms, boxplots, and scatterplots
          1. Getting ready
          2. How to do it...
            1. Histograms
            2. Boxplots
            3. Scatterplots
            4. Scatterplot matrices
          3. How it works...
            1. Histograms
            2. Boxplots
          4. There's more...
            1. Overlaying a density plot on a histogram
            2. Overlaying a regression line on a scatterplot
            3. Coloring specific points on a scatterplot
        7. Generating multiple plots on a grid
          1. Getting ready
          2. How to do it...
          3. How it works...
            1. Graphics parameters
          4. See also…
        8. Selecting a graphics device
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also…
        9. Creating plots with the lattice package
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Adding flair to your graphs
          5. See also…
        10. Creating plots with the ggplot2 package
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Graphing using qplot
            2. Conditioning plots on continuous numeric variables
          5. See also…
        11. Creating charts that facilitate comparisons
          1. Getting ready
          2. How to do it...
            1. Using base plotting system
            2. Using ggplot2
          3. How it works...
          4. There's more...
            1. Creating boxplots with ggplot2
          5. See also…
        12. Creating charts that help visualize a possible causality
          1. Getting ready
          2. How to do it...
          3. See also…
        13. Creating multivariate plots
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also…
      10. 3. Where Does It Belong? – Classification
        1. Introduction
        2. Generating error/classification-confusion matrices
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Visualizing the error/classification confusion matrix
            2. Comparing the model's performance for different classes
        3. Generating ROC charts
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more…
            1. Using arbitrary class labels
        4. Building, plotting, and evaluating – classification trees
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Computing raw probabilities
            2. Creating the ROC chart
          5. See also
        5. Using random forest models for classification
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Computing raw probabilities
            2. Generating the ROC chart
            3. Specifying cutoffs for classification
          5. See also...
        6. Classifying using Support Vector Machines
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Controlling scaling of variables
            2. Determining the type of SVM model
            3. Assigning weights to the classes
          5. See also...
        7. Classifying using the Naïve Bayes approach
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also...
        8. Classifying using the KNN approach
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Automating the process of running KNN for many k values
            2. Using KNN to compute raw probabilities instead of classifications
        9. Using neural networks for classification
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Exercising greater control over nnet
            2. Generating raw probabilities
        10. Classifying using linear discriminant function analysis
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Using the formula interface for lda
          5. See also...
        11. Classifying using logistic regression
          1. Getting ready
          2. How to do it...
          3. How it works...
        12. Using AdaBoost to combine classification tree models
          1. Getting ready
          2. How to do it...
          3. How it works...
      11. 4. Give Me a Number – Regression
        1. Introduction
        2. Computing the root mean squared error
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Using a convenience function to compute the RMS error
        3. Building KNN models for regression
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Running KNN with cross-validation in place of a validation partition
            2. Using a convenience function to run KNN
            3. Using a convenience function to run KNN for multiple k values
          5. See also...
        4. Performing linear regression
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Forcing lm to use a specific factor level as the reference
            2. Using other options in the formula expression for linear models
          5. See also...
        5. Performing variable selection in linear regression
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also...
        6. Building regression trees
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more…
            1. Generating regression trees for data with categorical predictors
          5. See also...
        7. Building random forest models for regression
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Controlling forest generation
          5. See also...
        8. Using neural networks for regression
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also...
        9. Performing k-fold cross-validation
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also...
        10. Performing leave-one-out cross-validation to limit overfitting
          1. How to do it...
          2. How it works...
          3. See also...
      12. 5. Can You Simplify That? – Data Reduction Techniques
        1. Introduction
        2. Performing cluster analysis using K-means clustering
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Using a convenience function to choose a value for K
          5. See also...
        3. Performing cluster analysis using hierarchical clustering
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also...
        4. Reducing dimensionality with principal component analysis
          1. Getting ready
          2. How to do it...
          3. How it works...
      13. 6. Lessons from History – Time Series Analysis
        1. Introduction
        2. Creating and examining date objects
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also...
        3. Operating on date objects
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also...
        4. Performing preliminary analyses on time series data
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also...
        5. Using time series objects
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also...
        6. Decomposing time series
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also...
        7. Filtering time series data
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also...
        8. Smoothing and forecasting using the Holt-Winters method
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also...
        9. Building an automated ARIMA model
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also...
      14. 7. It's All About Your Connections – Social Network Analysis
        1. Introduction
        2. Downloading social network data using public APIs
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also...
        3. Creating adjacency matrices and edge lists
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also...
        4. Plotting social network data
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Specifying plotting preferences
            2. Plotting directed graphs
            3. Creating a graph object with weights
            4. Extracting the network as an adjacency matrix from the graph object
            5. Extracting an adjacency matrix with weights
            6. Extracting an edge list from the graph object
            7. Creating a bipartite network graph
            8. Generating projections of a bipartite network
          5. See also...
        5. Computing important network metrics
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Getting edge sequences
            2. Getting immediate and distant neighbors
            3. Adding vertices or nodes
            4. Adding edges
            5. Deleting isolates from a graph
            6. Creating subgraphs
      15. 8. Put Your Best Foot Forward – Document and Present Your Analysis
        1. Introduction
        2. Generating reports of your data analysis with R Markdown and knitr
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Using the render function
            2. Adding output options
        3. Creating interactive web applications with shiny
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Adding images
            2. Adding HTML
            3. Adding tab sets
            4. Adding a dynamic UI
            5. Creating a single-file web application
        4. Creating PDF presentations of your analysis with R Presentation
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Using hyperlinks
            2. Controlling the display
            3. Enhancing the look of the presentation
      16. 9. Work Smarter, Not Harder – Efficient and Elegant R Code
        1. Introduction
        2. Exploiting vectorized operations
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        3. Processing entire rows or columns using the apply function
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Using apply on a three-dimensional array
        4. Applying a function to all elements of a collection with lapply and sapply
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Dynamic output
            2. One caution
        5. Applying functions to subsets of a vector
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Applying a function on groups from a data frame
        6. Using the split-apply-combine strategy with plyr
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Adding a new column using transform
            2. Using summarize along with the plyr function
            3. Concatenating the list of data frames into a big data frame
        7. Slicing, dicing, and combining data with data tables
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Adding multiple aggregated columns
            2. Counting groups
            3. Deleting a column
            4. Joining data tables
            5. Using symbols
      17. 10. Where in the World? – Geospatial Analysis
        1. Introduction
        2. Downloading and plotting a Google map of an area
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Saving the downloaded map as an image file
            2. Getting a satellite image
        3. Overlaying data on the downloaded Google map
          1. Getting ready
          2. How to do it...
          3. How it works...
        4. Importing ESRI shape files into R
          1. Getting ready
          2. How to do it...
          3. How it works...
        5. Using the sp package to plot geographic data
          1. Getting ready
          2. How to do it...
          3. How it works...
        6. Getting maps from the maps package
          1. Getting ready
          2. How to do it...
          3. How it works...
        7. Creating spatial data frames from regular data frames containing spatial and other data
          1. Getting ready
          2. How to do it...
          3. How it works...
        8. Creating spatial data frames by combining regular data frames with spatial objects
          1. Getting ready
          2. How to do it...
          3. How it works...
        9. Adding variables to an existing spatial data frame
          1. Getting ready
          2. How to do it...
          3. How it works...
      18. 11. Playing Nice – Connecting to Other Systems
        1. Introduction
        2. Using Java objects in R
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Checking JVM properties
            2. Displaying available methods
        3. Using JRI to call R functions from Java
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        4. Using Rserve to call R functions from Java
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Retrieving an array from R
        5. Executing R scripts from Java
          1. Getting ready
          2. How to do it...
          3. How it works...
        6. Using the xlsx package to connect to Excel
          1. Getting ready
          2. How to do it...
          3. How it works...
        7. Reading data from relational databases – MySQL
          1. Getting ready
          2. How to do it...
            1. Using RODBC
            2. Using RMySQL
            3. Using RJDBC
          3. How it works...
            1. Using RODBC
            2. Using RMySQL
            3. Using RJDBC
          4. There's more...
            1. Fetching all rows
            2. When the SQL query is long
        8. Reading data from NoSQL databases – MongoDB
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Validating your JSON
      19. Index