R Statistical Application Development by Example Beginner's Guide

Book Description

Learn R Statistical Application Development from scratch in a clear and pedagogical manner

  • A self-study guide for readers who need statistical tools to understand uncertainty in computer science data

  • Essential descriptive statistics, effective data visualization, and efficient model building

  • Every method is explained through real datasets, building the clarity and confidence needed to handle unforeseen scenarios

In Detail

"R Statistical Application Development by Example Beginner’s Guide" explores statistical concepts and the R software, integrating the two from the very start. Rather than treating theory and applications as separate subjects, the book teaches them together, which is why the title begins with “R Statistical …”. Almost every concept is accompanied by R code that illustrates the strengths of R in practice. The reader first learns about data characteristics, descriptive statistics, and an exploratory mindset, which together provide a firm footing for data analysis. Statistical inference and simulation, which exploits modern computing power, complete the technical foundation of statistical methods. Regression modeling (linear, logistic, and CART) then builds the essential toolkit for tackling complex real-world problems.

The reader begins with a brief look at the nature of data and ends with modern, advanced statistical models such as CART. Every step is illustrated with data and R code.

The data analysis journey begins with exploratory analysis, which goes beyond simple descriptive summaries, follows the traditional path up to linear regression modeling, and ends with logistic regression, CART, and spatial statistics.

True to its title, R Statistical Application Development by Example Beginner’s Guide teaches through examples and R software throughout, and the reader will enjoy both.

Table of Contents

  1. R Statistical Application Development by Example Beginner's Guide
    1. Table of Contents
    2. R Statistical Application Development by Example Beginner's Guide
    3. Credits
    4. About the Author
    5. About the Reviewers
    6. www.PacktPub.com
      1. Support files, eBooks, discount offers and more
        1. Why Subscribe?
        2. Free Access for Packt account holders
    7. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Time for action – heading
        1. What just happened?
        2. Pop quiz – heading
        3. Have a go hero – heading
      6. Reader feedback
      7. Customer support
        1. Downloading the example code
        2. Downloading the color images of this book
        3. Errata
        4. Piracy
        5. Questions
    8. 1. Data Characteristics
      1. Questionnaire and its components
        1. Understanding the data characteristics in an R environment
      2. Experiments with uncertainty in computer science
      3. R installation
        1. Using R packages
        2. RSADBE – the book's R package
        3. Discrete distribution
        4. Discrete uniform distribution
        5. Binomial distribution
        6. Hypergeometric distribution
        7. Negative binomial distribution
        8. Poisson distribution
      4. Continuous distribution
        1. Uniform distribution
        2. Exponential distribution
        3. Normal distribution
      5. Summary
    9. 2. Import/Export Data
      1. data.frame and other formats
        1. Constants, vectors, and matrices
      2. Time for action – understanding constants, vectors, and basic arithmetic
        1. What just happened?
      3. Time for action – matrix computations
        1. What just happened?
        2. The list object
      4. Time for action – creating a list object
        1. What just happened?
        2. The data.frame object
      5. Time for action – creating a data.frame object
        1. What just happened?
        2. Have a go hero
        3. The table object
        4. Time for action – creating the Titanic dataset as a table object
          1. What just happened?
          2. Have a go hero
        5. read.csv, read.xls, and the foreign package
        6. Time for action – importing data from external files
          1. What just happened?
          2. What just happened?
          3. Importing data from MySQL
        7. Exporting data/graphs
          1. Exporting R objects
          2. Exporting graphs
        8. Time for action – exporting a graph
          1. What just happened?
        9. Managing an R session
        10. Time for action – session management
          1. What just happened?
          2. Have a go hero
      6. Summary
    10. 3. Data Visualization
      1. Visualization techniques for categorical data
        1. Bar charts
          1. Going through the built-in examples of R
      2. Time for action – bar charts in R
        1. What just happened?
        2. Have a go hero
        3. Dot charts
      3. Time for action – dot charts in R
        1. What just happened?
        2. Spine and mosaic plots
      4. Time for action – the spine plot for the shift and operator data
        1. What just happened?
      5. Time for action – the mosaic plot for the Titanic dataset
        1. What just happened?
        2. Pie charts and the fourfold plot
      6. Visualization techniques for continuous variable data
        1. Boxplot
      7. Time for action – using the boxplot
        1. What just happened?
        2. Histograms
      8. Time for action – understanding the effectiveness of histograms
        1. What just happened?
        2. Scatter plots
      9. Time for action – plot and pairs R functions
        1. What just happened?
        2. Pareto charts
      10. A brief peek at ggplot2
      11. Time for action – qplot
        1. What just happened?
      12. Time for action – ggplot
        1. What just happened?
        2. Have a go hero
      13. Summary
    11. 4. Exploratory Analysis
      1. Essential summary statistics
        1. Percentiles, quantiles, and median
        2. Hinges
        3. The interquartile range
      2. Time for action – the essential summary statistics for "The Wall" dataset
        1. What just happened?
      3. The stem-and-leaf plot
      4. Time for action – the stem function in play
        1. What just happened?
      5. Letter values
      6. Data re-expression
        1. Have a go hero
      7. Bagplot – a bivariate boxplot
      8. Time for action – the bagplot display for a multivariate dataset
        1. What just happened?
      9. The resistant line
      10. Time for action – the resistant line as a first regression model
        1. What just happened?
      11. Smoothing data
      12. Time for action – smoothing the cow temperature data
        1. What just happened?
      13. Median polish
      14. Time for action – the median polish algorithm
        1. What just happened?
        2. Have a go hero
      15. Summary
    12. 5. Statistical Inference
      1. Maximum likelihood estimator
        1. Visualizing the likelihood function
      2. Time for action – visualizing the likelihood function
        1. What just happened?
        2. Finding the maximum likelihood estimator
        3. Using the fitdistr function
      3. Time for action – finding the MLE using mle and fitdistr functions
        1. What just happened?
      4. Confidence intervals
      5. Time for action – confidence intervals
        1. What just happened?
      6. Hypothesis testing
        1. Binomial test
      7. Time for action – testing the probability of success
        1. What just happened?
        2. Tests of proportions and the chi-square test
      8. Time for action – testing proportions
        1. What just happened?
        2. Tests based on normal distribution – one-sample
      9. Time for action – testing one-sample hypotheses
        1. What just happened?
        2. Have a go hero
        3. Tests based on normal distribution – two-sample
      10. Time for action – testing two-sample hypotheses
        1. What just happened?
        2. Have a go hero
      11. Summary
    13. 6. Linear Regression Analysis
      1. The simple linear regression model
        1. What happens to the arbitrary choice of parameters?
      2. Time for action – the arbitrary choice of parameters
        1. What just happened?
        2. Building a simple linear regression model
      3. Time for action – building a simple linear regression model
        1. What just happened?
        2. Have a go hero
        3. ANOVA and the confidence intervals
      4. Time for action – ANOVA and the confidence intervals
        1. What just happened?
        2. Model validation
      5. Time for action – residual plots for model validation
        1. What just happened?
        2. Have a go hero
      6. Multiple linear regression model
        1. Averaging k simple linear regression models or a multiple linear regression model
      7. Time for action – averaging k simple linear regression models
        1. What just happened?
        2. Building a multiple linear regression model
      8. Time for action – building a multiple linear regression model
        1. What just happened?
        2. The ANOVA and confidence intervals for the multiple linear regression model
      9. Time for action – the ANOVA and confidence intervals for the multiple linear regression model
        1. What just happened?
        2. Have a go hero
        3. Useful residual plots
      10. Time for action – residual plots for the multiple linear regression model
        1. What just happened?
      11. Regression diagnostics
        1. Leverage points
        2. Influential points
        3. DFFITS and DFBETAS
      12. The multicollinearity problem
      13. Time for action – addressing the multicollinearity problem for the Gasoline data
        1. What just happened?
      14. Model selection
        1. Stepwise procedures
          1. The backward elimination
          2. The forward selection
        2. Criterion-based procedures
      15. Time for action – model selection using the backward, forward, and AIC criteria
        1. What just happened?
        2. Have a go hero
      16. Summary
    14. 7. The Logistic Regression Model
      1. The binary regression problem
      2. Time for action – limitations of linear regression models
        1. What just happened?
      3. Probit regression model
      4. Time for action – understanding the constants
        1. What just happened?
      5. Logistic regression model
      6. Time for action – fitting the logistic regression model
        1. What just happened?
        2. Hosmer-Lemeshow goodness-of-fit test statistic
      7. Time for action – the Hosmer-Lemeshow goodness-of-fit statistic
        1. What just happened?
      8. Model validation and diagnostics
        1. Residual plots for the GLM
      9. Time for action – residual plots for the logistic regression model
        1. What just happened?
        2. Have a go hero
        3. Influence and leverage for the GLM
      10. Time for action – diagnostics for the logistic regression
        1. What just happened?
        2. Have a go hero
      11. Receiver operating characteristic curves
      12. Time for action – ROC construction
        1. What just happened?
      13. Logistic regression for the German credit screening dataset
      14. Time for action – logistic regression for the German credit dataset
        1. What just happened?
        2. Have a go hero
      15. Summary
    15. 8. Regression Models with Regularization
      1. The overfitting problem
      2. Time for action – understanding overfitting
        1. What just happened?
        2. Have a go hero
      3. Regression spline
        1. Basis functions
        2. Piecewise linear regression model
      4. Time for action – fitting piecewise linear regression models
        1. What just happened?
        2. Natural cubic splines and the general B-splines
      5. Time for action – fitting the spline regression models
        1. What just happened?
      6. Ridge regression for linear models
      7. Time for action – ridge regression for the linear regression model
        1. What just happened?
      8. Ridge regression for logistic regression models
      9. Time for action – ridge regression for the logistic regression model
        1. What just happened?
      10. Another look at model assessment
      11. Time for action – selecting lambda iteratively and other topics
        1. What just happened?
        2. Pop quiz
      12. Summary
    16. 9. Classification and Regression Trees
      1. Recursive partitions
      2. Time for action – partitioning the display plot
        1. What just happened?
        2. Splitting the data
        3. The first tree
      3. Time for action – building our first tree
        1. What just happened?
      4. The construction of a regression tree
      5. Time for action – the construction of a regression tree
        1. What just happened?
      6. The construction of a classification tree
      7. Time for action – the construction of a classification tree
        1. What just happened?
      8. Classification tree for the German credit data
      9. Time for action – the construction of a classification tree
        1. What just happened?
        2. Have a go hero
      10. Pruning and other finer aspects of a tree
      11. Time for action – pruning a classification tree
        1. What just happened?
        2. Pop quiz
      12. Summary
    17. 10. CART and Beyond
      1. Improving CART
      2. Time for action – cross-validation predictions
        1. What just happened?
      3. Bagging
        1. The bootstrap
      4. Time for action – understanding the bootstrap technique
        1. What just happened?
        2. The bagging algorithm
      5. Time for action – the bagging algorithm
        1. What just happened?
      6. Random forests
      7. Time for action – random forests for the German credit data
        1. What just happened?
      8. The consolidation
      9. Time for action – random forests for the low birth weight data
        1. What just happened?
      10. Summary
    18. A. References
    19. Index