Beginning R: An Introduction to Statistical Programming, Second Edition

Book Description

Beginning R, Second Edition is a hands-on book that shows how to use the R language, write and save R scripts, read in data files, and write custom statistical functions as well as use built-in functions. It demonstrates R in specific cases such as one-way ANOVA, linear and logistic regression, data visualization, parallel processing, bootstrapping, and more. It takes an example-based approach, incorporating best practices with clear explanations of the statistics being done. It has been completely rewritten since the first edition to make use of the latest packages and features in R version 3.

R is a powerful open-source language and programming environment for statistics and has become the de facto standard for doing, teaching, and learning computational statistics. R is both an object-oriented language and a functional language that is easy to learn, easy to use, and completely free. A large community of dedicated R users and programmers provides an excellent source of R code, functions, and data sets, with a constantly evolving ecosystem of packages providing new functionality for data analysis. R has also become popular in commercial use at companies such as Microsoft, Google, and Oracle. Your investment in learning R is sure to pay off in the long term as R continues to grow into the go-to language for data analysis and research.

Table of Contents

  1. Cover
  2. Title
  3. Copyright
  4. Dedication
  5. Contents at a Glance
  6. Contents
  7. About the Author
  8. In Memoriam
  9. About the Technical Reviewer
  10. Acknowledgments
  11. Introduction
  12. Chapter 1 : Getting Started
    1. 1.1 What is R, Anyway?
    2. 1.2 A First R Session
    3. 1.3 Your Second R Session
      1. 1.3.1 Working with Indexes
      2. 1.3.2 Representing Missing Data in R
      3. 1.3.3 Vectors and Vectorization in R
      4. 1.3.4 A Brief Introduction to Matrices
      5. 1.3.5 More on Lists
      6. 1.3.6 A Quick Introduction to Data Frames
  13. Chapter 2 : Dealing with Dates, Strings, and Data Frames
    1. 2.1 Working with Dates and Times
    2. 2.2 Working with Strings
    3. 2.3 Working with Data Frames in the Real World
      1. 2.3.1 Finding and Subsetting Data
    4. 2.4 Manipulating Data Structures
    5. 2.5 The Hard Work of Working with Larger Datasets
  14. Chapter 3 : Input and Output
    1. 3.1 R Input
      1. 3.1.1 The R Editor
      2. 3.1.2 The R Data Editor
      3. 3.1.3 Other Ways to Get Data Into R
      4. 3.1.4 Reading Data from a File
      5. 3.1.5 Getting Data from the Web
    2. 3.2 R Output
      1. 3.2.1 Saving Output to a File
  15. Chapter 4 : Control Structures
    1. 4.1 Using Logic
    2. 4.2 Flow Control
      1. 4.2.1 Explicit Looping
      2. 4.2.2 Implicit Looping
    3. 4.3 If, If-Else, and ifelse() Statements
  16. Chapter 5 : Functional Programming
    1. 5.1 Scoping Rules
    2. 5.2 Reserved Names and Syntactically Correct Names
    3. 5.3 Functions and Arguments
    4. 5.4 Some Example Functions
      1. 5.4.1 Guess the Number
      2. 5.4.2 A Function with Arguments
    5. 5.5 Classes and Methods
      1. 5.5.1 S3 Class and Method Example
      2. 5.5.2 S3 Methods for Existing Classes
  17. Chapter 6 : Probability Distributions
    1. 6.1 Discrete Probability Distributions
    2. 6.2 The Binomial Distribution
      1. 6.2.1 The Poisson Distribution
      2. 6.2.2 Some Other Discrete Distributions
    3. 6.3 Continuous Probability Distributions
      1. 6.3.1 The Normal Distribution
      2. 6.3.2 The t Distribution
      3. 6.3.3 The F Distribution
      4. 6.3.4 The Chi-Square Distribution
    4. References
  18. Chapter 7 : Working with Tables
    1. 7.1 Working with One-Way Tables
    2. 7.2 Working with Two-Way Tables
  19. Chapter 8 : Descriptive Statistics and Exploratory Data Analysis
    1. 8.1 Central Tendency
      1. 8.1.1 The Mean
      2. 8.1.2 The Median
      3. 8.1.3 The Mode
    2. 8.2 Variability
      1. 8.2.1 The Range
      2. 8.2.2 The Variance and Standard Deviation
    3. 8.3 Boxplots and Stem-and-Leaf Displays
    4. 8.4 Using the fBasics Package for Summary Statistics
    5. References
  20. Chapter 9 : Working with Graphics
    1. 9.1 Creating Effective Graphics
    2. 9.2 Graphing Nominal and Ordinal Data
    3. 9.3 Graphing Scale Data
      1. 9.3.1 Boxplots Revisited
      2. 9.3.2 Histograms and Dotplots
      3. 9.3.3 Frequency Polygons and Smoothed Density Plots
      4. 9.3.4 Graphing Bivariate Data
    4. References
  21. Chapter 10 : Traditional Statistical Methods
    1. 10.1 Estimation and Confidence Intervals
      1. 10.1.1 Confidence Intervals for Means
      2. 10.1.2 Confidence Intervals for Proportions
      3. 10.1.3 Confidence Intervals for the Variance
    2. 10.2 Hypothesis Tests with One Sample
    3. 10.3 Hypothesis Tests with Two Samples
    4. References
  22. Chapter 11 : Modern Statistical Methods
    1. 11.1 The Need for Modern Statistical Methods
    2. 11.2 A Modern Alternative to the Traditional t Test
    3. 11.3 Bootstrapping
    4. 11.4 Permutation Tests
    5. References
  23. Chapter 12 : Analysis of Variance
    1. 12.1 Some Brief Background
    2. 12.2 One-Way ANOVA
    3. 12.3 Two-Way ANOVA
      1. 12.3.1 Repeated-Measures ANOVA
      2. 12.3.2 Mixed-Model ANOVA
    4. References
  24. Chapter 13 : Correlation and Regression
    1. 13.1 Covariance and Correlation
    2. 13.2 Linear Regression: Bivariate Case
    3. 13.3 An Extended Regression Example: Stock Screener
      1. 13.3.1 Quadratic Model: Stock Screener
      2. 13.3.2 A Note on Time Series
    4. 13.4 Confidence and Prediction Intervals
    5. References
  25. Chapter 14 : Multiple Regression
    1. 14.1 The Conceptual Statistics of Multiple Regression
    2. 14.2 GSS Multiple Regression Example
      1. 14.2.1 Exploratory Data Analysis
      2. 14.2.2 Linear Model (the First)
      3. 14.2.3 Adding the Next Predictor
      4. 14.2.4 Adding More Predictors
      5. 14.2.5 Presenting Results
    3. 14.3 Final Thoughts
    4. References
  26. Chapter 15 : Logistic Regression
    1. 15.1 The Mathematics of Logistic Regression
    2. 15.2 Generalized Linear Models
    3. 15.3 An Example of Logistic Regression
      1. 15.3.1 What If We Tried a Linear Model on Age?
      2. 15.3.2 Seeing If Age Might Be Relevant with Chi Square
      3. 15.3.3 Fitting a Logistic Regression Model
      4. 15.3.4 The Mathematics of Linear Scaling of Data
      5. 15.3.5 Logit Model with Rescaled Predictor
      6. 15.3.6 Multivariate Logistic Regression
    4. 15.4 Ordered Logistic Regression
      1. 15.4.1 Parallel Ordered Logistic Regression
      2. 15.4.2 Non-Parallel Ordered Logistic Regression
    5. 15.5 Multinomial Regression
    6. References
  27. Chapter 16 : Modern Statistical Methods II
    1. 16.1 Philosophy of Parameters
    2. 16.2 Nonparametric Tests
      1. 16.2.1 Wilcoxon-Signed-Rank Test
      2. 16.2.2 Spearman’s Rho
      3. 16.2.3 Kruskal-Wallis Test
      4. 16.2.4 One-Way Test
    3. 16.3 Bootstrapping
      1. 16.3.1 Examples from mtcars
      2. 16.3.2 Bootstrapping Confidence Intervals
      3. 16.3.3 Examples from GSS
    4. 16.4 Final Thought
    5. References
  28. Chapter 17 : Data Visualization Cookbook
    1. 17.1 Required Packages
    2. 17.2 Univariate Plots
    3. 17.3 Customizing and Polishing Plots
    4. 17.4 Multivariate Plots
    5. 17.5 Multiple Plots
    6. 17.6 Three-Dimensional Graphs
    7. References
  29. Chapter 18 : High-Performance Computing
    1. 18.1 Data
    2. 18.2 Parallel Processing
      1. 18.2.1 Other Parallel Processing Approaches
    3. References
  30. Chapter 19 : Text Mining
    1. 19.1 Installing Needed Packages and Software
      1. 19.1.1 Java
      2. 19.1.2 PDF Software
      3. 19.1.3 R Packages
      4. 19.1.4 Some Needed Files
    2. 19.2 Text Mining
      1. 19.2.1 Word Clouds and Transformations
      2. 19.2.2 PDF Text Input
      3. 19.2.3 Google News Input
      4. 19.2.4 Topic Models
    3. 19.3 Final Thoughts
    4. References
  31. Index