Statistical Computing with R

Book Description

Computational statistics and statistical computing are two areas that employ computational, graphical, and numerical approaches to solve statistical problems, making the versatile R language an ideal computing environment for these fields. One of the first books on these topics to feature R, Statistical Computing with R covers the traditional core material of computational statistics, with an emphasis on using the R language via an examples-based approach. Suitable for an introductory course in computational statistics or for self-study, it includes R code for all examples, along with notes that explain the relevant R programming concepts.

After an overview of computational statistics and an introduction to the R computing environment, the book reviews some basic concepts in probability and classical statistical inference. Each subsequent chapter explores a specific topic in computational statistics. These chapters cover the simulation of random variables from probability distributions, the visualization of multivariate data, Monte Carlo integration and variance reduction methods, Monte Carlo methods in inference, bootstrap and jackknife, permutation tests, Markov chain Monte Carlo (MCMC) methods, and density estimation. The final chapter presents a selection of examples that illustrate the application of numerical methods using R functions.

Focusing on implementation rather than theory, this text serves as a balanced, accessible introduction to computational statistics and statistical computing.
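As a taste of the examples-based approach described above, here is a minimal sketch (not taken from the book itself) of the simple Monte Carlo estimator treated in Chapter 5: estimating θ = ∫₀¹ e⁻ˣ dx by averaging g(x) = e⁻ˣ over uniform draws.

```r
# Minimal sketch of simple Monte Carlo integration (cf. Chapter 5).
# Estimate theta = integral of exp(-x) over [0, 1]; exact value is 1 - exp(-1).
set.seed(1)                  # for reproducibility
m <- 10000                   # number of replicates
x <- runif(m)                # x_1, ..., x_m ~ Uniform(0, 1)
g <- exp(-x)                 # g(x_i)
theta.hat <- mean(g)         # theta-hat = (1/m) * sum of g(x_i)
se.hat <- sd(g) / sqrt(m)    # estimated standard error of theta-hat
c(theta.hat, 1 - exp(-1), se.hat)
```

With m = 10000 replicates the estimate typically agrees with the exact value 1 − e⁻¹ ≈ 0.6321 to two or three decimal places; the standard-error formula here is the one developed in Section 5.2.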

Table of Contents

  1. Preliminaries
  2. Preface
    1. Acknowledgements
  3. Chapter 1 Introduction
    1. 1.1 Computational Statistics and Statistical Computing
    2. 1.2 The R Environment
    3. 1.3 Getting Started with R
        1. Syntax
    4. 1.4 Using the R Online Help System
    5. 1.5 Functions
    6. 1.6 Arrays, Data Frames, and Lists
        1. Data Frames
        2. Arrays and Matrices
        3. Lists
    7. 1.7 Workspace and Files
        1. The Working Directory
        2. Reading Data from External Files
    8. 1.8 Using Scripts
    9. 1.9 Using Packages
    10. 1.10 Graphics
        1. Colors, plotting symbols, and line types
      1. Table 1.1
      2. Table 1.2
      3. Table 1.3
      4. Table 1.4
  4. Chapter 2 Probability and Statistics Review
    1. 2.1 Random Variables and Probability
        1. Distribution and Density Functions
        2. Expectation, Variance, and Moments
        3. Conditional Probability and Independence
        4. Independence
        5. Properties of Expected Value and Variance
    2. 2.2 Some Discrete Distributions
        1. Binomial and Multinomial Distribution
        2. Geometric Distribution
        3. Alternative formulation of Geometric distribution
        4. Negative Binomial Distribution
        5. Poisson Distribution
        6. Examples
    3. 2.3 Some Continuous Distributions
        1. Normal Distribution
        2. Gamma and Exponential Distributions
        3. Chisquare and t
        4. Beta and Uniform Distributions
        5. Lognormal Distribution
        6. Examples
    4. 2.4 Multivariate Normal Distribution
        1. The bivariate normal distribution
        2. The multivariate normal distribution
    5. 2.5 Limit Theorems
        1. Laws of Large Numbers
        2. Central Limit Theorem
    6. 2.6 Statistics
        1. The empirical distribution function
        2. Bias and Mean Squared Error
        3. Method of Moments
        4. The Likelihood Function
        5. Maximum Likelihood Estimation
    7. 2.7 Bayes’ Theorem and Bayesian Statistics
        1. The Law of Total Probability
        2. Bayes’ Theorem
        3. Bayesian Statistics
    8. 2.8 Markov Chains
  5. Chapter 3 Methods for Generating Random Variables
    1. 3.1 Introduction
        1. Random Generators of Common Probability Distributions in R
    2. 3.2 The Inverse Transform Method
      1. 3.2.1 Inverse Transform Method, Continuous Case
      2. 3.2.2 Inverse Transform Method, Discrete Case
    3. 3.3 The Acceptance-Rejection Method
        1. The Acceptance-Rejection Method
    4. 3.4 Transformation Methods
    5. 3.5 Sums and Mixtures
        1. Convolutions
        2. Mixtures
    6. 3.6 Multivariate Distributions
      1. 3.6.1 Multivariate Normal Distribution
        1. Method for generating multivariate normal samples
        2. Spectral decomposition method for generating Nd(µ, Σ) samples
        3. SVD Method of generating Nd(µ, Σ) samples
        4. Choleski factorization method of generating Nd(µ, Σ) samples
        5. Comparing Performance of Generators
      2. 3.6.2 Mixtures of Multivariate Normals
        1. To generate a random sample from pNd(µ1, Σ1) + (1 − p)Nd(µ2, Σ2)
      3. 3.6.3 Wishart Distribution
      4. 3.6.4 Uniform Distribution on the d-Sphere
        1. Algorithm to generate uniform variates on the d-Sphere
    7. 3.7 Stochastic Processes
        1. Poisson Processes
        2. Algorithm for simulating a homogeneous Poisson process on an interval [0, t0] by generating interarrival times.
        3. Nonhomogeneous Poisson Processes
        4. Algorithm for simulating a nonhomogeneous Poisson process on an interval [0, t0] by sampling from a homogeneous Poisson process.
        5. Renewal Processes
        6. Symmetric Random Walk
        7. Algorithm to simulate the state Sn of a symmetric random walk
        8. Packages and Further Reading
    8. Exercises
      1. Figure 3.1
      2. Figure 3.2
      3. Figure 3.3
      4. Figure 3.4
      5. Figure 3.5
      6. Figure 3.6
      7. Figure 3.7
      8. Figure 3.8
      9. Figure 3.9
      10. Figure 3.10
      11. Figure 3.11
      1. Table 3.1
  6. Chapter 4 Visualization of Multivariate Data
    1. 4.1 Introduction
    2. 4.2 Panel Displays
    3. 4.3 Surface Plots and 3D Scatter Plots
      1. 4.3.1 Surface plots
        1. Adding elements to a perspective plot
        2. Other functions for graphing surfaces
      2. 4.3.2 Three-dimensional scatterplot
    4. 4.4 Contour Plots
    5. 4.5 Other 2D Representations of Data
      1. 4.5.1 Andrews Curves
      2. 4.5.2 Parallel Coordinate Plots
      3. 4.5.3 Segments, stars, and other representations
    6. 4.6 Other Approaches to Data Visualization
    7. Exercises
      1. Figure 4.1
      2. Figure 4.2
      3. Figure 4.3
      4. Figure 4.4
      5. Figure 4.5
      6. Figure 4.6
      7. Figure 4.7
      8. Figure 4.8
      9. Figure 4.9
      10. Figure 4.10
      1. Table 4.1
  7. Chapter 5 Monte Carlo Integration and Variance Reduction
    1. 5.1 Introduction
    2. 5.2 Monte Carlo Integration
      1. 5.2.1 Simple Monte Carlo estimator
        1. The standard error of θ̂ = (1/m)∑_{i=1}^{m} g(x_i)
      2. 5.2.2 Variance and Efficiency
        1. Efficiency
    3. 5.3 Variance Reduction
    4. 5.4 Antithetic Variables
    5. 5.5 Control Variates
      1. 5.5.1 Antithetic variate as control variate.
      2. 5.5.2 Several control variates.
      3. 5.5.3 Control variates and regression.
    6. 5.6 Importance Sampling
        1. Variance in Importance Sampling
    7. 5.7 Stratified Sampling
    8. 5.8 Stratified Importance Sampling
    9. Exercises
    10. R Code
      1. Figure 5.1
  8. Chapter 6 Monte Carlo Methods in Inference
    1. 6.1 Introduction
    2. 6.2 Monte Carlo Methods for Estimation
      1. 6.2.1 Monte Carlo estimation and standard error
        1. Estimating the standard error of the mean
      2. 6.2.2 Estimation of MSE
      3. 6.2.3 Estimating a confidence level
        1. Monte Carlo experiment to estimate a confidence level
    3. 6.3 Monte Carlo Methods for Hypothesis Tests
      1. 6.3.1 Empirical Type I error rate
      2. 6.3.2 Power of a Test
        1. Monte Carlo experiment to estimate power of a test against a fixed alternative
      3. 6.3.3 Power comparisons
    4. 6.4 Application: “Count Five” Test for Equal Variance
    5. Exercises
    6. Projects
      1. Figure 6.1
      2. Figure 6.2
      3. Figure 6.3
      4. Figure 6.4
      1. Table 6.1
      2. Table 6.2
  9. Chapter 7 Bootstrap and Jackknife
    1. 7.1 The Bootstrap
      1. 7.1.1 Bootstrap Estimation of Standard Error
      2. 7.1.2 Bootstrap Estimation of Bias
    2. 7.2 The Jackknife
        1. The Jackknife Estimate of Bias
        2. The jackknife estimate of standard error
        3. When the Jackknife Fails
    3. 7.3 Jackknife-after-Bootstrap
      1. Jackknife-after-bootstrap: Empirical influence values
    4. 7.4 Bootstrap Confidence Intervals
      1. 7.4.1 The Standard Normal Bootstrap Confidence Interval
      2. 7.4.2 The Basic Bootstrap Confidence Interval
      3. 7.4.3 The Percentile Bootstrap Confidence Interval
      4. 7.4.4 The Bootstrap t interval
        1. Bootstrap t interval (studentized bootstrap interval)
    5. 7.5 Better Bootstrap Confidence Intervals
        1. Properties of BCa intervals
    6. 7.6 Application: Cross Validation
        1. Procedure to estimate prediction error by n-fold (leave-one-out) cross validation
    7. Exercises
    8. Projects
      1. Figure 7.1
      2. Figure 7.2
      3. Figure 7.3
  10. Chapter 8 Permutation Tests
    1. 8.1 Introduction
        1. Permutation Distribution
        2. Approximate permutation test procedure
    2. 8.2 Tests for Equal Distributions
        1. Two-sample tests for univariate data
    3. 8.3 Multivariate Tests for Equal Distributions
        1. Nearest neighbor tests
        2. Energy test for equal distributions
        3. Comparison of nearest neighbor and energy tests
    4. 8.4 Application: Distance Correlation
        1. Distance Correlation
        2. Permutation tests of independence
        3. Approximate permutation test procedure for independence
    5. Exercises
    6. Projects
      1. Figure 8.1
      2. Figure 8.2
      3. Figure 8.3
      4. Figure 8.4
      5. Figure 8.5
      1. Table 8.1
      2. Table 8.2
  11. Chapter 9 Markov Chain Monte Carlo Methods
    1. 9.1 Introduction
      1. 9.1.1 Integration problems in Bayesian inference
      2. 9.1.2 Markov Chain Monte Carlo Integration
    2. 9.2 The Metropolis-Hastings Algorithm
      1. 9.2.1 Metropolis-Hastings Sampler
      2. 9.2.2 The Metropolis Sampler
      3. 9.2.3 Random Walk Metropolis
      4. 9.2.4 The Independence Sampler
    3. 9.3 The Gibbs Sampler
    4. 9.4 Monitoring Convergence
      1. 9.4.1 The Gelman-Rubin Method
    5. 9.5 Application: Change Point Analysis
    6. Exercises
    7. R Code
        1. Code for Figure 9.3 on page 255
        2. Code for Figures 9.4(a) on page 259 and 9.4(b) on page 259
        3. Code for Figure 9.11 on page 276
        4. Code for Figure 9.12 on page 276
      1. Figure 9.1
      2. Figure 9.2
      3. Figure 9.3
      4. Figure 9.4
      5. Figure 9.5
      6. Figure 9.6
      7. Figure 9.7
      8. Figure 9.8
      9. Figure 9.9
      10. Figure 9.10
      11. Figure 9.11
      12. Figure 9.12
      1. Table 9.1
  12. Chapter 10 Probability Density Estimation
    1. 10.1 Univariate Density Estimation
      1. 10.1.1 Histograms
        1. Sturges’ Rule
        2. Scott’s Normal Reference Rule
        3. Freedman-Diaconis Rule
      2. 10.1.2 Frequency Polygon Density Estimate
      3. 10.1.3 The Averaged Shifted Histogram
    2. 10.2 Kernel Density Estimation
        1. Boundary kernels
    3. 10.3 Bivariate and Multivariate Density Estimation
      1. 10.3.1 Bivariate Frequency Polygon
        1. 3D Histogram
      2. 10.3.2 Bivariate ASH
      3. 10.3.3 Multidimensional kernel methods
    4. 10.4 Other Methods of Density Estimation
    5. Exercises
    6. R Code
        1. Code to generate data as shown in Table 10.1 on page 289.
        2. Code to plot the histograms in Figure 10.4 on page 294.
        3. Code to plot Figure 10.6 on page 298.
        4. Code to plot kernels in Figure 10.7 on page 299.
      1. Figure 10.1
      2. Figure 10.2
      3. Figure 10.3
      4. Figure 10.4
      5. Figure 10.5
      6. Figure 10.6
      7. Figure 10.7
      8. Figure 10.8
      9. Figure 10.9
      10. Figure 10.10
      11. Figure 10.11
      12. Figure 10.12
      13. Figure 10.13
      1. Table 10.1
      2. Table 10.2
  13. Chapter 11 Numerical Methods in R
    1. 11.1 Introduction
        1. Computer representation of real numbers
        2. Evaluating Functions
    2. 11.2 Root-finding in One Dimension
        1. Bisection method
        2. Brent’s method
    3. 11.3 Numerical Integration
    4. 11.4 Maximum Likelihood Problems
    5. 11.5 One-dimensional Optimization
    6. 11.6 Two-dimensional Optimization
    7. 11.7 The EM Algorithm
    8. 11.8 Linear Programming – The Simplex Method
    9. 11.9 Application: Game Theory
    10. Exercises
      1. Figure 11.1
      2. Figure 11.2
      3. Figure 11.3
      4. Figure 11.4
      1. Table 11.1
  14. Appendix A Notation
  15. Appendix B Working with Data Frames and Arrays
    1. B.1 Resampling and Data Partitioning
      1. B.1.1 Using the boot function
      2. B.1.2 Sampling without replacement
    2. B.2 Subsetting and Reshaping Data
      1. B.2.1 Subsetting Data
      2. B.2.2 Stacking/Unstacking Data
      3. B.2.3 Merging Data Frames
      4. B.2.4 Reshaping Data
    3. B.3 Data Entry and Data Analysis
      1. B.3.1 Manual Data Entry
      2. B.3.2 Recoding Missing Values
      3. B.3.3 Reading and Converting Dates
      4. B.3.4 Importing/exporting .csv files
      5. B.3.5 Examples of data entry and analysis
        1. Stacked data entry
        2. Extracting statistics and estimates from fitted models
        3. Create data frame in stacked layout
  16. References