Bootstrap Resampling

When analyzing statistics, analysts often wonder whether the statistics are sensitive to a few outlying values. Would we get a similar result if we were to omit a few points? What is the range of values for the statistic? It is possible to answer these questions for an arbitrary statistic using a technique called bootstrapping.

Formally, bootstrap resampling is a technique for estimating the bias and variability of an estimator. An estimator is a statistic calculated from a data sample that provides an estimate of a true underlying value, often a mean, a standard deviation, or a hidden parameter.

Bootstrapping works by repeatedly drawing random observations from a data sample with replacement (typically as many observations as the sample contains) and recalculating the statistic on each resample. In R, you can perform bootstrap resampling with the boot function in the boot package:

library(boot)
boot(data, statistic, R, sim="ordinary", stype="i", 
     strata=rep(1,n), L=NULL, m=0, weights=NULL, 
     ran.gen=function(d, p) d, mle=NULL, simple=FALSE, ...)
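To make the resampling idea concrete, here is a minimal sketch in base R that does not use the boot package. The data vector x, the choice of the median as the statistic, and the variable names are all illustrative assumptions, not taken from the text.

```r
# Hand-rolled bootstrap of the sample median (illustrative data, assumed for this sketch)
set.seed(42)                       # make the random resampling reproducible
x <- c(2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 9.5, 3.1, 4.0)
n_reps <- 1000                     # number of bootstrap replicates

# Each replicate: draw length(x) observations with replacement, recompute the median
boot_medians <- replicate(
  n_reps,
  median(sample(x, size = length(x), replace = TRUE))
)

observed <- median(x)                       # statistic on the original sample
bias_est <- mean(boot_medians) - observed   # bootstrap estimate of the bias
se_est   <- sd(boot_medians)                # bootstrap estimate of the standard error
range(boot_medians)                         # range of the statistic across resamples
```

The spread of boot_medians gives a rough answer to the question of how much the statistic would change if a few points were omitted or duplicated.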

Arguments to boot include the following.

Argument | Description | Default
data | A vector, matrix, or data frame containing the input data. |
statistic | A function that, when applied to the data, returns a vector containing the statistic of interest. The function takes two arguments: the source data and a vector that specifies which values to select for each bootstrap replicate. The meaning of the second argument is defined by stype. |
R | A numeric value specifying the number of bootstrap replicates. |
sim | A character value specifying the ... |
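
As an illustration of the data, statistic, and R arguments, the following sketch calls boot with the default stype="i", so the second argument of the statistic function is a vector of indices into the data. The data vector and the function name median_stat are hypothetical, chosen only for this example.

```r
library(boot)

set.seed(42)                       # make the random resampling reproducible
x <- c(2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 9.5, 3.1, 4.0)

# Statistic function: d is the source data, i is the index vector that
# selects the observations for one bootstrap replicate (stype = "i")
median_stat <- function(d, i) median(d[i])

b <- boot(data = x, statistic = median_stat, R = 1000)

b$t0        # the statistic computed on the original data
head(b$t)   # first few bootstrap replicates (one row per replicate)
print(b)    # summary including the estimated bias and standard error
```

The returned object stores the original statistic in t0 and the replicates in the matrix t, so further summaries can be computed from b$t directly.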
