Bootstrap Resampling
When analyzing statistics, analysts often wonder whether the statistics are sensitive to a few outlying values. Would we get a similar result if we omitted a few points? What is the range of values for the statistic? It is possible to answer these questions for an arbitrary statistic using a technique called bootstrapping.
Formally, bootstrap resampling is a technique for estimating properties of an estimator, such as its bias and standard error. An estimator is a statistic calculated from a data sample that provides an estimate of a true underlying value, often a mean, a standard deviation, or a hidden parameter.
Bootstrapping works by repeatedly selecting random observations from a data sample (with replacement) and recalculating the statistic.
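The resampling loop described above can be sketched by hand in base R. This is only an illustration, not the method the boot package uses internally: the data values, the choice of the median as the statistic, the seed, and the number of replicates are all assumptions made for the example.

```r
# Manual bootstrap of the sample median (illustrative sketch only).
set.seed(42)   # arbitrary seed so the run is repeatable
x <- c(2.1, 3.4, 1.9, 5.6, 2.8, 3.1, 9.7, 2.5, 3.0, 2.2)  # made-up data
n_reps <- 1000 # number of bootstrap replicates (assumed value)

# Each replicate draws length(x) observations with replacement
# and recomputes the median on the resampled values.
boot_medians <- replicate(
  n_reps,
  median(sample(x, size = length(x), replace = TRUE))
)

observed  <- median(x)                       # statistic on the original sample
bias_est  <- mean(boot_medians) - observed   # bootstrap estimate of bias
se_est    <- sd(boot_medians)                # bootstrap estimate of standard error
range_est <- quantile(boot_medians, c(0.025, 0.975))  # rough percentile range
```

The spread of boot_medians gives a sense of how much the statistic moves when individual points (such as the outlying 9.7) are included more or less often.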
In R, you can use bootstrap resampling through the boot function in the boot package:
library(boot)
boot(data, statistic, R, sim="ordinary", stype="i",
     strata=rep(1,n), L=NULL, m=0, weights=NULL,
     ran.gen=function(d, p) d, mle=NULL, simple=FALSE, ...)
Arguments to boot include the following.
Argument | Description | Default |
---|---|---|
data | A vector, matrix, or data frame containing the input data. | |
statistic | A function that, when applied to the data, returns a vector containing the statistic of interest. The function takes at least two arguments: the source data and a vector (of indices, frequencies, or weights) that specifies which values to select for each bootstrap replicate. The meaning of the second argument is defined by stype. | |
R | A numeric value specifying the number of bootstrap replicates. | |
sim | A character value specifying the ... |
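
As a hedged usage sketch of the data, statistic, and R arguments described in the table, the following call bootstraps a median. The data values and the helper name median_stat are hypothetical, chosen only for illustration; the default stype="i" is assumed, so the second argument of the statistic function receives a vector of indices.

```r
library(boot)

set.seed(42)   # arbitrary seed so the run is repeatable
x <- c(2.1, 3.4, 1.9, 5.6, 2.8, 3.1, 9.7, 2.5, 3.0, 2.2)  # made-up data

# Hypothetical statistic function: with stype = "i" (the default),
# `i` is a vector of indices selecting the resampled observations.
median_stat <- function(d, i) median(d[i])

result <- boot(data = x, statistic = median_stat, R = 1000)

result$t0                       # statistic computed on the original data
head(result$t)                  # bootstrap replicates, one row per replicate
boot.ci(result, type = "perc")  # percentile confidence interval
```

Printing result shows the original statistic together with the bootstrap estimates of bias and standard error.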