Part III

Bootstrapping

In the previous four chapters, we explored how to estimate expectations of random variables. A mean is never enough, however; ideally, we would like to know the entire probability distribution of the variable.

Bootstrapping is a computationally intensive method that allows researchers to simulate the distribution of a statistic. The idea is to repeatedly resample the observed data, each time producing an empirical distribution function from the resampled data. For each resampled data set—or equivalently, each empirical distribution function—a new value of the statistic can be computed, and the collection of these values provides an estimate of the sampling distribution of the statistic of interest. In this manner, the method allows you to “pull yourself up by your bootstraps” (an old idiom, popularized in America, that means to improve your situation without outside help). Bootstrapping is nonparametric by nature, and there is a certain appeal to letting the data speak so freely.
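The resampling loop described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the book: the function names and the example data are hypothetical, and the statistic here is arbitrarily chosen to be the sample median.

```python
import random
import statistics

def bootstrap(data, statistic, n_resamples=1000, seed=0):
    """Estimate the sampling distribution of `statistic` by drawing
    n_resamples samples of size len(data) from `data` with replacement
    (the basic nonparametric bootstrap)."""
    rng = random.Random(seed)
    n = len(data)
    return [
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    ]

# Hypothetical observed sample; bootstrap the distribution of its median.
sample = [2.1, 3.4, 1.8, 5.0, 4.2, 2.9, 3.7, 4.8, 2.5, 3.1]
dist = bootstrap(sample, statistics.median, n_resamples=2000)
se = statistics.stdev(dist)  # bootstrap estimate of the median's standard error
```

The collection `dist` approximates the sampling distribution of the median; its standard deviation is the bootstrap standard-error estimate.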

Bootstrapping was first developed for independent and identically distributed data, but this assumption can be relaxed so that bootstrap estimates from dependent data, such as regression residuals or time series data, are possible. We will explore bootstrapping methods in both the independent and dependent cases, along with approaches for improving performance using more complex variations.
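To give a flavor of the dependent case, one common device for time series is to resample contiguous blocks rather than individual observations, so that short-range dependence within each block is preserved. The sketch below is a simplified moving-block bootstrap; the function name and parameters are illustrative, not taken from the book.

```python
import random

def moving_block_bootstrap(series, block_len, seed=0):
    """Resample a time series by concatenating randomly chosen overlapping
    blocks of length `block_len` (moving-block bootstrap). Dependence is
    preserved within blocks, though not across block boundaries."""
    rng = random.Random(seed)
    n = len(series)
    n_blocks = -(-n // block_len)  # ceiling division: enough blocks to cover n
    resampled = []
    for _ in range(n_blocks):
        start = rng.randrange(n - block_len + 1)
        resampled.extend(series[start:start + block_len])
    return resampled[:n]  # trim to the original length

# Toy series: each resampled block is a contiguous run of the original.
series = list(range(20))
boot = moving_block_bootstrap(series, block_len=5)
```

Each pass over the loop copies one contiguous slice of the original series, so runs of length `block_len` in the output inherit the original serial structure.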
