Bootstrapping a GLM

There are two contrasting ways of using bootstrapping with statistical models:

  • Fit the model lots of times by selecting cases for inclusion at random with replacement, so that some data points are excluded and others appear more than once in any particular model fit.
  • Fit the model once and calculate the residuals and the fitted values, then shuffle the residuals lots of times and add them to the fitted values in different permutations, fitting the model to the many different data sets.

In both cases, you will obtain a distribution of parameter values for the model from which you can derive confidence intervals. Here we use the timber data (a multiple regression with two continuous explanatory variables, introduced on p. 336) to illustrate the two approaches (see p. 284 for an introduction to the bootstrap).
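The second approach (resampling residuals) can be sketched in base R without any package. This is a minimal illustration under stated assumptions, not the book's own code: it uses R's built-in trees dataframe (columns Girth, Height, Volume) as a stand-in for the timber data, assumes a log-log Gaussian model, and draws residuals with replacement, which is the usual bootstrap convention (a strict shuffle would use replace = FALSE). The names fitted_vals, res, n_boot and boot_coefs are hypothetical.

```r
# Stand-in data: R's built-in `trees` dataframe (an assumption, not the book's timber file)
data(trees)

# Fit the model once (the log-log formula is an assumption for illustration)
fit <- glm(log(Volume) ~ log(Girth) + log(Height), data = trees)
fitted_vals <- fitted(fit)
res <- residuals(fit, type = "response")

set.seed(1)     # make the resampling reproducible
n_boot <- 999   # number of bootstrap data sets

# For each replicate: build a new response from fitted values plus
# resampled residuals, refit the model, and keep the coefficients
boot_coefs <- t(replicate(n_boot, {
  sim <- trees
  sim$y_star <- fitted_vals + sample(res, replace = TRUE)
  coef(glm(y_star ~ log(Girth) + log(Height), data = sim))
}))

# Percentile confidence limits for each parameter (one column per coefficient)
apply(boot_coefs, 2, quantile, probs = c(0.025, 0.975))
```

Each row of boot_coefs holds the parameter estimates from one simulated data set, so the column quantiles give rough 95% percentile intervals.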

library(boot)

The GLM with its parameter estimates and standard errors is on p. 519. The hard part of using boot is writing the sampling function correctly. The function must take at least two arguments: the first is the data on which the resampling is to be carried out (in this case, the whole dataframe called trees), and the second is the index (the randomized subscripts showing which data values are to be used in a given realization; some cases will be repeated, others will be omitted). Inside the function we create a new dataframe based on the randomly selected indices, then fit the model to this new data set. Finally, the function should return the ...
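A sampling function of the kind described above might look like the following. This is a hedged sketch rather than the book's code: it uses R's built-in trees dataframe (columns Girth, Height, Volume) as a stand-in for the timber data, the log-log model formula is an assumption, and the names coef_cases and boot_cases are hypothetical. Returning the fitted coefficients is one natural choice when the goal is confidence intervals for the parameters.

```r
library(boot)

# Stand-in data: R's built-in `trees` dataframe (an assumption, not the book's timber file)
data(trees)

# Statistic function: boot() passes the data first and a vector of
# resampled row indices second
coef_cases <- function(data, indices) {
  resampled <- data[indices, ]   # some rows repeated, others omitted
  fit <- glm(log(Volume) ~ log(Girth) + log(Height), data = resampled)
  coef(fit)                      # one value per model parameter
}

set.seed(1)   # make the resampling reproducible
boot_cases <- boot(data = trees, statistic = coef_cases, R = 999)

# Percentile interval for the second coefficient, the log(Girth) slope
boot.ci(boot_cases, type = "perc", index = 2)
```

Here boot_cases$t0 holds the estimates from the original data and boot_cases$t holds one row of estimates per bootstrap replicate; changing index in boot.ci selects a different parameter.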
