Error Structure

Up to this point, we have dealt with the statistical analysis of data with normal errors. In practice, however, many kinds of data have non-normal errors, for example:

  • errors that are strongly skewed;
  • errors that are kurtotic;
  • errors that are strictly bounded (as in proportions);
  • errors that cannot lead to negative fitted values (as in counts).

In the past, the only tools available to deal with these problems were transformation of the response variable or the adoption of non-parametric methods. A GLM allows the specification of a variety of different error distributions:

  • Poisson errors, useful with count data;
  • binomial errors, useful with data on proportions;
  • gamma errors, useful with data showing a constant coefficient of variation;
  • exponential errors, useful with data on time to death (survival analysis).
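
As a rough illustration of the list above, the sketch below inspects the family objects that base R supplies for three of these error distributions and prints the default link and the variance function of each. The names fams and mu are ours, chosen only for this example. Base R does not, as far as we know, provide a separate exponential family object; one common route (an assumption here, not something stated above) is to fit with Gamma errors and fix the dispersion at 1 when summarising the model.

```r
# Family objects from base R's stats package (names in 'fams' are illustrative).
fams <- list(
  counts      = poisson(),   # default link: log
  proportions = binomial(),  # default link: logit
  constant_cv = Gamma()      # default link: inverse
)

# Hypothetical mean values at which to evaluate each variance function.
mu <- c(0.2, 0.5, 0.8)

for (nm in names(fams)) {
  f <- fams[[nm]]
  cat(nm, ": family =", f$family, ", default link =", f$link, "\n")
  # Variance as a function of the mean:
  #   poisson  -> mu
  #   binomial -> mu * (1 - mu)
  #   Gamma    -> mu^2 (so the coefficient of variation is constant)
  print(f$variance(mu))
}
```

The Gamma variance function grows with the square of the mean, which is what a constant coefficient of variation amounts to.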

The error structure is defined by means of the family directive, supplied as an argument to glm alongside the model formula. Examples are

glm(y ~ z, family = poisson)

which means that the response variable y has Poisson errors, and

glm(y ~ z, family = binomial)

which means that the model has binomial errors, as is appropriate when the response is binary or, more generally, a proportion. As with previous models, the explanatory variable z can be continuous (leading to a regression analysis) or categorical (leading to an ANOVA-like procedure called analysis of deviance, as described below).
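
To make the two calls above concrete, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the sample size, the seed, the coefficients used to generate the responses, and the variable names (y_count, y_bin, g, y_g, m_pois, m_bin, m_fac). It fits a Poisson and a binomial model with a continuous z, then a Poisson model with a categorical explanatory variable, for which anova produces an analysis of deviance table.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n <- 200                         # hypothetical sample size
z <- runif(n)                    # hypothetical continuous explanatory variable

# Count response with Poisson errors (regression on a continuous z).
y_count <- rpois(n, lambda = exp(0.5 + 1.2 * z))
m_pois  <- glm(y_count ~ z, family = poisson)

# Binary response with binomial errors.
y_bin <- rbinom(n, size = 1, prob = plogis(-1 + 2 * z))
m_bin <- glm(y_bin ~ z, family = binomial)

# Categorical explanatory variable: an ANOVA-like analysis of deviance.
g     <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
y_g   <- rpois(n, lambda = c(a = 2, b = 4, c = 6)[as.character(g)])
m_fac <- glm(y_g ~ g, family = poisson)

summary(m_pois)                  # coefficients are on the log scale
summary(m_bin)                   # coefficients are on the logit scale
anova(m_fac, test = "Chisq")     # analysis of deviance table

# Fitted values respect the bounds of each error structure.
range(fitted(m_pois))            # counts: strictly positive
range(fitted(m_bin))             # proportions: strictly between 0 and 1
```

Because the default links are log for Poisson and logit for binomial, the fitted counts cannot be negative and the fitted probabilities cannot leave the interval from 0 to 1.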
