Overdispersion in Log-linear Models

The data analysed in this section refer to children from Walgett, New South Wales, Australia, who were classified by sex (with two levels: male (M) and female (F)), culture (also with two levels: Aboriginal (A) and non-Aboriginal (N)), age group (with four levels: F0 (primary), F1, F2 and F3) and learner status (with two levels: average (AL) and slow (SL)). The response variable is a count of the number of days absent from school in a particular school year.

library(MASS)
data(quine)
attach(quine)
names(quine)

[1] "Eth" "Sex" "Age" "Lrn" "Days"
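Before fitting anything, it can help to confirm that the factor levels match the description above and to see how the children are spread across the cells of the design. The following is a minimal sketch, assuming only that the MASS package is installed; the object name cell_counts is illustrative and does not come from the text.

```r
library(MASS)  # provides the quine data frame

# Levels of the four classifying factors
lapply(quine[c("Eth", "Sex", "Age", "Lrn")], levels)

# Number of children in each Eth x Sex x Age x Lrn combination.
# Any empty cells would mean that some of the highest-order
# interaction terms cannot be estimated.
cell_counts <- xtabs(~ Eth + Sex + Age + Lrn, data = quine)
ftable(cell_counts)
```

The data argument is used here rather than relying on attach(quine), so the sketch works whether or not the data frame has been attached.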

We begin with a log-linear model for the counts, and fit a maximal model containing all the factors and all their interactions:

model1<-glm(Days~Eth*Sex*Age*Lrn,poisson)
summary(model1)

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 2073.5 on 145 degrees of freedom
Residual deviance: 1173.9 on 118 degrees of freedom
AIC: 1818.4

Next, we check the residual deviance for signs of overdispersion. Recall that if the Poisson errors assumption is appropriate, the residual deviance should be roughly equal to the residual degrees of freedom. Here it is 1173.9 on 118 d.f., a ratio of roughly 10, which indicates substantial overdispersion. This is much too big to ignore, so before embarking on model simplification we refit the model with quasi-Poisson errors to account for the overdispersion:

model2<-glm(Days~Eth*Sex*Age*Lrn,quasipoisson)
summary(model2)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-7.3872     ...
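The overdispersion check and the effect of switching to quasi-Poisson errors can also be computed directly rather than read off the summary output. The sketch below is illustrative only: the names fit_pois, fit_quasi, ratio, phi and phi_quasi are assumptions, not names used in the text, and the models are refitted with an explicit data argument so that the code does not depend on attach(quine).

```r
library(MASS)  # provides the quine data frame

# Same maximal model as above, fitted with Poisson and quasi-Poisson errors
fit_pois  <- glm(Days ~ Eth * Sex * Age * Lrn, family = poisson,      data = quine)
fit_quasi <- glm(Days ~ Eth * Sex * Age * Lrn, family = quasipoisson, data = quine)

# Rough check: residual deviance divided by residual degrees of freedom
# (1173.9 / 118 from the summary output above)
ratio <- deviance(fit_pois) / df.residual(fit_pois)
ratio

# Pearson-based estimate of the dispersion, computed from the Poisson fit;
# it should agree closely with the value reported by the quasi-Poisson summary
phi <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
phi_quasi <- summary(fit_quasi)$dispersion
c(pearson = phi, quasi = phi_quasi)

# The coefficient estimates are the same in both fits ...
all.equal(coef(fit_pois), coef(fit_quasi))

# ... but the quasi-Poisson standard errors are the Poisson ones
# multiplied by the square root of the estimated dispersion
se_pois  <- summary(fit_pois)$coefficients[, "Std. Error"]
se_quasi <- summary(fit_quasi)$coefficients[, "Std. Error"]
all.equal(se_quasi, se_pois * sqrt(phi_quasi))
```

The point of the comparison is that quasi-Poisson errors leave the parameter estimates alone and only widen the standard errors, which is why significance assessments change during model simplification.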
