A Regression with Poisson Errors

The following example has a count (the number of reported cancer cases per year per clinic) as the response variable, and a single continuous explanatory variable (the distance from a nuclear plant to the clinic in km). The question is whether or not proximity to the reactor affects the number of cancer cases.

clusters<-read.table("c:\\temp\\clusters.txt",header=TRUE)
attach(clusters)
names(clusters)

[1] "Cancers" "Distance"

plot(Distance,Cancers)

There seems to be a downward trend in cancer cases with distance (see the plot below). But is the trend significant? We do a regression of cases against distance, using a GLM with Poisson errors:

model1<-glm(Cancers~Distance,poisson)
summary(model1)

Coefficients:
            Estimate   Std. Error  z value  Pr(>|z|)
(Intercept)  0.186865    0.188728    0.990    0.3221
Distance    -0.006138    0.003667   -1.674    0.0941 .

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 149.48 on 93 degrees of freedom
Residual deviance: 146.64 on 92 degrees of freedom
AIC: 262.41
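Because a Poisson GLM in R uses a log link by default, the coefficients above are on the log scale. The short sketch below back-transforms them to make the fitted trend easier to read. It uses only the numbers printed in the summary table; the helper name `expected_cases` and the example distances are illustrative assumptions, not part of the original analysis.

```r
# Coefficients copied from the summary(model1) table above
b0    <- 0.186865    # intercept (log of expected count at Distance = 0)
b1    <- -0.006138   # slope for Distance (change in log count per km)
se_b1 <- 0.003667    # standard error of the slope

# Hypothetical helper: expected count at a given distance under the log link
expected_cases <- function(distance_km) exp(b0 + b1 * distance_km)

# Illustrative distances only; the range of Distance in the data is not shown here
expected_cases(c(0, 50, 100))

# Multiplicative change in the expected count for each extra km
rate_ratio <- exp(b1)
rate_ratio

# Wald z statistic and two-sided p-value, which should reproduce the table row
z <- b1 / se_b1
p <- 2 * pnorm(-abs(z))
c(z = z, p = p)
```

The rate ratio is just under 1, so the fitted model implies a decline of roughly 0.6% in expected cases per kilometre, which is consistent with the small negative slope.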

The trend does not appear to be significant (p = 0.094), but look at the residual deviance. Under Poisson errors, the residual deviance is expected to be roughly equal to the residual degrees of freedom. Here it is noticeably larger (146.64 on 92 d.f., a ratio of about 1.59), which indicates overdispersion (extra, unexplained variation in the response). We compensate for the overdispersion by refitting the model using quasi-Poisson rather than Poisson errors:

model2<-glm(Cancers~Distance,quasipoisson) ...
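As a rough illustration of what the quasi-Poisson refit changes, the sketch below checks for overdispersion and compares the two error structures. Since `clusters.txt` is not available here, it uses simulated stand-in data (negative binomial counts, which are overdispersed relative to Poisson); the names `sim`, `fit_pois`, `fit_quasi` and the simulation parameters are assumptions made for this example, so the numbers it prints will not match the cancer data.

```r
# Simulated stand-in data (NOT the clusters data)
set.seed(1)
n   <- 94
sim <- data.frame(Distance = runif(n, 0, 100))
mu  <- exp(0.2 - 0.006 * sim$Distance)
# Negative binomial counts have variance mu + mu^2/size, larger than the Poisson variance mu
sim$Cancers <- rnbinom(n, mu = mu, size = 1)

fit_pois  <- glm(Cancers ~ Distance, family = poisson,      data = sim)
fit_quasi <- glm(Cancers ~ Distance, family = quasipoisson, data = sim)

# Rough check: residual deviance divided by residual degrees of freedom
deviance(fit_pois) / df.residual(fit_pois)

# Pearson-based dispersion estimate, which is what quasipoisson reports
phi <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
phi
summary(fit_quasi)$dispersion

# The coefficient estimates are identical; only the standard errors change,
# each being multiplied by sqrt(phi)
se_pois  <- coef(summary(fit_pois))[, "Std. Error"]
se_quasi <- coef(summary(fit_quasi))[, "Std. Error"]
se_quasi / se_pois
sqrt(phi)
```

The point of the comparison is that quasi-Poisson leaves the fitted line unchanged and only widens the standard errors, so an effect that looked marginal under Poisson errors becomes less significant once overdispersion is taken into account.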
