Testing for Lack of Fit in a Regression with Replicated Data at Each Level of x

The unreliability estimates (standard errors) of the parameters, explained in Boxes 10.2 and 10.3, draw attention to the important issues in optimizing the efficiency of regression designs. We want to make the error variance as small as possible (as always), but in addition we want to make SSX, the sum of squares of x about its mean, as large as possible by placing as many points as possible at the extreme ends of the x axis. Efficient regression designs allow for:

  • replication of at least some of the levels of x;
  • a preponderance of replicates at the extremes (to maximize SSX);
  • a sufficient number of different values of x to allow accurate location of thresholds.
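
To see why massing replicates at the extremes helps, recall that the standard error of the slope is sqrt(s2/SSX), so for a fixed error variance s2 a larger SSX gives a more precise slope. The following minimal sketch compares two designs of 12 points each on the same range. The designs, the helper function ssx and the value of s2 are hypothetical and chosen only for illustration.

```r
# Sum of squares of x about its mean (hypothetical helper, not from the text)
ssx<-function(x) sum((x-mean(x))^2)

# Two hypothetical designs, 12 observations each, on the range 0 to 10
x.even<-rep(c(0,2,4,6,8,10),each=2)       # points spread evenly
x.extremes<-c(rep(0,5),5,5,rep(10,5))     # replicates massed at the two ends

ssx.even<-ssx(x.even)
ssx.extremes<-ssx(x.extremes)

# Standard error of the slope for an assumed common error variance s2
s2<-1
se.even<-sqrt(s2/ssx.even)
se.extremes<-sqrt(s2/ssx.extremes)

c(ssx.even=ssx.even,ssx.extremes=ssx.extremes)
c(se.even=se.even,se.extremes=se.extremes)
```

The second design keeps two points in the middle of the range, so it retains some ability to detect curvature while still giving the larger SSX and therefore the smaller standard error of the slope.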

Here is an example where replication allows estimation of pure sampling error, and this in turn allows a test of the significance of the data's departure from linearity. As the concentration of an inhibitor is increased, the reaction rate declines:

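# read the tab-delimited data file (adjust the path to wherever your copy is stored)
data<-read.delim("c:\\temp\\lackoffit.txt")
attach(data)   # make the columns available by name
names(data)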

[1] "conc" "rate"

plot(conc,rate,pch=16,ylim=c(0,8))
abline(lm(rate~conc))

The linear regression does not look too bad, and the slope is highly significantly different from zero:

model.reg<-lm(rate~conc)
summary.aov(model.reg)

            Df   Sum Sq    Mean Sq    F value      Pr(>F)
conc         1   74.298     74.298     55.333   4.853e-07 ***
Residuals   19   25.512      1.343
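
The entries in this table are linked by simple arithmetic, which can be checked by hand. The sketch below takes the rounded sums of squares and degrees of freedom printed above and recomputes the error variance, the F ratio, its p value and the fraction of variation explained. The variable names are illustrative only, and because the inputs are rounded the results agree with the printed table only to about three decimal places.

```r
# Values copied from the ANOVA table above (rounded as printed)
ss.conc<-74.298
ss.resid<-25.512
df.conc<-1
df.resid<-19

# Error variance = residual sum of squares / residual degrees of freedom
ms.resid<-ss.resid/df.resid

# F ratio = regression mean square / error variance
f.value<-(ss.conc/df.conc)/ms.resid

# Probability of an F this large or larger if the true slope were zero
p.value<-pf(f.value,df.conc,df.resid,lower.tail=FALSE)

# Fraction of the total variation in rate explained by the regression
r.squared<-ss.conc/(ss.conc+ss.resid)

c(ms.resid=ms.resid,f.value=f.value,p.value=p.value,r.squared=r.squared)
```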

Because there is replication at each level of x we can do something extra, compared with a typical regression analysis. We can estimate what is called the pure error variance ...
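
One standard way to carry out a lack-of-fit test with replicated data is sketched below. The file lackoffit.txt is not reproduced here, so the sketch simulates its own data; the design (7 concentrations with 3 replicates each, giving 21 observations), the curved mean function and the noise level are all assumptions made for illustration and are not the values from the text. The idea is to fit conc as a factor, so that the residual variation within each level estimates the pure error, and then to ask whether the extra residual variation of the straight-line model is large relative to that pure error.

```r
# Simulated data (hypothetical; NOT the contents of lackoffit.txt)
set.seed(1)
conc<-rep(0:6,each=3)
rate<-7*exp(-0.3*conc)+rnorm(length(conc),sd=0.5)

model.reg<-lm(rate~conc)           # straight line: 2 parameters
model.fac<-lm(rate~factor(conc))   # a separate mean for every level of conc

# Pure error: residual variation around the level means
pure.error.ss<-deviance(model.fac)
pure.error.df<-df.residual(model.fac)

# Lack of fit: what the straight line leaves unexplained beyond pure error
lack.of.fit.ss<-deviance(model.reg)-pure.error.ss
lack.of.fit.df<-df.residual(model.reg)-pure.error.df

# F test of lack of fit against pure error
f.lof<-(lack.of.fit.ss/lack.of.fit.df)/(pure.error.ss/pure.error.df)
p.lof<-pf(f.lof,lack.of.fit.df,pure.error.df,lower.tail=FALSE)

# The same comparison in a single call
anova(model.reg,model.fac)
```

A small p value here would indicate that the departure from linearity is larger than can be accounted for by sampling error alone. With these simulated data the residual degrees of freedom split as 19 = 5 (lack of fit) + 14 (pure error).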
