Binary Response with Pseudoreplication

In the bacteria dataframe, which is part of the MASS library, we have repeated assessment of bacterial infection (yes or no, coded as y or n) in a series of patients allocated at random to one of three treatments: placebo, drug and drug plus supplement. The trial lasted for 11 weeks and different patients were assessed different numbers of times. The question is whether the two treatments significantly reduced bacterial infection.

library(MASS)
attach(bacteria)
names(bacteria)

[1] "y" "ap" "hilo" "week" "ID" "trt"

table(y)

y
  n   y
 43 177

The data are binary, so we need to use family=binomial. There is temporal pseudoreplication (repeated measures on the same patients) so we cannot use glm. The ideal solution is the generalized mixed models function lmer. Unlike glm, the lmer function cannot take text (e.g. a two-level factor like y) as the response variable, so we need to convert the y and n into a vector of 1s and 0s:

y<-1*(y=="y")
table(y,trt)

   trt
y   placebo drug drug+
  0      12   18    13
  1      84   44    49
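The counts in this table are easier to compare as proportions. The following is a minimal sketch, assuming only that the MASS package is installed; the names infected, counts, treated and free_by_group are introduced here for illustration and do not appear in the original text. It works on the dataframe directly rather than relying on attach.

```r
library(MASS)

# Recode the response as 1 (infected) or 0 (bacteria-free)
infected <- 1 * (bacteria$y == "y")

# Visits cross-classified by infection status and treatment
counts <- table(infected, bacteria$trt)

# Column proportions: share of visits in each state within a treatment
round(prop.table(counts, margin = 2), 3)

# Pool the two drug arms and compare with placebo:
# proportion of bacteria-free visits in each group
treated <- bacteria$trt != "placebo"
free_by_group <- tapply(infected == 0, treated, mean)
free_by_group
```

With the counts tabulated above, the bacteria-free proportion is 12/96 = 0.125 for placebo visits and 31/124 = 0.25 for treated visits. These are raw proportions of visits, not of patients, so they ignore the repeated measures on each individual.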

Preliminary data inspection suggests that the drug might be effective, because only 12 out of 96 patient visits were bacteria-free in the placebos, compared with 31 out of 124 for the treated individuals. We shall see. The modelling goes like this: the lmer function is in the lme4 package, and the random effects appear in the same formula as the fixed effects, but are defined by the brackets and the ‘given’ operator |:

library(lme4)
model1<-lmer(y~trt+(week ...
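To illustrate the bracket-and-bar syntax on its own, here is a hedged sketch rather than the model from the text. It assumes the simplest possible random-effects structure, a random intercept for each patient written as (1 | ID), which is an assumption made for this example and not the author's specification. It also uses glmer, because current releases of lme4 fit models with a family argument through glmer instead of lmer. The names dat, infected and sketch are hypothetical.

```r
library(MASS)
library(lme4)

# Work on a copy of the dataframe, with a 0/1 response
dat <- bacteria
dat$infected <- 1 * (dat$y == "y")

# Fixed effect: treatment.
# Random effect (assumed for illustration): an intercept per patient,
# written inside brackets with the 'given' operator |
sketch <- glmer(infected ~ trt + (1 | ID),
                family = binomial, data = dat)

summary(sketch)
```

The term to the left of | says what varies (here just the intercept, 1) and the term to the right names the grouping factor within which the repeated measures were taken (here the patient, ID).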
