Optional arguments in model-fitting functions

Unless you specify otherwise through the optional arguments, all of the rows in the dataframe will be used in the model fitting, there will be no offsets, and all values of the response variable will be given equal weight. Variables named in the model formula will come from the dataframe named in the data argument (data=mydata), from a call to the with function (p. 18), or from the attached dataframe (if there is one). Here we illustrate the following options:

  • subset
  • weights
  • data
  • offset
  • na.action
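
As a minimal sketch of how these arguments appear in a call to lm, the following uses a small made-up dataframe (mydata, x, y and g are hypothetical names, not part of the example that follows). The weights and offset are written out with the values that correspond to the defaults, so that only subset changes which rows are used:

```r
# Made-up data for illustration only (not the ipomopsis data)
set.seed(1)
mydata <- data.frame(x = 1:20, g = rep(c("a", "b"), each = 10))
mydata$y <- 2 + 0.5 * mydata$x + rnorm(20)
mydata$y[3] <- NA                 # one missing response value

# Defaults: all complete rows, equal weights, no offset
m0 <- lm(y ~ x, data = mydata)

# The same model with each option set explicitly
m1 <- lm(y ~ x,
         data = mydata,           # where the variables are found
         subset = (g == "a"),     # use only the rows where g is "a"
         weights = rep(1, 20),    # equal weights, written out
         offset = rep(0, 20),     # zero offset, written out
         na.action = na.omit)     # drop rows with missing values

nobs(m0)   # 19: twenty rows less the one missing value
nobs(m1)   # 9: ten rows in group "a" less the one missing value
```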

We shall work with an example involving analysis of covariance (see p. 490 for details) where we have a mix of continuous and categorical explanatory variables:

data<-read.table("c:\\temp\\ipomopsis.txt",header=TRUE)
attach(data)
names(data)

[1] "Root" "Fruit" "Grazing"

The response is seed production (Fruit); the explanatory variables are a continuous variable (Root, the root diameter) and a two-level factor, Grazing (with levels Grazed and Ungrazed).

Subsets

Perhaps the most commonly used modelling option is to fit the model to a subset of the data (e.g. fit the model to data from just the grazed plants). You could do this using subscripts on the response variable and all the explanatory variables:

model<-lm(Fruit[Grazing=="Grazed"] ~ Root[Grazing=="Grazed"])

but it is much more straightforward to use the subset argument, especially when there are lots of explanatory variables:

model<-lm(Fruit~Root,subset=(Grazing=="Grazed"))
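
The equivalence of the two calls can be checked with a short sketch. Because the ipomopsis file may not be to hand, this uses simulated stand-in data: the dataframe plants and the numbers used to generate it are assumptions made purely for illustration, and the names model1 and model2 are introduced here to keep the two fits apart.

```r
# Hypothetical stand-in for the ipomopsis data; all values are simulated
set.seed(42)
plants <- data.frame(
  Root = runif(40, 4, 11),
  Grazing = factor(rep(c("Grazed", "Ungrazed"), each = 20)))
plants$Fruit <- -120 + 24 * plants$Root +
  36 * (plants$Grazing == "Ungrazed") + rnorm(40, sd = 6)

# Subscripts on the response and on the explanatory variable
model1 <- with(plants,
               lm(Fruit[Grazing == "Grazed"] ~ Root[Grazing == "Grazed"]))

# The subset argument
model2 <- lm(Fruit ~ Root, data = plants, subset = (Grazing == "Grazed"))

# The estimates agree once the coefficient labels are set aside
all.equal(unname(coef(model1)), unname(coef(model2)))

# The labels differ: the subscripted call carries the subscripts into them
names(coef(model1))
names(coef(model2))
```

With the subscript version, the subscript expression becomes part of the coefficient label, whereas the subset version keeps the plain variable name Root.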

The answer, of course, is the same in both cases, but the summary.lm and summary.aov tables are neater with ...
