Summary Tables

The most important function in R for generating summary tables is the somewhat obscurely named tapply function. It is called tapply because it applies a named function (such as mean or var) across specified margins (factor levels) to create a table. If you have used PivotTables in Excel, you will be familiar with the concept.

Here is tapply in action:

data<-read.table("c:\\temp\\Daphnia.txt",header=TRUE)
attach(data)
names(data)

[1] "Growth.rate" "Water"       "Detergent"   "Daphnia"

The response variable is growth rate of the animals, and there are three categorical explanatory variables: the river from which the water was sampled, the kind of detergent experimentally added, and the clone of daphnia employed in the experiment. In the simplest case we might want to tabulate the mean growth rates for the four brands of detergent tested,

tapply(Growth.rate,Detergent,mean)

  BrandA    BrandB    BrandC    BrandD
3.884832  4.010044  3.954512  3.558231

or for the two rivers,

tapply(Growth.rate,Water,mean)

    Tyne        Wear
3.685862    4.017948

or for the three daphnia clones,

tapply(Growth.rate,Daphnia,mean)

  Clone1    Clone2    Clone3
2.839875  4.577121  4.138719
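
The one-way calls above can be tried without the Daphnia.txt file. The following is a minimal sketch, assuming a small invented data frame called toy (both the name and the values are hypothetical and are not taken from the real dataset); it uses with() in place of attach so that the example is self-contained:

```r
# Hypothetical toy data standing in for Daphnia.txt (values invented for illustration)
toy <- data.frame(
  Growth.rate = c(2, 4, 3, 5, 4, 6),
  Detergent   = factor(c("BrandA", "BrandA", "BrandB", "BrandB", "BrandC", "BrandC"))
)

# Mean growth rate for each level of Detergent
means <- with(toy, tapply(Growth.rate, Detergent, mean))
print(means)

# Any function that reduces a vector to a single value can replace mean,
# for example var for the within-level variance
vars <- with(toy, tapply(Growth.rate, Detergent, var))
print(vars)
```

The result is a one-dimensional array whose names are the factor levels, so an individual entry can be picked out by name, as in means["BrandB"].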

Two-dimensional summary tables are created by replacing the single explanatory variable (the second argument in the function call) with a list indicating which variable is to be used for the rows of the summary table and which variable is to be used for the columns. To get the daphnia clones as the rows ...
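
The list form of the second argument can be sketched on invented data. This is an illustration only, assuming a hypothetical data frame called toy with made-up growth rates, two clones and two detergent brands; it is not the book's own example or output:

```r
# Hypothetical toy data (values invented for illustration)
toy <- data.frame(
  Growth.rate = c(1, 3, 2, 4, 5, 7, 6, 8),
  Daphnia     = factor(rep(c("Clone1", "Clone2"), each = 4)),
  Detergent   = factor(rep(c("BrandA", "BrandA", "BrandB", "BrandB"), times = 2))
)

# The first element of the list supplies the rows, the second the columns
two_way <- with(toy, tapply(Growth.rate, list(Daphnia, Detergent), mean))
print(two_way)

# Reversing the order of the list transposes the table
swapped <- with(toy, tapply(Growth.rate, list(Detergent, Daphnia), mean))
print(swapped)
```

The result is a matrix, so a single cell can be extracted by row and column name, as in two_way["Clone1", "BrandB"].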
