One-Way ANOVA

There is a real paradox about analysis of variance, which often stands in the way of a clear understanding of exactly what is going on. The idea of analysis of variance is to compare two or more means, but it does this by comparing variances. How can that work?

The best way to see what is happening is to work through a simple example. We have an experiment in which crop yields per unit area were measured from 10 randomly selected fields on each of three soil types. All fields were sown with the same variety of seed and provided with the same fertilizer and pest control inputs. The question is whether soil type significantly affects crop yield, and if so, to what extent.

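# Read the yields file; the first row holds the column names
results <- read.table("c:\\temp\\yields.txt", header = TRUE)
attach(results)   # make sand, clay and loam available as variables
names(results)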

[1] "sand" "clay" "loam"

To see the data just type results followed by the Return key:

   sand clay loam
1     6   17   13
2    10   15   16
3     8    3    9
4     6   11   12
5    14   14   15
6    17   12   16
7     9   12   17
8    11    8   13
9     7   10   18
10   11   13   14

The function sapply is used to calculate the mean yields for the three soils:

sapply(list(sand,clay,loam),mean)

[1] 9.9 11.5 14.3

Mean yield was highest on loam (14.3) and lowest on sand (9.9).
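Because sapply is applied to an unnamed list, the means above come back without labels. As a minimal sketch, assuming the yields are re-entered by hand from the table above (so that the block runs without yields.txt), applying sapply to the data frame itself keeps the soil names attached to the means. The object name soil.means is introduced here purely for illustration.

```r
# Yields typed in from the table above, so no external file is needed
results <- data.frame(
  sand = c(6, 10, 8, 6, 14, 17, 9, 11, 7, 11),
  clay = c(17, 15, 3, 11, 14, 12, 12, 8, 10, 13),
  loam = c(13, 16, 9, 12, 15, 16, 17, 13, 18, 14)
)

# sapply over a data frame works column by column and keeps the column names
soil.means <- sapply(results, mean)
print(soil.means)

# Name of the soil type with the highest mean yield
best.soil <- names(which.max(soil.means))
print(best.soil)
```

Keeping the names makes it harder to mix up which mean belongs to which soil once more groups are involved.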

It will be useful to have all of the yield data in a single vector called y:

y<-c(sand,clay,loam)

and to have a single vector called soil to contain the factor levels for soil type:

soil<-factor(rep(1:3,c(10,10,10)))
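A quick way to confirm that y and soil line up is to recompute the group means from the stacked vectors. The sketch below assumes the yields are re-entered by hand from the table above. It also gives the factor text labels instead of the codes 1 to 3, which is an illustrative variation and not the coding used in the text; group.means is a name chosen for this example.

```r
# Yields typed in from the table above
sand <- c(6, 10, 8, 6, 14, 17, 9, 11, 7, 11)
clay <- c(17, 15, 3, 11, 14, 12, 12, 8, 10, 13)
loam <- c(13, 16, 9, 12, 15, 16, 17, 13, 18, 14)

# Stack the three samples into one response vector
y <- c(sand, clay, loam)

# Assumption for illustration: labelled levels rather than 1:3,
# listed in the same order in which the samples were stacked
soil <- factor(rep(c("sand", "clay", "loam"), each = 10),
               levels = c("sand", "clay", "loam"))

# Response and factor must have the same length
stopifnot(length(y) == length(soil))

# Ten observations per soil type
print(table(soil))

# Group means recovered from the stacked data
group.means <- tapply(y, soil, mean)
print(group.means)
```

If the factor levels were built in a different order from the one used in c(sand,clay,loam), the means from tapply would no longer match those from sapply, so this comparison doubles as a sanity check.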

Before carrying out analysis of variance, we should check for constancy of variance (see Chapter 8 ...
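One possible way to check constancy of variance is sketched below. It assumes the yields are re-entered by hand from the table above. The choice of tapply with var, and of the Fligner-Killeen and Bartlett tests from the stats package, is an assumption made for illustration and not necessarily the procedure the text goes on to use.

```r
# Yields typed in from the table above
sand <- c(6, 10, 8, 6, 14, 17, 9, 11, 7, 11)
clay <- c(17, 15, 3, 11, 14, 12, 12, 8, 10, 13)
loam <- c(13, 16, 9, 12, 15, 16, 17, 13, 18, 14)

y <- c(sand, clay, loam)
soil <- factor(rep(1:3, c(10, 10, 10)))

# Sample variance of yield within each soil type
group.vars <- tapply(y, soil, var)
print(group.vars)

# Ratio of the largest to the smallest variance as a rough guide
var.ratio <- max(group.vars) / min(group.vars)
print(var.ratio)

# Fligner-Killeen test of homogeneity of variances (robust to non-normality)
ft <- fligner.test(y ~ soil)
print(ft)

# Bartlett test, which is more sensitive to departures from normality
bt <- bartlett.test(y ~ soil)
print(bt)
```

A large p-value from either test would give no evidence against equal variances; a small one would suggest transforming the response or choosing a different method before proceeding.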
