This chapter is all about the *apply* functions: `apply`, `lapply`, `sapply`, `tapply`, `mapply`; and their cousins, `by` and `split`. These functions let you take data in great gulps
and process the whole gulp at once. Where traditional programming
languages use loops, R uses vectorized operations and the apply functions
to crunch data in batches, greatly streamlining the calculations.
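To make the contrast concrete, here is a small sketch that computes the same square roots three ways: with an explicit loop, with `sapply`, and with a plain vectorized call. The vector `x` is an arbitrary example chosen for illustration.

```r
x <- c(1, 4, 9, 16)

# Loop style: preallocate the result, then fill it one element at a time
out_loop <- numeric(length(x))
for (i in seq_along(x)) {
  out_loop[i] <- sqrt(x[i])
}

# Apply style: sapply calls sqrt on each element and simplifies to a vector
out_sapply <- sapply(x, sqrt)

# Vectorized style: sqrt already works on the whole vector in one call
out_vec <- sqrt(x)

print(out_vec)  # [1] 1 2 3 4
```

All three produce the same vector; the apply and vectorized versions simply avoid the bookkeeping of the loop index and the preallocated result.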

An important idiom of R is using a factor to define a group. Suppose we have a vector and a factor, both of the same length, that were created as follows:

`v <- c(40,2,83,28,58)`

`f <- factor(c("A","C","A","B","C"))`

We can visualize the vector elements and factor levels side by side, like this:

| Vector | Factor |
|---|---|
| 40 | A |
| 2 | C |
| 83 | A |
| 28 | B |
| 58 | C |

The factor level identifies the group of each vector element: 40 and 83 are in group A; 28 is in group B; and 2 and 58 are in group C.
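One quick way to see these groups is to let `split` do the grouping. The sketch below re-creates `v` and `f` with the levels shown in the table, then splits the vector by the factor. It also prints the factor's integer codes, which show how each element maps to a level.

```r
v <- c(40, 2, 83, 28, 58)
f <- factor(c("A", "C", "A", "B", "C"))

# Each element of v lands in the group named by the matching level of f
groups <- split(v, f)
print(groups)  # $A: 40 83, $B: 28, $C: 2 58

# Internally a factor stores an integer code per element, indexing levels(f)
print(levels(f))      # "A" "B" "C"
print(as.integer(f))  # 1 3 1 2 3
```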

In this book, I refer to such factors as *grouping factors*. They effectively
slice and dice our data by putting the elements into groups. This is powerful
because statistics often calls for processing data in groups: comparing
group means, comparing group proportions, performing ANOVA, and so forth.
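As an example of comparing group means, the following sketch uses `tapply` to apply `mean` to each group defined by the grouping factor. It reuses the example `v` and `f` from the table.

```r
v <- c(40, 2, 83, 28, 58)
f <- factor(c("A", "C", "A", "B", "C"))

# tapply splits v by the levels of f and calls mean() once per group
group_means <- tapply(v, f, mean)
print(group_means)  # A = 61.5, B = 28, C = 30

# The same idiom works for other per-group summaries, such as group sizes
group_sizes <- tapply(v, f, length)
```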

This chapter has recipes that use grouping factors to split vector elements into their respective groups (Recipe 6.1), apply a function to groups within a vector (Recipe 6.5), and apply a function to groups of rows within a data frame (Recipe 6.6). In other chapters, the same idiom is used to test ...
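Grouping works the same way for the rows of a data frame. Here is a minimal sketch using `by`, which hands each group's rows to a function as a smaller data frame. The data frame `df` and its column names `value` and `grp` are hypothetical, made up for illustration from the same example data.

```r
df <- data.frame(
  value = c(40, 2, 83, 28, 58),
  grp   = factor(c("A", "C", "A", "B", "C"))
)

# by() splits the rows of df according to df$grp and calls the function
# once per group, passing that group's rows as a data frame
res <- by(df, df$grp, function(rows) mean(rows$value))
print(res)
```

Because the function returns a single number per group, the result holds one mean for each of A, B, and C.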
