Often, you are provided with data that is too fine-grained for your analysis. For example, you might be analyzing data about a website. Suppose that you wanted to know the average number of pages delivered to each user. To find the answer, you might need to look at every HTTP transaction (every request for content), grouping requests into sessions and counting the number of requests in each. R provides a number of different functions for summarizing data, aggregating records to build a smaller data set.

The `tapply` function is a very flexible function for summarizing a vector `X`. You can specify which subsets of `X` to summarize, as well as the function used for summarization:

tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)

Here are the arguments to `tapply`.

Argument | Description | Default |
---|---|---|
X | The object on which to apply the function (usually a vector). | |
INDEX | A list of factors, each the same length as `X`, that specify the different sets of values of `X` over which to calculate `FUN`. | |
FUN | The function applied to elements of `X`. | `NULL` |
... | Optional arguments that are passed to `FUN`. | |
simplify | If `simplify=TRUE` and `FUN` returns a scalar, then `tapply` returns an array with the mode of the scalar. If `simplify=FALSE`, then `tapply` returns a list. | `TRUE` |
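To illustrate the `...` and `simplify` arguments described in the table, here is a minimal sketch. The vectors `scores` and `group` are made-up data for illustration only and are not part of any data set used in this text.

```r
# Hypothetical data: six observations in two groups, with one missing value.
scores <- c(10, 20, NA, 40, 50, 60)
group  <- factor(c("a", "a", "a", "b", "b", "b"))

# Without na.rm, the NA propagates into the mean for group "a".
plain <- tapply(scores, group, mean)

# Arguments after FUN are forwarded to it, so na.rm = TRUE reaches mean().
clean <- tapply(scores, group, mean, na.rm = TRUE)

# With simplify = FALSE the result is list-mode instead of a numeric array.
as_list <- tapply(scores, group, mean, na.rm = TRUE, simplify = FALSE)

print(plain)
print(clean)
print(is.list(as_list))
```

The list-mode form is mainly useful when `FUN` may return something other than a single value for some groups.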

For example, we can use `tapply` to sum the number of home runs by team:

    > tapply(X=batting.2008$HR, INDEX=list(batting.2008$teamID), FUN=sum)
    ARI ATL BAL BOS CHA CHN CIN CLE COL DET FLO HOU KCA LAA LAN MIL MIN
    159 130 172 173 235 184 187 171 ...
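If you do not have the `batting.2008` data frame loaded, the same pattern can be tried on a small stand-in. The sketch below assumes a hypothetical data frame named `batting` with invented values and an extra, assumed `lgID` column. It also shows what happens when `INDEX` contains two factors.

```r
# Hypothetical stand-in for batting.2008; all values are made up for illustration.
batting <- data.frame(
  teamID = c("ARI", "ARI", "ATL", "ATL", "BOS"),
  lgID   = c("NL",  "NL",  "NL",  "NL",  "AL"),
  HR     = c(10, 5, 7, 3, 12)
)

# One grouping factor: a named one-dimensional array of totals per team.
hr.by.team <- tapply(X = batting$HR, INDEX = list(batting$teamID), FUN = sum)
print(hr.by.team)

# Two grouping factors: a league-by-team matrix.
# Combinations with no matching rows are filled with NA.
hr.by.league.team <- tapply(batting$HR, list(batting$lgID, batting$teamID), sum)
print(hr.by.league.team)
```

Each additional factor in `INDEX` adds one dimension to the result, so two factors yield a matrix.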
