Summarizing Functions

Often, you are provided with data that is too fine-grained for your analysis. For example, you might be analyzing data about a website. Suppose that you wanted to know the average number of pages delivered to each user. To find the answer, you might need to look at every HTTP transaction (every request for content), grouping requests into sessions and counting the number of requests in each. R provides a number of different functions for summarizing data, aggregating records together to build a smaller data set.

tapply, aggregate

The tapply function is a flexible tool for summarizing a vector X. You can specify which subsets of X to summarize as well as the function used for summarization:

tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)

Here are the arguments to tapply.

Argument | Description | Default
X | The object on which to apply the function (usually a vector). |
INDEX | A list of factors that specify different sets of values of X over which to calculate FUN, each the same length as X. |
FUN | The function applied to elements of X. | NULL
... | Optional arguments that are passed to FUN. |
simplify | If simplify=TRUE and FUN returns a scalar, tapply returns an array with the mode of the scalar. If simplify=FALSE, tapply returns a list. | TRUE
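
As a minimal sketch of how the ... and simplify arguments behave, the following uses a small made-up vector. The names scores, group, means, and means.list are hypothetical and are not part of the book's data.

```r
# Hypothetical toy data, invented for illustration only
scores <- c(10, NA, 30, 40, 50, 60)
group  <- factor(c("a", "a", "b", "b", "c", "c"))

# Arguments after FUN are forwarded to FUN; here na.rm = TRUE reaches mean()
means <- tapply(scores, group, mean, na.rm = TRUE)

# With simplify = FALSE, each group's result is kept as a list element
means.list <- tapply(scores, group, mean, na.rm = TRUE, simplify = FALSE)

print(means)
print(is.list(means.list))
```

With the default simplify=TRUE, the scalar result for each group is collapsed into a named numeric array; with simplify=FALSE, the results stay in a list, which can be useful when FUN does not return a single value.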

For example, we can use tapply to sum the number of home runs by team:

> tapply(X=batting.2008$HR,INDEX=list(batting.2008$teamID),FUN=sum)
ARI ATL BAL BOS CHA CHN CIN CLE COL DET FLO HOU KCA LAA LAN MIL MIN
159 130 172 173 235 184 187 171 160 ...
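
Because batting.2008 is not a built-in data set, the same pattern can be tried on a small stand-in. This is a sketch under stated assumptions: the data frame batting.toy, its lgID column, and all of its values are invented for illustration. It also shows what happens when INDEX contains more than one factor.

```r
# Hypothetical stand-in for batting.2008 (values invented for illustration)
batting.toy <- data.frame(
  teamID = c("ARI", "ARI", "BOS", "BOS", "BOS"),
  lgID   = c("NL", "NL", "AL", "AL", "AL"),
  HR     = c(5, 12, 20, 3, 7)
)

# One grouping factor: total home runs per team
hr.by.team <- tapply(X = batting.toy$HR,
                     INDEX = list(batting.toy$teamID),
                     FUN = sum)

# Two grouping factors: a league-by-team matrix;
# combinations with no rows are filled with NA
hr.by.lg.team <- tapply(batting.toy$HR,
                        list(batting.toy$lgID, batting.toy$teamID),
                        sum)

print(hr.by.team)
print(hr.by.lg.team)
```

With one factor, the result is a one-dimensional array named by team. With two factors, the result has one dimension per factor, so each cell holds the sum for one league and team combination.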
