O'Reilly logo

Mastering pandas by Femi Anthony

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Aggregation and GroupBy

Sometimes, we may wish to split data into subsets and apply a function such as the mean, max, or min to each subset. In R, we can do this via the aggregate or tapply functions.

Here, we will use the example of a dataset of statistics on the top five strikers of the four clubs that made it to the semi-final of the European Champions League Football tournament in 2014. We will use it to illustrate aggregation in R and its equivalent GroupBy functionality in pandas.

Aggregation in R

In R aggregation is done using the following command:

> goal_stats=read.csv('champ_league_stats_semifinalists.csv')
>goal_stats
              Club                 Player Goals GamesPlayed
1  Atletico Madrid            Diego Costa     8           9
2  Atletico Madrid             ArdaTuran     4           9
3 Atletico Madrid RaúlGarcía ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required