You are previewing R in a Nutshell, 2nd Edition.

Summary Statistics

R includes a variety of functions for calculating summary statistics.

To calculate the mean of a vector, use the `mean` function. You can calculate minima with the `min` function, or maxima with the `max` function. As an example, let’s use the `dow30` data set that we created in “An extended example.” This data set is also available in the `nutshell` package:

```
> library(nutshell)
> data(dow30)
> mean(dow30$Open)
[1] 36.24574
> min(dow30$Open)
[1] 0.99
> max(dow30$Open)
[1] 122.45
```

For each of these functions, the argument `na.rm` specifies how `NA` values are treated. By default, if any value in the vector is `NA`, then the value `NA` is returned. Specify `na.rm=TRUE` to ignore missing values:

```
> mean(c(1, 2, 3, 4, 5, NA))
[1] NA
> mean(c(1, 2, 3, 4, 5, NA), na.rm=TRUE)
[1] 3
```
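The same argument works for `min` and `max`, and `range` accepts it as well. The following is a minimal sketch on a made-up vector; `x` is a hypothetical example and is not taken from `dow30`:

```r
# Hypothetical vector with one missing value (not taken from dow30)
x <- c(3, 1, NA, 7)

min(x)                  # NA: the missing value propagates by default
max(x)                  # NA
range(x)                # NA NA

min(x, na.rm = TRUE)    # 1
max(x, na.rm = TRUE)    # 7
range(x, na.rm = TRUE)  # 1 7

# Count the missing values before deciding to drop them
sum(is.na(x))           # 1
```

Checking how many values are missing first is a cheap way to confirm that dropping them will not discard most of the data.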

Optionally, you can reduce the influence of outliers when using the `mean` function. To do this, use the `trim` argument to specify the fraction of observations (between 0 and 0.5) to drop from each end of the sorted vector before the mean is computed:

```
> mean(c(-1, 0:100, 2000))
[1] 68.4369
> mean(c(-1, 0:100, 2000), trim=0.1)
[1] 50
```
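To see what `trim` does, it can help to reproduce the calculation by hand. The sketch below sorts the same vector, drops `floor(n * trim)` values from each end, and averages what is left. The variable names (`k`, `trimmed`, `manual_mean`, `builtin_mean`) are illustrative only:

```r
x <- c(-1, 0:100, 2000)
trim <- 0.1

# Manual version: sort, then drop floor(n * trim) values from each end
n <- length(x)                        # 103 observations
k <- floor(n * trim)                  # 10 values dropped per end
trimmed <- sort(x)[(k + 1):(n - k)]   # keeps the values 9 through 91

manual_mean <- mean(trimmed)
builtin_mean <- mean(x, trim = trim)

manual_mean                           # 50
builtin_mean                          # 50
all.equal(manual_mean, builtin_mean)  # TRUE
```

Because the trimming is symmetric, both the extreme low value (-1) and the extreme high value (2000) are discarded along with nine of their nearest neighbors on each side.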

To calculate the minimum and maximum at the same time, use the `range` function. This returns a two-element vector containing the minimum and the maximum:

```
> range(dow30$Open)
[1]   0.99 122.45
```
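Since the result is an ordinary numeric vector, you can index it or pass it to other functions. Here is a small sketch using a made-up vector; `prices` is hypothetical and only borrows the two extreme values shown above:

```r
# Hypothetical opening prices (not taken from dow30)
prices <- c(12.5, 0.99, 48.2, 122.45, 30.1)

r <- range(prices)
r[1]       # the minimum, 0.99
r[2]       # the maximum, 122.45
diff(r)    # spread: maximum minus minimum

# range() agrees with calling min() and max() separately
all.equal(r, c(min(prices), max(prices)))  # TRUE
```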

Another useful function is `quantile`. This function can be used to return the values at different percentiles (specified by the `probs` argument):

```
> quantile(dow30$Open, probs=c(0, 0.25, 0.5, 0.75, 1.0))
     0%     25%     50%     75%    100%
  0.990  19.655  30.155  51.680 122.450
```
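The result of `quantile` is a named numeric vector, so individual percentiles can be pulled out by name. The sketch below uses the integers 0 through 100 so the answers are easy to check by eye; `x` and `q` are illustrative names, and the results assume the default quantile algorithm:

```r
x <- 0:100

# Request the 10th, 50th, and 90th percentiles
q <- quantile(x, probs = c(0.1, 0.5, 0.9))

names(q)      # "10%" "50%" "90%"
q[["50%"]]    # 50, extracted by name
unname(q)     # 10 50 90, with the labels dropped

# quantile() stops with an error on NA values unless na.rm = TRUE
quantile(c(x, NA), probs = 0.5, na.rm = TRUE)  # 50
```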

You can return this specific set of values (minimum, 25th percentile, ...