Any significant application of R includes statistics or models or graphics. This chapter addresses the statistics. Some recipes simply describe how to calculate a statistic, such as relative frequency. Most recipes involve statistical tests or confidence intervals. The statistical tests let you choose between two competing hypotheses; that paradigm is described next. Confidence intervals reflect the likely range of a population parameter and are calculated based on your data sample.
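
As a rough illustration of the two kinds of output mentioned above, the following sketch computes a relative frequency and a confidence interval. The sample `x` is simulated and purely hypothetical; the threshold of 10 and the 95% level are assumptions chosen for the example, not values from the text.

```r
# Hypothetical sample (simulated for illustration; not from the text).
set.seed(1)
x <- rnorm(50, mean = 10, sd = 2)

# Relative frequency: the fraction of observations that satisfy a condition.
# A logical vector is treated as 0/1, so mean() returns the proportion of TRUEs.
rel_freq <- mean(x > 10)

# 95% confidence interval for the population mean, calculated from the sample.
ci <- t.test(x, conf.level = 0.95)$conf.int

print(rel_freq)
print(ci)
```

The interval comes back as a two-element numeric vector (lower bound, upper bound) with a `conf.level` attribute recording the level that was requested.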

Many of the statistical tests in this chapter use a time-tested paradigm of statistical inference. In the paradigm, we have one or two data samples. We also have two competing hypotheses, either of which could reasonably be true.

One hypothesis, called the *null hypothesis*, is that
*nothing happened*: the mean was unchanged; the
treatment had no effect; you got the expected answer; the model did not
improve; and so forth.

The other hypothesis, called the *alternative hypothesis*, is that
*something happened*: the mean rose; the treatment
improved the patients’ health; you got an unexpected answer; the model
fit better; and so forth.
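
To make the two hypotheses concrete, here is a minimal sketch using a one-sample t test. The data are simulated, and the hypothesized mean of 100 is an assumption made for the example; the null hypothesis is "the mean is 100" and the alternative is "the mean rose above 100."

```r
# Hypothetical data: 30 simulated measurements (not from the text).
set.seed(42)
x <- rnorm(30, mean = 102, sd = 10)

# Null hypothesis: the population mean is 100 ("nothing happened").
# Alternative hypothesis: the population mean is greater than 100 ("the mean rose").
result <- t.test(x, mu = 100, alternative = "greater")

result$null.value   # the mean assumed under the null hypothesis
result$alternative  # the direction of the alternative hypothesis
result$statistic    # the test statistic computed from the sample
```

Here `mu` encodes the null hypothesis and `alternative` encodes the competing one; changing `alternative` to `"two.sided"` or `"less"` would state a different alternative against the same null.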

We want to determine which hypothesis is more likely in light of the data:

1. To begin, we assume that the null hypothesis is true.

2. We calculate a test statistic. It could be something simple, such as the mean of the sample, or it could be quite complex. The critical requirement is that we must ...
