Exploring the data from the NSFG, we saw several “apparent effects,” including a number of differences between first babies and others. So far, we have taken these effects at face value; in this chapter, finally, we put them to the test.
The fundamental question we want to address is whether these effects are real. For example, if we see a difference in the mean pregnancy length for first babies and others, we want to know whether that difference is real, or whether it occurred by chance.
That question turns out to be hard to address directly, so we will proceed in two steps. First we will test whether the effect is significant, then we will try to interpret the result as an answer to the original question.
In the context of statistics, “significant” has a technical definition that is different from its use in common language. As defined earlier, an apparent effect is statistically significant if it is unlikely to have occurred by chance.
To make this more precise, we have to answer three questions:
What do we mean by “chance”?
What do we mean by “unlikely”?
What do we mean by “effect”?
All three of these questions are harder than they look. Nevertheless, there is a general structure that people use to test statistical significance:
Null hypothesis: A model of the system based on the assumption that the apparent effect was actually due to chance.
p-value: The probability of the apparent effect under the null hypothesis.
Interpretation: Based on the p-value, we conclude ...
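To make this structure concrete, here is a minimal sketch of one common way to carry it out for a difference in means, a permutation test. It is an illustration under stated assumptions, not necessarily the method used later in this chapter: the function name permutation_p_value, the number of iterations, and the synthetic pregnancy lengths are all made up for the example and are not NSFG data. The model of chance is that group labels are interchangeable, and the p-value is estimated as the fraction of random relabelings that produce a difference at least as large as the observed one.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, iters=1000, seed=0):
    """Estimate a two-sided p-value for the difference in means.

    Model of chance (assumed here): the two groups come from the same
    distribution, so shuffling the pooled values and splitting them
    again simulates differences that arise by chance alone.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n = len(group_a)

    count = 0
    for _ in range(iters):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n]) - mean(pooled[n:]))
        if diff >= observed:
            count += 1
    # Fraction of shuffles with a difference at least as large as observed.
    return count / iters


# Synthetic, illustrative data only (hypothetical values, not from the NSFG).
data_rng = random.Random(1)
firsts = [data_rng.gauss(38.6, 2.7) for _ in range(200)]
others = [data_rng.gauss(38.5, 2.7) for _ in range(200)]

p_value = permutation_p_value(firsts, others)
print("estimated p-value:", p_value)
```

A small estimated p-value means that shuffled data rarely produce a difference as large as the observed one. How small is small enough is a separate choice, which is part of what "unlikely" has to pin down.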