7.4 Probability Distribution

The assumption of random sampling means that we consider all observations in a sample to have been made with equal probability. Under such conditions, the probability of finding a value in a certain bin is simply its frequency divided by the sum of all frequencies in the histogram. The probability of finding any value whatever in the distribution is exactly one, as this is the sum of the probabilities of all bins.
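As a minimal sketch of this calculation, the Python snippet below turns a set of bin frequencies into probabilities. The frequencies are made-up numbers chosen only for illustration, and the variable names are assumptions rather than anything from the text.

```python
# Hypothetical bin frequencies (counts per bin); made-up values for illustration.
frequencies = [2, 5, 9, 6, 3]

# The probability of a bin is its frequency divided by the sum of all frequencies.
total = sum(frequencies)
probabilities = [f / total for f in frequencies]

# The probabilities of all bins add up to one (within floating-point rounding).
total_probability = sum(probabilities)

for i, p in enumerate(probabilities):
    print(f"bin {i}: probability {p:.2f}")
print(f"sum of all bins: {total_probability:.2f}")
```

Here the third bin holds 9 of the 25 observations, so its probability is 9/25 = 0.36.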

Because the histogram describes the probability of finding values in a limited number of discrete bins, it is called a discrete distribution. To calculate probabilities for continuous variables we need a continuous distribution, which is described by a probability density function. These functions are explained here using a bit of mathematics, since integrals and derivatives are useful for illustrating the ideas. Readers who have forgotten some of their calculus need not worry: there is no need to calculate the integrals manually. As we shall see soon, common software has built-in functions that do this.
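To illustrate what such built-in functions can look like, here is a hedged sketch using Python's standard library. It assumes, purely for the sake of example, a normal distribution with mean 0 and standard deviation 1; the helper names interval_probability and midpoint_integral are hypothetical. The probability of finding a value between a and b is the integral of the density over that interval, which the built-in cumulative distribution function gives directly. A crude numerical integration is included only to show that the two agree.

```python
from statistics import NormalDist

# Assumption for illustration: a normal distribution with mean 0 and standard deviation 1.
dist = NormalDist(mu=0.0, sigma=1.0)


def interval_probability(d, a, b):
    """Probability of a value between a and b, from the built-in cumulative distribution."""
    return d.cdf(b) - d.cdf(a)


def midpoint_integral(pdf, a, b, n=10_000):
    """Approximate the integral of pdf from a to b with the midpoint rule (n slices)."""
    width = (b - a) / n
    return sum(pdf(a + (i + 0.5) * width) for i in range(n)) * width


p_builtin = interval_probability(dist, -1.0, 1.0)
p_numeric = midpoint_integral(dist.pdf, -1.0, 1.0)

print(f"built-in: {p_builtin:.4f}")
print(f"numeric:  {p_numeric:.4f}")
```

Both numbers come out at roughly 0.68, the familiar share of values that lie within one standard deviation of the mean of a normal distribution.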

We can think of probability density functions as histograms with a very large number of infinitesimally thin bins. (An infinitesimal interval is an interval so small that, although it is not of exactly zero size, it cannot be distinguished from zero.) In contrast to histograms, probability density functions appear as smooth curves in diagrams. Just as before, the sum of the probabilities of all these infinitesimal intervals is ...
