Chapter 5. Modeling Distributions

The distributions we have used so far are called empirical distributions because they are based on empirical observations, which are necessarily finite samples.

The alternative is an analytic distribution, which is characterized by a CDF that is a mathematical function. Analytic distributions can be used to model empirical distributions. In this context, a model is a simplification that leaves out unneeded details. This chapter presents common analytic distributions and uses them to model data from a variety of sources.

The code for this chapter is in analytic.py. For information about downloading and working with this code, see Using the Code.

The Exponential Distribution

I’ll start with the exponential distribution because it is relatively simple. The CDF of the exponential distribution is

$$\mathrm{CDF}(x) = 1 - e^{-\lambda x}$$

The parameter, λ, determines the shape of the distribution. Figure 5-1 shows what this CDF looks like with λ = 0.5, 1, and 2.
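To make the role of λ concrete, here is a minimal sketch that evaluates the exponential CDF, taken in its standard form 1 − exp(−λx), at x = 1 for the three values of λ. The function name exponential_cdf is an assumption made for illustration; it is not taken from analytic.py.

```python
import math


def exponential_cdf(x, lam):
    """CDF of the exponential distribution with rate lam, evaluated at x.

    Hypothetical helper for illustration; not the code in analytic.py.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    if x < 0:
        # The distribution has no probability mass below zero.
        return 0.0
    return 1.0 - math.exp(-lam * x)


# Evaluate the CDF at x = 1 for the three values of lambda.
cdf_at_one = {lam: exponential_cdf(1.0, lam) for lam in (0.5, 1, 2)}
for lam, p in cdf_at_one.items():
    print(f"lam={lam}: CDF(1) = {p:.3f}")
```

Larger values of λ make the CDF rise toward 1 more quickly, so more of the probability is concentrated near zero.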

In the real world, exponential distributions come up when we look at a series of events and measure the times between events, called interarrival times. If the events are equally likely to occur at any time, the distribution of interarrival times tends to look like an exponential distribution.
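One way to check this claim is with a small simulation. The sketch below assumes that "equally likely to occur at any time" can be modeled by placing events uniformly at random in a fixed time window. The function name, the number of events, the window length, and the seed are all arbitrary choices for illustration. If the interarrival times are roughly exponential with rate λ, their mean should be close to 1/λ, and about 1 − e⁻¹ of them should be no longer than 1/λ.

```python
import math
import random


def simulate_interarrival_times(n_events, duration, seed=17):
    """Place n_events uniformly at random in [0, duration]; return the gaps.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    times = sorted(rng.uniform(0, duration) for _ in range(n_events))
    # Interarrival times are the differences between consecutive events.
    return [later - earlier for earlier, later in zip(times, times[1:])]


n_events, duration = 10_000, 1_000.0
gaps = simulate_interarrival_times(n_events, duration)
rate = n_events / duration  # events per unit time; plays the role of lambda

mean_gap = sum(gaps) / len(gaps)
# Under the exponential model, CDF(1/lambda) = 1 - exp(-1).
frac_below = sum(g <= 1 / rate for g in gaps) / len(gaps)

print(f"mean gap = {mean_gap:.4f} (exponential model: {1 / rate:.4f})")
print(f"fraction <= 1/lam = {frac_below:.3f} "
      f"(exponential model: {1 - math.exp(-1):.3f})")
```

Agreement on these two summaries is only a rough check; comparing the full empirical CDF with the model is more informative.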

Figure 5-1. CDFs of exponential distributions with various ...
