Chapter 13. Survival Analysis

Survival analysis is a way to describe how long things last. It is often used to study human lifetimes, but it also applies to “survival” of mechanical and electronic components, or more generally to intervals in time before an event.

If someone you know has been diagnosed with a life-threatening disease, you might have seen a “5-year survival rate,” which is the probability of surviving five years after diagnosis. That estimate and related statistics are the result of survival analysis.

The code in this chapter is in survival.py. For information about downloading and working with this code, see Using the Code.

Survival Curves

The fundamental concept in survival analysis is the survival curve, , which is a function that maps from a duration, t, to the probability of surviving longer than t. If you know the distribution of durations, or “lifetimes”, finding the survival curve is easy; it’s just the complement of the CDF:

Survival Curves

where is the probability of a lifetime less than or equal to t.

For example, in the NSFG dataset, we know the duration of 11189 complete pregnancies. We can read this data and compute the CDF:

 preg = nsfg.ReadFemPreg() complete = preg.query('outcome ...

Get Think Stats, 2nd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.