Chapter 8. Kernel Density Plots

Density Estimation

A common problem in science is to estimate, from a data sample, a mathematical function that describes the relative likelihood that a variable (such as the systolic blood pressure in the sbp example in the previous two chapters) takes a particular value. We made a rough estimate of such a function with histograms in Chapter 7. For instance, a glance at the histogram in Figure 7-2 shows that systolic blood pressures close to 150 are very likely to occur, whereas values of about 110 are relatively unlikely. The rule, or formula, that gives the relative likelihood of a given value of, for example, blood pressure is called the density function.

Histograms are a good tool for many problems, being easy to understand and relatively easy to compute. There is, however, a shortcoming of which you should be aware. Many variables of interest are continuous; that is, they can take any value within a certain range. A blood pressure value could be 120 or 123 or 129.2, yet the histogram might force all of those values into the same bin and thereby treat them all as though they were 120. (Remember that the bin width in the histogram in Figure 7-2 was 10, so all values equal to or greater than 120 and less than 130 fall within the same bin.) That is to say, we used a discrete function (one that can only take selected values of blood pressure) to estimate the density function, which is continuous. The graph in Figure 8-1a, a kernel density ...
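
The binning effect described above can be made concrete with a few lines of code. The following is a minimal sketch in Python, not taken from the book: the helper name `bin_lower_edge` is hypothetical, the bins are assumed to start at multiples of 10 (consistent with the 120-to-130 bin mentioned above), and the fourth reading, 130, is added only to show where the next bin begins.

```python
import math

BIN_WIDTH = 10  # bin width stated for the histogram in Figure 7-2


def bin_lower_edge(value, width=BIN_WIDTH):
    """Return the lower edge of the half-open bin [edge, edge + width)
    that contains value. Assumes bin edges fall on multiples of width."""
    return math.floor(value / width) * width


# Three distinct readings from the text, plus 130 (illustrative only).
readings = [120, 123, 129.2, 130]
edges = [bin_lower_edge(v) for v in readings]

for v, e in zip(readings, edges):
    print(f"{v:>6} -> bin [{e}, {e + BIN_WIDTH})")
```

The first three readings share a lower edge, so the histogram cannot tell them apart, while 130 starts a new bin. This loss of detail within a bin is what a continuous estimate aims to avoid.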
