You are previewing Visualize This: The FlowingData Guide to Design, Visualization, and Statistics.

Comparison

Often it’s useful to compare multiple distributions rather than just the mean, median, and mode. These summary statistics are, after all, descriptors of the big picture. They tell you only part of the story.

For example, I could tell you that the average birth rate for the world was 19.98 live births per 1,000 population in 2008 and 32.87 in 1960, so the birth rate was about 39 percent lower in 2008 than it was in 1960. That tells you only what’s going on in the center of the distribution, though. Or is it even the center? Were there only a few countries with high birth rates in 1960, bringing up the average? Did differences in birth rate between countries increase or decrease over the past few decades?
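
If you want to check that arithmetic yourself, here is a minimal sketch in R. The two world averages come straight from the paragraph above; the seven "country" rates are made up purely for illustration (they are not from the birth rate data) to show how a couple of high values can pull the mean well above the median.

```r
# World average birth rates quoted in the text
# (live births per 1,000 population)
rate_1960 <- 32.87
rate_2008 <- 19.98

# Relative change from 1960 to 2008, as a percentage
pct_change <- (rate_2008 - rate_1960) / rate_1960 * 100
round(pct_change, 1)  # about -39.2, i.e. roughly 39 percent lower

# Hypothetical rates for seven countries (invented for this example only)
hypothetical_rates <- c(12, 13, 14, 15, 16, 45, 50)

# Two high values drag the mean upward; the median stays put
mean(hypothetical_rates)
median(hypothetical_rates)
```

With these invented numbers the mean lands well above the median, which is exactly the kind of thing a single average hides and a look at the full distribution would reveal.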

You can make comparisons in lots of ways. You could go entirely analytical and not use visualization at all. (I spent a year learning about statistical methods in graduate school, and that was just the tip of the iceberg.) You could also go the other way and use visualization. Your results won’t be the exact answer that a thorough statistical analysis would give you, but they could be good enough to make an informed decision about whatever you’re looking into. Obviously you’re going to go the visualization route, or I would have named this book Analyze This.

Multiple Distributions

So far you’ve looked at only single distributions, namely birth rates for 2008. But if you looked at the data file or the data frame in R, you know that you have annual birth rates all the way back to 1960. If you didn’t ...