Distribution

You’ve probably heard of mean, median, and mode. Schools teach this stuff in high school, although they should teach it much sooner. The mean is the sum of all data points divided by the number of points. To find the median, you order your data from least to greatest and mark the halfway point; with an even number of points, the median is the average of the two middle values. The mode is the value that occurs most often. These are well and good and super easy to find, but they don’t give you the whole story. Each one describes only a single aspect of how your data is distributed. If you visualize everything, though, you can see the full distribution.
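The three definitions translate almost word for word into code. Here's a minimal Python sketch that computes them by hand and then checks them against the standard library's `statistics` module. The `summarize` function and the sample numbers are made up for illustration, and ties for the mode are assumed to go to the value seen first.

```python
import statistics
from collections import Counter

def summarize(data):
    """Return (mean, median, mode) following the plain definitions."""
    n = len(data)
    mean = sum(data) / n                      # sum of points / number of points
    ordered = sorted(data)                    # least to greatest
    mid = n // 2
    if n % 2 == 1:
        median = ordered[mid]                 # odd count: the middle value
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values
    mode = Counter(data).most_common(1)[0][0]  # most frequent value; ties go to the first seen
    return mean, median, mode

data = [2, 3, 3, 5, 7, 10, 12]  # made-up sample
print(summarize(data))          # (6.0, 5, 3)
print(statistics.mean(data), statistics.median(data), statistics.mode(data))
```

All three numbers come out different here, which is a hint that the data isn't symmetric. That is exactly the kind of thing a plot of the full distribution makes obvious.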

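A distribution that is skewed to the left has a long tail stretching toward the lower values, so most of your data is clustered on the upper side of the full range. A skew to the right means the opposite: most values sit on the lower side, with a tail stretching toward the higher values. A flat line means a uniform distribution, whereas the classic bell curve shows a clustering at the mean and a gradual decrease in both directions.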
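If you want a number to go along with the picture, one common measure is the moment coefficient of skewness, g1 = m3 / m2^1.5, where m2 and m3 are the second and third central moments. It is positive when a long tail stretches toward the higher values and negative when it stretches toward the lower values. The sketch below assumes population (divide-by-n) moments. The function name and the samples are made up for illustration.

```python
import statistics

def skewness(data):
    """Moment coefficient of skewness, g1 = m3 / m2**1.5, using population moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5

# Made-up samples for illustration
long_high_tail = [1, 1, 2, 2, 2, 3, 3, 4, 8, 15]
long_low_tail = [-x for x in long_high_tail]  # mirror image: tail toward low values
symmetric = [1, 2, 3, 4, 5, 6, 7]

for name, sample in [("long high tail", long_high_tail),
                     ("long low tail", long_low_tail),
                     ("symmetric", symmetric)]:
    print(f"{name}: skewness={skewness(sample):.2f}, "
          f"mean={statistics.mean(sample):.2f}, median={statistics.median(sample):.2f}")
```

Notice how the tail drags the mean along with it. In the long-high-tail sample, the mean (4.10) sits well above the median (2.50). In the symmetric sample, the two match and the skewness is zero.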

Next, take a look at a classic plot, mainly to get a feel for distribution, and then move on to the more practical histogram and density plot.

Old School Distribution

In the 1970s, when computers weren’t so common, most data graphics were drawn by hand. Some of the tips that famed statistician John Tukey offered in his book Exploratory Data Analysis centered on using pen and pencil to vary the darkness of lines and shades. You could also use hash patterns as a fill to differentiate between variables.

The stem-and-leaf plot, or stemplot, was designed in a similar manner. All you have to do is write the numbers using an ordered method, and ...
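As a rough illustration of the idea, here's a minimal Python sketch that prints a stemplot as plain text. It assumes non-negative integers and one simple convention: the tens digit is the stem and the ones digit is the leaf. The `stemplot` function and the `scores` values are made up for this example, and empty stems are kept so that gaps in the data stay visible.

```python
from collections import defaultdict

def stemplot(data):
    """Return stem-and-leaf rows for non-negative integers (stem = tens, leaf = ones digit)."""
    leaves = defaultdict(list)
    for x in sorted(data):            # sorting first keeps each row's leaves in order
        stem, leaf = divmod(x, 10)
        leaves[stem].append(leaf)
    rows = []
    for stem in range(min(leaves), max(leaves) + 1):  # include empty stems to show gaps
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem:>3} | {row}")
    return rows

scores = [12, 15, 21, 23, 23, 28, 34, 37, 41, 52, 55, 58]  # made-up data
print("\n".join(stemplot(scores)))
# prints:
#   1 | 25
#   2 | 1338
#   3 | 47
#   4 | 1
#   5 | 258
```

Read sideways, the lengths of the rows trace out the shape of the distribution, much like a histogram, while every original value stays recoverable from its stem and leaf.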
