Correlation

Correlation is probably the first thing you think of when you hear about relationships in data. The second thing is probably causation. Now maybe you’re thinking about the mantra that correlation doesn’t equal causation. The first, correlation, means one thing tends to change a certain way as another thing changes. For example, the price of milk per gallon and the price of gasoline per gallon are positively correlated. Both have been increasing over the years.
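If you want to put a number on "tends to change a certain way," one common choice is the Pearson correlation coefficient, which runs from -1 (the two move in opposite directions) to 1 (they move together). The sketch below is only an illustration: the function name `pearson_r` and the price lists are made up for this example and are not real milk or gas data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two series vary together around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each series varies on its own.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-gallon prices, invented for illustration only.
gas_prices = [1.50, 1.80, 2.30, 2.90, 3.40]
milk_prices = [2.70, 2.90, 3.10, 3.40, 3.60]

r = pearson_r(gas_prices, milk_prices)
print(f"correlation: {r:.3f}")
```

A value close to 1 here says only that the two made-up series rise together. It says nothing about whether one drives the other.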

Now here’s the difference between correlation and causation. If you increase the price of gas, will the price of milk go up by default? More important, if the price of milk did go up, was it because of the increase in the gas price or was it an outside factor, such as a dairy strike?

It’s difficult to account for every outside, or confounding, factor, which makes it difficult to prove causation. Researchers spend years figuring stuff like that out. You can, however, easily find and see correlation, which can still be useful, as you see in the following sections.

Correlation can help you predict one metric by knowing another. To see this relationship, return to the scatterplot and multiple scatterplots.
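As a rough sketch of what predicting one metric from another can look like, you could fit a straight line through the points with ordinary least squares and read a new value off that line. This is one simple approach, assumed here for illustration. The helper `fit_line` and the numbers are hypothetical, and a line is only a sensible guess when the scatterplot actually looks roughly linear.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical paired measurements, invented for illustration only.
known_metric = [1, 2, 3, 4, 5]
other_metric = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(known_metric, other_metric)

# Estimate the second metric for a new value of the first one.
predicted = slope * 6 + intercept
print(f"predicted value at 6: {predicted:.2f}")
```

The stronger the correlation, the closer the points sit to the line and the more useful an estimate like this tends to be.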

More with Points

In Chapter 4, “Visualizing Patterns over Time,” you used a scatterplot to graph measurements over time, where time was on the horizontal axis and a metric of interest was on the vertical axis. This helped spot temporal changes (or nonchanges). The relationship was between time and another factor, or ...
