Very often, when analyzing data, you want to know if two
variables are *correlated*. Informally, correlation
answers the question, “When we increase (or decrease)
*x*, does *y* increase (or
decrease), and by how much?” Formally, correlation measures the linear
dependence between two random variables. Correlation coefficients range
from −1 to 1: 1 means that one variable is a (positive) linear
function of the other, 0 means there is no linear relationship between the
two variables, and −1 means that one variable is a negative linear function
of the other (the two move in completely opposite directions; see Figure 16-1).

Figure 16-1. Correlation (Source: http://xkcd.com/552/)

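The most commonly used correlation measurement is the Pearson
correlation statistic (it’s the formula behind the `CORREL` function in Excel):

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} $$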

where *x̄* is the mean of variable
*x*, and *ȳ* is the mean of variable
*y*. The Pearson correlation statistic is rooted in
properties of the normal distribution and works best with normally
distributed data. An alternative correlation function is the Spearman
correlation statistic. Spearman correlation is a nonparametric statistic and
doesn’t make any assumptions about the underlying distribution; it is computed
from the ranks of the observations rather than from their raw values.
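To make the difference between the two statistics concrete, here is a minimal pure-Python sketch. It assumes the usual definition of Spearman correlation as the Pearson statistic applied to ranks, with tied values given their average rank. The function names `pearson`, `average_ranks`, and `spearman` are illustrative and do not come from any library.

```python
import math

def pearson(x, y):
    """Pearson correlation from sums of deviations about the means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson applied to the ranks (an assumed, common definition)."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up data: y grows with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(r_pearson, r_spearman)
```

Because `y` always increases when `x` does, the ranks line up perfectly and the Spearman value is 1, while the Pearson value falls short of 1 because the relationship is not a straight line.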

Another measurement of how well two random variables are related is Kendall’s tau.
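The text does not define Kendall’s tau, so the sketch below uses one common variant, often called tau-a, and assumes the data contain no ties. It counts pairs of observations that are ordered the same way in both variables (concordant) against pairs ordered in opposite ways (discordant). The function name `kendall_tau_a` is illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Assumes no tied values; pairs tied in either variable count as neither.
    """
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n = len(x)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs

# Made-up data: one pair of neighbors is swapped in the second list.
tau_same = kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4])
tau_reversed = kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1])
tau_one_swap = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau_same, tau_reversed, tau_one_swap)
```

Like Spearman correlation, this measure depends only on the ordering of the values, not on their magnitudes.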
