Correlation measures the degree of linear dependence between two normally distributed variables. For a population it is defined by

$$\rho = \frac{E\left[(x - \mu_x)(y - \mu_y)\right]}{\sigma_x \sigma_y} \tag{B.1}$$

where *µ*_{x} is the population mean, *σ*_{x} is the population standard deviation, and *E*(.) is the expectation. And if we have a finite sample, our best estimate is given by

$$r = \frac{1}{N - 1} \sum_{i=1}^{N} \frac{(x_i - \bar{x})(y_i - \bar{y})}{s_x s_y} \tag{B.2}$$

where $\bar{x}$ is the sample mean, *N* is the sample size, and *s*_{x} is the sample standard deviation. Both *ρ* and *r* are between negative one and one.

Correlation has an appealing relationship to regression. Consider the scatter diagram of Figure B.1.

The regression line that best fits this data is shown in Figure B.2.

The line has three properties:

1. Its intercept

2. Its slope

3. Its “goodness” of fit

Correlation is related to the last two properties. A positive slope tells us that if one variable increases, the other does as well. In this case correlation is positive. A negative slope tells us that if one variable increases, the other decreases. In this case correlation is negative.
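The link between the sign of the slope and the sign of the correlation can be checked numerically. The following is a minimal sketch, not the text's own code: the function names `sample_correlation` and `least_squares_slope` and the data values are made up for illustration, and the sample standard deviation is assumed to use the *N* − 1 divisor when computing the sample estimate *r* of (B.2).

```python
from math import sqrt


def sample_correlation(xs, ys):
    """Sample correlation r, assuming the N - 1 convention throughout."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance of x and y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # Sample standard deviations s_x and s_y
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (s_x * s_y)


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    return s_xy / s_xx


# Illustrative data only (not taken from Figure B.1)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys_up = [2.1, 3.9, 6.2, 7.8, 10.1]    # rises with x
ys_down = [10.1, 7.8, 6.2, 3.9, 2.1]  # falls with x

for label, ys in (("rising", ys_up), ("falling", ys_down)):
    slope = least_squares_slope(xs, ys)
    r = sample_correlation(xs, ys)
    print(f"{label}: slope = {slope:.3f}, r = {r:.3f}")
```

Both quantities share the numerator Σ(x − x̄)(y − ȳ) and have positive denominators, which is why the slope and *r* always carry the same sign.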

Correlation is bounded by negative one and one. If the data is well described by a line, ...
