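Correlation measures the degree of linear dependence between two normally distributed variables. For a population it is defined by
$$\rho = \frac{E[(x - \mu_x)(y - \mu_y)]}{\sqrt{E[(x - \mu_x)^2]\, E[(y - \mu_y)^2]}}$$
where $\mu_x$ is the population mean of $x$ (and $\mu_y$ that of $y$) and $E(\cdot)$ is the expectation.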
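If we have a finite sample, our best estimate is given by
$$r = \frac{1}{N-1} \sum_{i=1}^{N} \frac{(x_i - \bar{x})(y_i - \bar{y})}{s_x s_y}$$
where $\bar{x}$ is the sample mean, $N$ is the sample size, and $s_x$ is the sample standard deviation. Both $\rho$ and $r$ are between negative one and one.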
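The sample estimate can be checked by hand on a small data set. The following is a minimal sketch, assuming the usual $N - 1$ denominator for both the covariance and the standard deviations; the function name `sample_correlation` and the data values are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def sample_correlation(xs, ys):
    """Sample correlation: covariance (N - 1 denominator) over s_x * s_y."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sample covariance, using the same N - 1 denominator as stdev().
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (stdev(xs) * stdev(ys))


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

r = sample_correlation(xs, ys)
print(round(r, 4))  # 0.7746
```

Here the cross-deviations sum to 6, so the covariance is 6/4 = 1.5, and the product of the two standard deviations is the square root of 3.75, which gives an estimate of about 0.77. Negating every value of one variable flips the sign of the estimate without changing its magnitude.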
Correlation has an appealing relationship to regression. Consider the scatter diagram of Figure B.1.
The regression line that best fits this data is shown in Figure B.2.
The line has three properties:
1. Its intercept
2. Its slope
3. Its “goodness” of fit
Correlation is related to the last two properties. A positive slope tells us that if one variable increases, the other does as well. In this case correlation is positive. A negative slope tells us that if one variable increases, the other decreases. In this case correlation is negative.
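The link between the sign of the slope and the sign of the correlation can be illustrated numerically. The sketch below assumes an ordinary least-squares fit of $y = a + bx$ and relies on the standard identity for simple linear regression, $b = r\, s_y / s_x$, which is not stated in the text above. Since both standard deviations are positive, the slope and the correlation must share a sign. The helper names `fit_line` and `correlation` and the data are hypothetical.

```python
from statistics import mean, stdev


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def correlation(xs, ys):
    """Sample correlation with the N - 1 denominator."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))


# Hypothetical data: one upward-trending and one downward-trending series.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys_up = [2.0, 4.0, 5.0, 4.0, 5.0]
ys_down = [5.0, 4.0, 5.0, 4.0, 2.0]

results = []
for ys in (ys_up, ys_down):
    intercept, slope = fit_line(xs, ys)
    r = correlation(xs, ys)
    # Slope reconstructed from the correlation: b = r * s_y / s_x.
    slope_from_r = r * stdev(ys) / stdev(xs)
    results.append((slope, r, slope_from_r))
    print(f"slope={slope:.3f}  r={r:.3f}  r*s_y/s_x={slope_from_r:.3f}")
```

For the first series the fitted slope and the correlation are both positive; for the second both are negative. In each case the slope recovered from the correlation matches the fitted slope up to floating-point rounding.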
Correlation is bounded by negative one and one. If the data is well described by a line, ...