Advanced Statistical Techniques
5.1 To Study the Relationships Between Variables
5.1.1 Linear Regression Analysis
Often data are available in which a variable, say Y, can be regarded as ‘dependent’ on several ‘independent’ variables, say X1, X2,…, Xr. Such data offer the opportunity to investigate the relationship between these variables.
For example, the amount of carbon monoxide emitted by a vehicle travelling a certain route may depend on several factors related to the engine operating conditions (e.g. speed, gear, type of fuel used) and to environmental conditions (e.g. temperature, humidity). Suppose we set the values of the factors and then repeatedly observe the variable Y. From one repetition to the next, the observed value of Y will not always be identical, even though the values of the factors are fixed. So Y is a random variable, which is why it is denoted by a capital letter. The factors are deterministic variables with fixed values, denoted by the lowercase letters x1, x2,…, xr. We write the following equation:
Y = f(x1, x2,…, xr) + ε
This equation is an analytical expression that links Y to x1, x2,…, xr. The term ε on the right-hand side accounts for the random nature of Y; it is a random variable called the error term or random error.
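The role of the error term can be illustrated with a small simulation. The following is a minimal sketch, not a method taken from the text: it assumes that the deterministic part is linear in two factors (speed and temperature), that ε is normally distributed with zero mean, and that the coefficients in BETA and the standard deviation SIGMA have the hypothetical values shown. The names f, observe_y, BETA and SIGMA are illustrative only.

```python
import random
import statistics

# Hypothetical values chosen for illustration only (not from the text):
# intercept, then one coefficient per factor (speed in km/h, temperature in deg C).
BETA = [2.0, 0.05, -0.01]
SIGMA = 0.3  # assumed standard deviation of the random error


def f(x):
    """Deterministic part: an assumed linear function of the fixed factors."""
    return BETA[0] + sum(b * xi for b, xi in zip(BETA[1:], x))


def observe_y(x, rng):
    """One observation of Y = f(x) + eps, with eps assumed to be N(0, SIGMA^2)."""
    return f(x) + rng.gauss(0.0, SIGMA)


rng = random.Random(42)   # fixed seed so the run is reproducible
x_fixed = (60.0, 20.0)    # the factors are set once and never changed

# Repeat the observation many times at the same factor values.
observations = [observe_y(x_fixed, rng) for _ in range(1000)]

print("deterministic part f(x):", round(f(x_fixed), 3))
print("first five observations:", [round(y, 3) for y in observations[:5]])
print("sample mean:", round(statistics.mean(observations), 3))
print("sample standard deviation:", round(statistics.stdev(observations), 3))
```

Although x_fixed never changes, the observed values differ from one repetition to the next. Under the stated assumptions they scatter around f(x), with a spread governed by SIGMA.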
We might imagine that ε depends only on the measurement process. In fact, ε may be due to a combination of factors that intervene when the data are observed, causing Y to vary despite x1, x2,…, x