23

Multivariate Statistics

This class of statistical methods is fundamentally different from the others in this book because there is no response variable. Instead of trying to understand variation in a response variable in terms of explanatory variables, in multivariate statistics we look for structure in the data. The problem is that structure is rather easy to find, and all too often it is a feature of that particular data set alone. The real challenge is to find general structure that will apply to other data sets as well. Unfortunately, there is no guaranteed means of detecting pattern, and a great deal of ingenuity has been shown by statisticians in devising means of pattern recognition in multivariate data sets. The main division is between methods that assume a given structure and seek to divide the cases into groups, and methods that seek to discover structure from inspection of the dataframe.
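To see how easily structure can be "found", here is a minimal sketch (an illustration added here, not an example from the text) that asks kmeans for three groups in data simulated as pure noise. The sample size, the number of variables, the seed and the choice of three centres are all arbitrary assumptions.

```r
# Illustration only: the data are simulated noise, so there are no real groups
set.seed(42)                                  # arbitrary seed, for reproducibility
noise <- matrix(rnorm(200 * 5), ncol = 5)     # 200 cases, 5 independent normal variables

# Ask for three clusters; kmeans partitions the cases whether or not groups exist
km <- kmeans(noise, centers = 3, nstart = 10, iter.max = 50)

table(km$cluster)            # three non-empty 'groups' are reported
km$betweenss / km$totss      # proportion of variation attributed to the grouping
```

The partition is a feature of this particular sample. There is no reason to expect the same grouping to reappear if a fresh sample is drawn and the analysis repeated.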

The multivariate techniques implemented in R include:

  • principal components analysis (prcomp);
  • factor analysis (factanal);
  • cluster analysis (hclust, kmeans);
  • discriminant analysis (lda, qda, both in the MASS package);
  • neural networks (nnet, in the nnet package).
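
As a rough orientation, the calls below show the basic form of several of these functions applied to data sets that ship with R (USArrests and iris). This is a sketch rather than a recommended analysis: the choice of data, the standardization and the number of clusters (three) are assumptions made for illustration. factanal and nnet are omitted to keep it short, and lda is loaded from the MASS package.

```r
library(MASS)                                # provides lda and qda

# Standardize so that no variable dominates through its units
arrests <- scale(USArrests)

pca <- prcomp(USArrests, scale. = TRUE)      # principal components analysis
summary(pca)                                 # variance accounted for by each component

hc <- hclust(dist(arrests))                  # hierarchical clustering (complete linkage by default)
groups.hc <- cutree(hc, k = 3)               # cut the dendrogram into three groups

set.seed(1)                                  # kmeans starts from random centres
km <- kmeans(arrests, centers = 3, nstart = 10)

# Discriminant analysis needs known group membership (here, iris species)
model <- lda(Species ~ ., data = iris)
predicted <- predict(model)$class            # predictions for the training cases
table(iris$Species, predicted)               # observed versus predicted species
```

Note that the first three methods are given no grouping information at all, whereas lda is told the species of every flower in advance.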

These techniques are not recommended unless you know exactly what you are doing, and exactly why you are doing it. Beginners are sometimes attracted to multivariate techniques because of the complexity of the output they produce, making the classic mistake of confusing the opaque for the profound.
