Chapter 9. From Big to Small Data

Now that we have some cleansed data ready for analysis, let's first see how we can find our way around the high number of variables in our dataset. This chapter will introduce some statistical techniques to reduce the number of variables by dimension reduction and feature extraction, such as:

  • Principal Component Analysis (PCA)
  • Factor Analysis (FA)
  • Multidimensional Scaling (MDS) and a few other techniques

Note

Most dimension reduction methods require that two or more numeric variables in the dataset are highly associated or correlated, so the columns in our matrix are not totally independent of each other. In such a situation, the goal of dimension reduction is to decrease the number of columns in the dataset to the ...

Get Mastering Data Analysis with R now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.