Dimensionality reduction with singular value decomposition

The original dimensions often do not represent the data in the best way possible. As we saw with PCA, you can sometimes project the data onto fewer dimensions and still retain most of the useful information.

Sometimes, the best approach is to align the dimensions along the directions in which the data varies the most. This approach helps eliminate dimensions that are not representative of the data.
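To make the idea of aligning a dimension with the direction of greatest variation concrete, here is a minimal sketch in plain Python for two-dimensional data. It is not the Spark recipe itself: it assumes small in-memory data and uses the closed-form principal axis of the 2x2 scatter matrix, which for centered 2-D data coincides with the top right singular vector, rather than a general SVD routine. The function names `best_fit_direction` and `project_to_line` are hypothetical and chosen only for this illustration.

```python
import math


def best_fit_direction(points):
    """Return the mean and the unit direction of greatest variation (2-D only)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the 2x2 scatter matrix of the centered data.
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    # Angle of the major axis of a symmetric 2x2 matrix [[sxx, sxy], [sxy, syy]].
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))


def project_to_line(points):
    """Reduce 2-D points to 1-D coordinates along the best-fit line.

    Returns the 1-D coordinates and the 2-D points reconstructed from them.
    """
    (mx, my), (dx, dy) = best_fit_direction(points)
    # One number per point: its signed distance along the line from the mean.
    coords = [(x - mx) * dx + (y - my) * dy for x, y in points]
    # Map each 1-D coordinate back onto the line in the original space.
    approx = [(mx + c * dx, my + c * dy) for c in coords]
    return coords, approx


if __name__ == "__main__":
    data = [(0.0, 0.1), (1.0, 1.9), (2.0, 4.2), (3.0, 5.8)]
    coords, approx = project_to_line(data)
    for original, reduced, rebuilt in zip(data, coords, approx):
        print(original, "->", round(reduced, 3), "->", rebuilt)
```

Each point is replaced by a single coordinate along the line, and the reconstructed points show how much of the original data that one dimension retains. For points that already lie on a straight line, the reconstruction matches the input up to floating-point rounding.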

Let's look at the following figure again, which shows the best-fit line in two dimensions:

[Figure: best-fit line in two dimensions]

The projection line shows the best one-dimensional approximation of the original data. If we take the points ...
