Principal component analysis

One of the most commonly used methods of dimensionality reduction is Principal Component Analysis (PCA). Conceptually, PCA computes the axes along which the variation in the data is greatest. You may recall that in Chapter 3, Finding Patterns in the Noise – Clustering and Unsupervised Learning, we calculated the eigenvalues of the adjacency matrix of a dataset to perform spectral clustering. In PCA, we also want to find the eigenvalues of the dataset, but here, instead of an adjacency matrix, we will use the covariance matrix of the data, which captures the relative variation within and between its columns. The covariance for columns $x_i$ and $x_j$ in the data matrix X is given by:
$$\mathrm{cov}(x_i, x_j) = \frac{1}{n-1}\sum_{k=1}^{n}\left(x_{ki} - \bar{x}_i\right)\left(x_{kj} - \bar{x}_j\right)$$
This is the average product of the offsets of the two columns from their respective means: when two columns tend to deviate from their means in the same direction, their covariance is positive.
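
To make this concrete, the following is a minimal sketch of PCA computed directly from the covariance matrix, assuming a NumPy array X with observations in rows and features in columns; the variable names and the toy data here are illustrative only, not taken from the book's examples:

```python
import numpy as np

# Illustrative toy data: 200 observations, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

# Center each column, then form the covariance matrix
# (rowvar=False tells np.cov that columns are the variables).
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)

# The covariance matrix is symmetric, so eigh applies: the eigenvectors
# are the principal axes and the eigenvalues are the variances along them.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Sort from largest to smallest variance and project onto the top 2 axes.
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
X_pca = X_centered @ eigenvectors[:, :2]

print(eigenvalues)   # variance captured by each principal component
print(X_pca.shape)   # (200, 2)
```

In practice you would typically call a library routine such as scikit-learn's PCA, which also handles centering and component selection, but the eigendecomposition above is the computation it performs under the hood.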
