Finding the principal components in your data using randomized PCA

Both PCA and kernel PCA rely on a low-rank matrix approximation to estimate the principal components. The approximation is found by minimizing a cost function that measures how well a matrix of reduced rank fits the original matrix.
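To make the idea of a low-rank fit concrete, here is a minimal sketch that uses only NumPy. It assumes the cost function is the Frobenius norm of the difference between the matrix and its approximation, for which the truncated singular value decomposition gives the best rank-k fit. The helper name low_rank_approximation and the random test matrix are illustrative and not taken from the book.

```python
import numpy as np


def low_rank_approximation(a, k):
    '''
        Best rank-k approximation of a (in the Frobenius norm),
        built from the k largest singular triplets
    '''
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    # scale the first k left singular vectors by their singular
    # values, then project back with the first k right vectors
    return (u[:, :k] * s[:k]) @ vt[:k, :]


# illustrative data: 50 observations, 8 variables
rng = np.random.default_rng(0)
a = rng.normal(size=(50, 8))

a_k = low_rank_approximation(a, 3)

# the cost: how far the approximation is from the original
fit_error = np.linalg.norm(a - a_k, ord='fro')

# for the truncated SVD the cost equals the root of the sum
# of the squared singular values that were discarded
singular_values = np.linalg.svd(a, compute_uv=False)
expected_error = np.sqrt(np.sum(singular_values[3:] ** 2))
```

In PCA the same decomposition is applied to the mean-centered data, and the retained right singular vectors are the principal components.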

Computing this approximation exactly can be very costly for big datasets. Randomizing how the singular value decomposition of the input dataset is computed speeds up the estimation significantly, at the cost of a small approximation error.
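One common way to randomize the decomposition is to project the data onto a small number of random directions and run the exact decomposition only on that much smaller matrix. The sketch below illustrates this idea in plain NumPy. It is not the routine that Scikit uses internally, and the function name randomized_svd_sketch, the oversampling and power-iteration settings, and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def randomized_svd_sketch(a, k, n_oversamples=10, n_power_iter=2, seed=0):
    '''
        Approximate the k largest singular triplets of a
        using a random projection (illustrative only)
    '''
    rng = np.random.default_rng(seed)

    # sample the column space of a with a few random directions
    omega = rng.normal(size=(a.shape[1], k + n_oversamples))
    q, _ = np.linalg.qr(a @ omega)

    # power iterations sharpen the basis when the singular
    # values decay slowly; re-orthonormalize at every step
    for _ in range(n_power_iter):
        q, _ = np.linalg.qr(a @ (a.T @ q))

    # exact SVD, but only of the small projected matrix
    b = q.T @ a
    u_small, s, vt = np.linalg.svd(b, full_matrices=False)
    u = q @ u_small

    return u[:, :k], s[:k], vt[:k, :]


# synthetic data: rank-3 signal plus a little noise
rng = np.random.default_rng(1)
signal = rng.normal(size=(2000, 3)) @ rng.normal(size=(3, 50))
x = signal + 0.01 * rng.normal(size=(2000, 50))
x = x - x.mean(axis=0)  # PCA works on mean-centered data

u, s, vt = randomized_svd_sketch(x, k=3)
exact = np.linalg.svd(x, compute_uv=False)[:3]
```

The savings come from the last decomposition: it runs on a matrix with only k + n_oversamples rows instead of one row per observation.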

Getting ready

To execute this recipe, you will need NumPy, Scikit (scikit-learn), and Matplotlib. No other prerequisites are required.

How to do it…

As before, we create a wrapper method to estimate our model (the reduce_randomizedPCA.py file):

def reduce_randomizedPCA(x):
    ''' Reduce the dimensions ...
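The listing above is cut off in this excerpt, so the body of the book's wrapper is not shown. As a rough stand-in, and not the author's code, the sketch below shows how such a wrapper could look with a current scikit-learn release, where the separate RandomizedPCA class is no longer available and the randomized solver is selected through PCA(svd_solver='randomized'). The function name reduce_randomized_pca_sketch, the default of three components, the fixed random_state, and the random input data are all assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA


def reduce_randomized_pca_sketch(x, n_components=3, random_state=42):
    '''
        Fit a PCA model that uses the randomized SVD solver
        (hypothetical stand-in for the truncated listing)
    '''
    pca = PCA(
        n_components=n_components,
        svd_solver='randomized',
        random_state=random_state
    )

    # learn the principal components from the data
    return pca.fit(x)


# illustrative data: 500 observations, 10 variables
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 10))

model = reduce_randomized_pca_sketch(x)

# project the data onto the estimated components
reduced = model.transform(x)
```

Fixing random_state makes the randomized solver return the same components on every run, which helps when comparing results across experiments.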

Practical Data Analysis Cookbook by Tomasz Drabas
