PCA with H2O

We can also use the PCA implementation provided by H2O. (We've already seen H2O in the previous chapter and have mentioned it throughout the book.)

With H2O, we first need to start the server with the init method. Then, we dump the dataset to a file (specifically, a CSV file) and finally run the PCA. As the last step, we shut down the server.
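The four steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the book's listing: the helper names `write_random_csv` and `h2o_pca_from_csv` are hypothetical, the data is plain Gaussian noise generated with the standard library rather than a blobs dataset, and the PCA step uses the `H2OPrincipalComponentAnalysisEstimator` class from `h2o.estimators` instead of the `H2OPCA` wrapper imported in this section. The H2O function needs the `h2o` package and a Java runtime, so it is imported lazily and not called here.

```python
import csv
import os
import random
import tempfile


def write_random_csv(path, nrows, ncols, seed=101):
    """Write nrows x ncols of Gaussian noise to a CSV file with a header row.

    Hypothetical stand-in for the dataset used in the chapter.
    """
    rng = random.Random(seed)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["f%d" % i for i in range(ncols)])
        for _ in range(nrows):
            writer.writerow([rng.gauss(0.0, 1.0) for _ in range(ncols)])


def h2o_pca_from_csv(path, k=20):
    """Start H2O, load the CSV, fit a k-component PCA, then shut down.

    Hypothetical helper; assumes h2o and Java are installed.
    """
    # Imported lazily so the CSV helper above works without H2O installed.
    import h2o
    from h2o.estimators import H2OPrincipalComponentAnalysisEstimator

    h2o.init()  # step 1: start (or connect to) the local server
    try:
        frame = h2o.import_file(path=path)  # step 2: load the dumped CSV
        pca = H2OPrincipalComponentAnalysisEstimator(k=k)
        pca.train(training_frame=frame)  # step 3: fit PCA on all columns
        # Project the rows onto the k components and pull them back locally.
        return pca.predict(frame).as_data_frame()
    finally:
        h2o.cluster().shutdown()  # step 4: stop the server


# Small demonstration of the dump step only (no H2O required).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
write_random_csv(demo_path, nrows=5, ncols=3)
```

With H2O available, calling `h2o_pca_from_csv(demo_path, k=2)` would run the remaining steps on the file just written. Wrapping the work in `try`/`finally` is a design choice that ensures the server is stopped even if loading or training fails.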

We test this implementation on two of the biggest datasets seen so far: the one with 100K observations and 100 features, and the one with 10K observations and 2,500 features:

In:
import h2o
from h2o.transforms.decomposition import H2OPCA

h2o.init(max_mem_size_GB=4)

def testH2O_pca(nrows, ncols, k=20):
    temp_file = tempfile.NamedTemporaryFile().name
    X, _ = make_blobs(nrows, n_features=ncols, random_state=101)
    ...
