Curse of High-Dimensionality in Big Data

In this chapter, we will cover the following topics:

  • Two methods of ingesting and preparing a CSV file for processing in Spark
  • Singular Value Decomposition (SVD) to reduce high-dimensionality in Spark
  • Principal Component Analysis (PCA) to pick the most effective latent factor for machine learning in Spark
