Principal component analysis

There are numerous real-world use cases in which the number of features potentially available to train a model is very large. A common example is economic data, where constituent stock price, employment, banking, industrial, and housing data are used together to predict gross domestic product (GDP). Such data is said to have high dimensionality. Although high-dimensional datasets offer numerous features with which to model a given use case, they increase the computational complexity of machine learning algorithms and, more importantly, may also result in overfitting. Overfitting is one consequence of the curse of dimensionality, which formally describes the problems that arise when analyzing data in high-dimensional spaces: as the number of dimensions grows, observations become increasingly sparse, and the amount of data needed to generalize reliably grows rapidly.
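
As a rough sketch (not an example from this book), the snippet below shows how the PCA transformer from pyspark.ml.feature might be used to project a small, made-up feature matrix onto a lower-dimensional space. The data values, column names, and the choice of k=2 components are assumptions for illustration only.

from pyspark.sql import SparkSession
from pyspark.ml.feature import PCA
from pyspark.ml.linalg import Vectors

# Hypothetical data: a tiny 5-dimensional feature set standing in for
# high-dimensional economic indicators (stock, employment, banking, and so on).
spark = SparkSession.builder.appName("pca-sketch").getOrCreate()

data = [
    (Vectors.dense([12.0, 4.2, 7.1, 0.5, 3.3]),),
    (Vectors.dense([11.5, 4.0, 6.8, 0.6, 3.1]),),
    (Vectors.dense([13.2, 4.5, 7.6, 0.4, 3.7]),),
    (Vectors.dense([10.9, 3.8, 6.5, 0.7, 2.9]),),
]
df = spark.createDataFrame(data, ["features"])

# Project the five original features onto k=2 principal components.
pca = PCA(k=2, inputCol="features", outputCol="pcaFeatures")
model = pca.fit(df)

# explainedVariance reports the proportion of variance captured by each component,
# which helps judge how much information is retained after the reduction.
print(model.explainedVariance)
model.transform(df).select("pcaFeatures").show(truncate=False)

spark.stop()

In practice, k is chosen so that the retained components explain most of the variance in the original features, trading a small loss of information for a substantial reduction in dimensionality.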
