Principles of Data Science by Sinan Ozdemir

Feature extraction and principal component analysis

Sometimes we have an overwhelming number of columns and likely not enough rows to support that many columns.

A great example of this is the phrase "send cash now" from our Naïve Bayes example. We had literally zero instances of texts containing that exact phrase, so instead we turned to a naïve assumption that allowed us to extrapolate a probability for both of our categories.
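The naïve assumption described above can be sketched in a few lines of Python. This is a minimal illustration, not the book's code: the toy corpus, the labels "spam" and "ham", and the helper names are all hypothetical. It shows that the exact phrase never appears in either category, yet treating the words as independent still yields a non-zero likelihood for each. Class priors and smoothing are left out to keep the sketch short.

```python
# Hypothetical toy corpus (invented for illustration): (text, label) pairs.
texts = [
    ("send me your number now", "spam"),
    ("win cash prizes", "spam"),
    ("cash now available", "spam"),
    ("send the report when ready", "ham"),
    ("see you now", "ham"),
    ("i need cash for lunch", "ham"),
]

phrase = "send cash now"


def exact_phrase_count(label):
    """Number of texts with this label that contain the whole phrase."""
    return sum(1 for text, lab in texts if lab == label and phrase in text)


def word_likelihood(word, label):
    """Fraction of texts with this label that contain the given word."""
    docs = [text.split() for text, lab in texts if lab == label]
    return sum(1 for doc in docs if word in doc) / len(docs)


def naive_phrase_likelihood(label):
    """Approximate P(phrase | label) as a product of per-word likelihoods.

    This is the naive independence assumption: the words are treated as
    if they occurred independently of one another within a category.
    """
    likelihood = 1.0
    for word in phrase.split():
        likelihood *= word_likelihood(word, label)
    return likelihood


for label in ("spam", "ham"):
    print(label, exact_phrase_count(label), naive_phrase_likelihood(label))
```

With this toy data, the exact-phrase count is zero for both categories, while the word-by-word product still gives each category a usable likelihood to compare.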

The reason we had this problem in the first place is something called the curse of dimensionality.

The curse of dimensionality says that as we introduce and consider new feature columns, we need almost exponentially more rows (data points) in order to fill in the empty spaces that ...
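One way to see this growth is to hold the number of rows fixed and count how much of the feature space they cover as columns are added. The sketch below is an illustrative assumption, not the book's method: it draws uniformly random rows, splits each feature into 10 bins, and reports the fraction of grid cells that contain at least one row. The function name, the bin count, and the seed are hypothetical choices.

```python
import random


def occupied_fraction(n_rows, n_features, bins_per_feature=10, seed=0):
    """Fraction of grid cells that contain at least one random row.

    Each feature is split into `bins_per_feature` equal bins, so the grid
    has bins_per_feature ** n_features cells in total.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    occupied = set()
    for _ in range(n_rows):
        # A row falls into one bin per feature; the tuple identifies its cell.
        cell = tuple(rng.randrange(bins_per_feature) for _ in range(n_features))
        occupied.add(cell)
    return len(occupied) / bins_per_feature ** n_features


# Same 1,000 rows' worth of data, increasing number of feature columns.
for n_features in range(1, 6):
    fraction = occupied_fraction(n_rows=1000, n_features=n_features)
    print(n_features, fraction)
```

Because the number of cells multiplies by 10 with every added feature, 1,000 rows can cover at most 10% of the cells with four features and at most 1% with five, no matter how the rows are distributed.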