Dimensionality reduction

Complex and noisy characteristics of textual data with high dimensions can be handled by dimensionality reduction techniques. These techniques reduce the dimension of the textual data while still preserving its underlying statistics. Though the dimensions are reduced, it is important to preserve the inter-document relationships. The idea is to have minimum number of dimensions, which can preserve the intrinsic dimensionality of the data.

A textual collection is mostly represented in the form of a term document matrix wherein we have the importance of each term in a document. The dimensionality of such a collection increases with the number of unique terms. If we were to suggest the simplest possible dimensionality reduction ...

Get Mastering Text Mining with R now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.