This video explains what text mining is and how it is performed. It has eight modules:
- Text Mining Overview. In this first clip, we define text mining and provide examples, such as spam filtering, review analysis, and sentiment analysis.
- Related Disciplines. In this second clip, we distinguish text mining from Information Retrieval, Information Extraction, and Natural Language Processing (NLP).
- Text Mining Sources. In this third clip, we define the sources of text (textual data) and explore structured, semi-structured, and unstructured documents.
- Text Mining Process. In this fourth clip, we provide an overview of the steps in performing text mining: Given Data (Text), Text Preprocessing, Feature Generation/Extraction, Feature Selection, Text Mining Methods, and Results Evaluation.
- Text Preprocessing. In this fifth clip, we cover Text Preprocessing, including its two methods: Lexical Analysis and Syntactic Analysis.
- Feature Extraction. In this sixth clip, we explore Feature Extraction, including Stop Word Elimination, Stemming, and Lemmatization.
- Weighting Models. In this seventh clip, we describe how to transform a bag of words into a vector representation that text mining algorithms can use for further processing, such as document classification. We cover the Boolean Model, Term Frequency (TF), and Term Frequency-Inverse Document Frequency (TF-IDF).
- Dimension Reduction. In this eighth clip, we explore dimension reduction, which reduces the size of the vocabulary to avoid the curse of dimensionality. We cover Latent Semantic Analysis techniques, which are widely used in text mining for dimension reduction, and provide an example using the k-nearest neighbors algorithm.
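
To make the weighting models from the seventh clip concrete, here is a minimal Python sketch. It is not taken from the video: the toy corpus, the stop-word list, and every function name are assumptions chosen for illustration, and the inverse document frequency uses the plain log(N / df) form, which is only one of several common variants.

```python
import math
from collections import Counter

# Hypothetical toy corpus and stop-word list, chosen only for illustration.
DOCS = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
STOP_WORDS = {"the", "on"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def build_vocabulary(token_lists):
    """Return the sorted list of distinct terms across all documents."""
    return sorted({term for tokens in token_lists for term in tokens})


def boolean_vector(tokens, vocab):
    """Boolean model: 1 if the term occurs in the document, else 0."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocab]


def tf_vector(tokens, vocab):
    """Term Frequency: raw count of each vocabulary term in the document."""
    counts = Counter(tokens)
    return [counts[term] for term in vocab]


def tfidf_vector(tokens, vocab, token_lists):
    """TF-IDF: raw count times log(N / df), one common variant."""
    n_docs = len(token_lists)
    counts = Counter(tokens)
    weights = []
    for term in vocab:
        # df is at least 1 because the vocabulary is built from token_lists.
        df = sum(1 for doc in token_lists if term in doc)
        weights.append(counts[term] * math.log(n_docs / df))
    return weights


TOKENS = [tokenize(doc) for doc in DOCS]
VOCAB = build_vocabulary(TOKENS)

if __name__ == "__main__":
    print(VOCAB)
    for tokens in TOKENS:
        print(boolean_vector(tokens, VOCAB))
        print(tf_vector(tokens, VOCAB))
        print([round(w, 3) for w in tfidf_vector(tokens, VOCAB, TOKENS)])
```

In this sketch a term that appears in only one document, such as "mat", receives a higher TF-IDF weight than a term shared by two documents, such as "cat", which is the behavior that makes TF-IDF useful for tasks like document classification.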