Building a text classifier
The goal of text classification is to categorize text documents into different classes. This is an extremely important analysis technique in NLP. We will use a technique, which is based on a statistic called tf-idf, which stands for term frequency—inverse document frequency. This is an analysis tool that helps us understand how important a word is to a document in a set of documents. This serves as a feature vector that's used to categorize documents. You can learn more about it at http://www.tfidf.com.
How to do it…
- Create a new Python file, and import the following package:
from sklearn.datasets import fetch_20newsgroups
- Let's select a list of categories and name them using a dictionary mapping. These categories are available ...
Get Python: Real World Machine Learning now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.