- Create a new Python file and add the following lines to load the dataset:
    from sklearn.datasets import fetch_20newsgroups

    # Map each newsgroup name to a readable category label
    category_mapping = {
        'misc.forsale': 'Sellings',
        'rec.motorcycles': 'Motorbikes',
        'rec.sport.baseball': 'Baseball',
        'sci.crypt': 'Cryptography',
        'sci.space': 'OuterSpace',
    }

    # Fetch only the training split for the selected newsgroups
    training_content = fetch_20newsgroups(subset='train',
            categories=category_mapping.keys(), shuffle=True, random_state=7)
- Extract features from the text by counting how often each word occurs in each document:
    from sklearn.feature_extraction.text import CountVectorizer

    # Learn the vocabulary and build the document-term count matrix
    vectorizing = CountVectorizer()
    train_counts = vectorizing.fit_transform(training_content.data)
    print("\nDimensions of training data:", train_counts.shape)
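To make the printed dimensions concrete, here is a minimal pure-Python sketch of a bag-of-words count matrix. It only approximates the idea behind `CountVectorizer` (lowercasing, tokens of two or more word characters, alphabetically sorted vocabulary); it is not the library's implementation. The helper `count_matrix` and the two toy documents are hypothetical and chosen for illustration only.

```python
import re
from collections import Counter


def count_matrix(documents):
    """Return (vocabulary, rows), where rows[i][j] counts word j in document i."""
    # Assumption: lowercase the text and keep tokens of two or more word characters
    token_pattern = re.compile(r"\b\w\w+\b")
    tokenized = [token_pattern.findall(doc.lower()) for doc in documents]

    # One column per distinct word, in alphabetical order
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    column = {word: j for j, word in enumerate(vocabulary)}

    rows = []
    for tokens in tokenized:
        row = [0] * len(vocabulary)
        for word, n in Counter(tokens).items():
            row[column[word]] = n
        rows.append(row)
    return vocabulary, rows


# Hypothetical toy corpus standing in for the newsgroup posts
toy_documents = [
    "The rocket reached orbit",
    "The pitcher threw the ball",
]
vocabulary, rows = count_matrix(toy_documents)
print(vocabulary)
print("Dimensions:", (len(rows), len(vocabulary)))
```

The matrix has one row per document and one column per distinct word, so the shape reported for the training data reads as (number of documents, vocabulary size).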
- Train the classifier:
from sklearn.naive_bayes ...