How to do it...

  1. Include the following lines in a new Python file to add datasets:
from sklearn.datasets import fetch_20newsgroups 
category_mapping = {'misc.forsale': 'Sellings', 'rec.motorcycles': 'Motorbikes', 
        'rec.sport.baseball': 'Baseball', 'sci.crypt': 'Cryptography', 
        'sci.space': 'OuterSpace'} 
 
training_content = fetch_20newsgroups(subset='train', 
categories=category_mapping.keys(), shuffle=True, random_state=7) 
  1. Perform feature extraction to extract the main words from the text:
from sklearn.feature_extraction.text import CountVectorizer 
 
vectorizing = CountVectorizer() 
train_counts = vectorizing.fit_transform(training_content.data) 
print "nDimensions of training data:", train_counts.shape 
  1. Train the classifier:
from sklearn.naive_bayes ...

Get Raspberry Pi 3 Cookbook for Python Programmers - Third Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.