Python package – sklearn

Because sklearn is such a useful Python package, it is worth showing more examples of how to use it. The example cited here uses the package to classify documents by topic with a bag-of-words approach.

This example stores the features in a scipy.sparse matrix and demonstrates several classifiers that handle sparse matrices efficiently. The dataset is the 20 newsgroups dataset, which is downloaded automatically and then cached. The input files are packaged as a gzip-compressed tar archive of about 14 MB, which can also be downloaded directly from http://people.csail.mit.edu/jrennie/20Newsgroups/20news-bydate.tar.gz. The code is available at the following web link: http://scikit-learn.org/stable/auto_examples/text/document_classification_20newsgroups.html#sphx-glr-auto-examples-text-document-classification-20newsgroups-py ...
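The idea can be illustrated with a minimal sketch. This is not the code from the linked scikit-learn example; it is a reduced illustration under stated assumptions. The tiny in-memory corpus, the two topic labels, and the helper name classify_documents are hypothetical and chosen only so the sketch runs without downloading anything. TfidfVectorizer builds the bag-of-words features as a scipy.sparse matrix, and MultinomialNB stands in for one of the classifiers that accept sparse input.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB


def classify_documents(train_texts, train_labels, test_texts):
    """Hypothetical helper: bag-of-words features plus a sparse-aware classifier."""
    # Turn raw text into TF-IDF weighted word counts; the result is sparse.
    vectorizer = TfidfVectorizer(stop_words="english")
    X_train = vectorizer.fit_transform(train_texts)
    # Reuse the vocabulary learned from the training texts.
    X_test = vectorizer.transform(test_texts)

    # MultinomialNB works directly on scipy.sparse matrices.
    classifier = MultinomialNB()
    classifier.fit(X_train, train_labels)
    return classifier.predict(X_test), X_train


# Toy corpus (an assumption for illustration, not the 20 newsgroups data).
train_texts = [
    "the rocket launch put a satellite into orbit",
    "astronauts aboard the shuttle studied the moon",
    "the goalie stopped the puck in the hockey game",
    "the team scored in overtime to win the hockey playoff",
]
train_labels = ["space", "space", "hockey", "hockey"]
test_texts = [
    "a new satellite reached orbit after the rocket launch",
    "the hockey team won the game",
]

predictions, X_train = classify_documents(train_texts, train_labels, test_texts)
print(type(X_train).__name__, X_train.shape)
print(predictions.tolist())
```

To try the same steps on the real data, the training and test texts could be loaded with sklearn.datasets.fetch_20newsgroups, which performs the automatic download and caching described above.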
