Learning from text – Naive Bayes for Natural Language Processing

In this recipe, we show how to handle text data with scikit-learn. Working with text requires careful preprocessing and feature extraction. It is also quite common to deal with highly sparse matrices.

We will learn to recognize whether a comment posted during a public discussion is considered insulting to one of the participants. We will use a labeled dataset from Impermium, released during a Kaggle competition (see http://www.kaggle.com/c/detecting-insults-in-social-commentary).

How to do it...

Let's import our libraries:

>>> import numpy as np import pandas as pd import sklearn import sklearn.model_selection as ms import sklearn.feature_extraction.text as text import sklearn.naive_bayes ...

Get IPython Interactive Computing and Visualization Cookbook - Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.

Start your free trial

IPython Interactive Computing and Visualization Cookbook - Second Edition by Cyrille Rossant

Learning from text – Naive Bayes for Natural Language Processing

How to do it...

Don’t leave empty-handed

It’s yours, free.

Check it out now on O’Reilly