Adding TF-IDF

Now, let's try the preceding model with a TF-IDF weighting step added after the bag-of-words (unigram) counts, as follows:

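# Unigram counts -> TF-IDF weighting -> multinomial Naive Bayes
mnb_clf = Pipeline([('vect', CountVectorizer()), ('tfidf', TfidfTransformer()), ('clf', MNB())])
# The fit call was garbled in the source; X_train is assumed to be the training text used with y_train
mnb_clf.fit(X=X_train, y=y_train)
mnb_acc, mnb_predictions = imdb_acc(mnb_clf)
mnb_acc # 0.82956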

This is better than our previous accuracy, but let's see what else we can do to improve it further.
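To get a feel for what the TF-IDF step contributes, here is a minimal sketch on a hypothetical three-sentence corpus (not the IMDb data). It assumes scikit-learn's default settings, where the smoothed inverse document frequency is idf(t) = ln((1 + n) / (1 + df(t))) + 1 and each row is then L2-normalized. Words that occur in every document get the lowest weight, while rarer words are weighted up. The names docs, counts, weights, and idf are illustrative only.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

# Hypothetical toy corpus, used only to illustrate the weighting.
docs = [
    "the movie was great",
    "the movie was terrible",
    "the plot was great fun",
]

# Step 1: bag-of-words (unigram) counts, as in the 'vect' pipeline step.
vect = CountVectorizer()
counts = vect.fit_transform(docs)

# Step 2: reweight the counts by inverse document frequency,
# as in the 'tfidf' pipeline step.
tfidf = TfidfTransformer()
weights = tfidf.fit_transform(counts)

# Map each vocabulary term to its learned IDF value.
idf = {term: tfidf.idf_[index] for term, index in vect.vocabulary_.items()}

# 'the' occurs in all three documents, 'great' in two, 'terrible' in one,
# so their IDF values increase in that order.
for term in ("the", "great", "terrible"):
    print(term, round(idf[term], 4))

# With default settings, the two-step version should match TfidfVectorizer,
# which combines counting and TF-IDF weighting in one estimator.
combined = TfidfVectorizer().fit_transform(docs)
print(np.allclose(weights.toarray(), combined.toarray()))
```

In the pipeline above, the two separate steps could likely be swapped for a single TfidfVectorizer step with the same result; keeping them separate just makes it easy to compare the model with and without TF-IDF.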

