Now, let's try the preceding model with TF-IDF added as a further step after the bag-of-words (unigram) counts, as follows:
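from sklearn.feature_extraction.text import TfidfTransformer

# Same pipeline as before, with a TF-IDF reweighting step after the unigram counts
mnb_clf = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MNB()),
])
mnb_clf.fit(X=X_train, y=y_train)
mnb_acc, mnb_predictions = imdb_acc(mnb_clf)
mnb_acc  # 0.82956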
This is better than our previous value, but let's see what else we can do to improve this further.
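Before moving on, it may help to see roughly what the TF-IDF step does to the raw counts. The following is a minimal, dependency-free sketch, not the library's implementation: it assumes scikit-learn's default settings (smoothed IDF, L2 normalization), uses a simplified whitespace tokenizer in place of CountVectorizer's, and the helper name tfidf and the toy_docs corpus are hypothetical, made up for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocab, rows): one L2-normalized TF-IDF row per document.

    Illustrative helper only; tokenization is a plain lowercase
    whitespace split, which is simpler than CountVectorizer's.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)

    # Document frequency: number of documents that contain each term
    df = Counter(tok for toks in tokenized for tok in set(toks))

    # Smoothed IDF, assumed to match scikit-learn's default (smooth_idf=True)
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        # Raw term count multiplied by the term's IDF
        weights = [counts[t] * idf[t] for t in vocab]
        # L2-normalize so that document length does not dominate
        norm = math.sqrt(sum(w * w for w in weights)) or 1.0
        rows.append([w / norm for w in weights])
    return vocab, rows


# Hypothetical toy corpus, not taken from the IMDb data
toy_docs = ["great great movie", "terrible movie", "great acting"]
vocab, rows = tfidf(toy_docs)
for doc, row in zip(toy_docs, rows):
    print(doc, dict(zip(vocab, (round(w, 3) for w in row))))
```

In the second toy document, "terrible" ends up with a larger weight than "movie" even though both occur once, because "movie" appears in more documents and is therefore less informative. This down-weighting of common terms is the effect the extra pipeline step contributes.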