Changing fit_prior to False

Increasing ngram_range worked for us, but switching from a class prior learned from the training data to a uniform prior (by setting fit_prior to False) did not help at all, as the following shows:

mnb_clf = Pipeline([
    ('vect', CountVectorizer(stop_words='english', ngram_range=(1, 3))),
    ('tfidf', TfidfTransformer()),
    ('clf', MNB(fit_prior=False)),
])
mnb_clf.fit(X=X_train, y=y_train)
mnb_acc, mnb_predictions = imdb_acc(mnb_clf)
mnb_acc  # 0.8572
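To see what fit_prior actually controls, the minimal sketch below fits scikit-learn's MultinomialNB twice on a small, deliberately imbalanced toy dataset and compares the class priors each model ends up with. The arrays X and y are made up for illustration and are not the IMDb data. When the training classes are roughly balanced, the learned prior is already close to uniform, which is one plausible reason the setting makes little difference.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy counts: three samples of class 0, one sample of class 1.
X = np.array([[2, 0], [3, 1], [1, 0], [0, 4]])
y = np.array([0, 0, 0, 1])

# fit_prior=True (the default) estimates class priors from class frequencies.
fitted = MultinomialNB(fit_prior=True).fit(X, y)
# fit_prior=False ignores the frequencies and uses a uniform prior.
uniform = MultinomialNB(fit_prior=False).fit(X, y)

# class_log_prior_ stores log probabilities; exponentiate to read them directly.
fitted_prior = np.exp(fitted.class_log_prior_)
uniform_prior = np.exp(uniform.class_log_prior_)

print(fitted_prior)   # proportions of each class in y
print(uniform_prior)  # equal weight for every class
```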

We have now worked through every combination we thought might improve our performance. Note that this manual approach is tedious and also error-prone, because it relies too heavily on human intuition.
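One common way to rely less on intuition is to let a search routine enumerate the combinations. The sketch below uses scikit-learn's GridSearchCV over the same two settings discussed here, ngram_range and fit_prior. The corpus, labels, and grid values are hypothetical and chosen only to keep the example small; this is an illustration, not the procedure used to produce the accuracy figures above.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus: 1 = positive review, 0 = negative review.
docs = [
    "a great and moving film", "great acting and a great story",
    "wonderful film with a moving story", "loved the wonderful acting",
    "a dull and boring film", "boring story and terrible acting",
    "terrible film with a dull plot", "hated the boring plot",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", MultinomialNB()),
])

# Parameters are addressed as <step name>__<parameter name>.
param_grid = {
    "vect__ngram_range": [(1, 1), (1, 2), (1, 3)],
    "clf__fit_prior": [True, False],
}

# cv=2 only because the toy corpus is tiny; real data would use more folds.
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(docs, labels)

print(search.best_params_)
print(len(search.cv_results_["params"]))  # number of combinations tried
```

Every combination in the grid is scored with cross-validation, so nothing is skipped by accident and the results are recorded in search.cv_results_ for later inspection.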
