O'Reilly logo

Building Machine Learning Systems with Python - Second Edition by Luis Pedro Coelho, Willi Richert

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Solving our initial challenge

We will now put everything together and demonstrate our system for the following new post that we assign to the new_post variable:

"Disk drive problems. Hi, I have a problem with my hard disk.

After 1 year it is working only sporadically now.

I tried to format it, but now it doesn't boot any more.

Any ideas? Thanks."

As you learned earlier, you will first have to vectorize this post before you predict its label:

>>> new_post_vec = vectorizer.transform([new_post])
>>> new_post_label = km.predict(new_post_vec)[0]

Now that we have the clustering, we do not need to compare new_post_vec to all post vectors. Instead, we can focus only on the posts of the same cluster. Let's fetch their indices in the original data set:

>>> similar_indices ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required