© Dipanjan Sarkar 2016

Dipanjan Sarkar, Text Analytics with Python, 10.1007/978-1-4842-2388-8_6

6. Text Similarity and Clustering

Dipanjan Sarkar

(1) Bangalore, Karnataka, India

Previous chapters have covered several techniques for analyzing text and extracting interesting insights. We have looked at supervised machine learning (ML) techniques that are used to classify or categorize text documents into several predefined categories. Unsupervised techniques like topic models and document summarization have also been covered, which involved trying to extract and retrieve key themes and information from large text documents and corpora. In this chapter, we will be looking at several other techniques and use cases that leverage unsupervised learning ...
