Chapter 13

Document Clustering: The Next Frontier

David C. Anastasiu

University of Minnesota, Minneapolis, MN, anast021@umn.edu

Andrea Tagarelli

University of Calabria, Arcavacata di Rende, Italy, tagarelli@deis.unical.it

George Karypis

University of Minnesota, Minneapolis, MN, karypis@cs.umn.edu

13.1 Introduction

The proliferation of documents, both on the Web and in private systems, makes knowledge discovery in document collections arduous. Clustering has long been recognized as a useful tool for the task. It groups like items together, maximizing intra-cluster similarity and inter-cluster distance. Clustering can provide insight into the make-up of a document collection and is often used as the initial step in data analysis.

While most document clustering ...
