The method builds on the observation that the best clusters generated using k-means are those named with a single dictionary term. It starts by finding such terms and building clusters around them, rather than clustering first and naming the clusters afterward. The problem is thereby reduced to identifying the best terms for generating clusters. For this, we use the cohesion metric. We define the cohesion of a term as the average distance between each document that contains the term and the centroid formed by averaging all documents that contain the term.
Simply put, cohesive terms are those which have the property that the documents ...
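The cohesion definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the book's implementation: documents are represented as binary term vectors, distance is Euclidean, and the names `build_vectors` and `cohesion` are hypothetical. Under the definition as given, a smaller value means the documents containing the term sit closer to their centroid.

```python
import math


def build_vectors(docs):
    """Turn each document (a set of terms) into a binary vector over the sorted vocabulary.

    Assumption: binary term presence; the excerpt does not say how documents are vectorized.
    """
    vocab = sorted(set().union(*docs))
    return [[1.0 if term in doc else 0.0 for term in vocab] for doc in docs]


def cohesion(term, docs, vectors):
    """Average distance from each document containing `term` to the centroid of those documents.

    Assumption: Euclidean distance; the excerpt does not name the distance measure.
    Returns None when no document contains the term.
    """
    members = [vec for doc, vec in zip(docs, vectors) if term in doc]
    if not members:
        return None
    dim = len(members[0])
    # Centroid: component-wise mean of all documents that contain the term.
    centroid = [sum(vec[i] for vec in members) / len(members) for i in range(dim)]
    # Cohesion: mean distance of those documents from the centroid.
    return sum(math.dist(vec, centroid) for vec in members) / len(members)


# Hypothetical toy corpus, invented for illustration only.
docs = [
    {"password", "reset"},
    {"password", "reset"},
    {"printer", "jam"},
    {"printer", "password"},
]
vectors = build_vectors(docs)

for term in ("reset", "password", "printer"):
    print(term, cohesion(term, docs, vectors))
```

In this toy corpus the two documents containing "reset" are identical, so they coincide with their centroid, while the documents containing "password" are more spread out. Ranking candidate terms by this score is one way to pick the terms around which clusters are then built.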
- 2. Mining Customer Interactions
- from Mining the Talk: Unlocking the Business Value in Unstructured Information
- Publisher: IBM Press
- Released: July 2007
Key term / method - Intuitive Clustering, cohesion, centroid
http://www.safaribooksonline.com/a/mining-the-talk/1656/