
The method builds on the observation that the best clusters generated using k-means are those named with a single dictionary term. It starts by finding such terms and building clusters around them, instead of the other way around. The problem thus reduces to finding a way to identify the best terms for generating clusters. For this, we use the cohesion metric. We define the cohesion of a term as the average distance of every document that contains that term from the centroid formed by averaging all documents that contain the term.

Simply put, cohesive terms are those which have the property that the documents ...
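A minimal Python sketch of the cohesion metric, and of seeding one cluster per cohesive term, may help make the definition concrete. It rests on assumptions the excerpt does not state: documents are bag-of-words term-frequency vectors scaled to unit length, "distance" is cosine distance (1 minus cosine similarity), so a lower score means a more cohesive term, and the names tokenize, unit_vector, cosine_distance, cohesion, seed_clusters and the parameters min_docs and top_k are hypothetical. It is an illustration, not the book's implementation.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: split on whitespace, strip simple punctuation, lowercase.
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w]


def unit_vector(tokens):
    # Assumed representation: term-frequency vector scaled to unit length.
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()}


def cosine_distance(u, v):
    # Assumed distance measure: 1 - cosine similarity of two sparse vectors.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def cohesion(term, doc_vectors):
    # Average distance of every document containing `term` from the centroid
    # (the average) of those same documents. Lower means more cohesive here.
    members = [v for v in doc_vectors if term in v]
    if not members:
        return None
    centroid = Counter()
    for v in members:
        for t, w in v.items():
            centroid[t] += w / len(members)
    return sum(cosine_distance(v, centroid) for v in members) / len(members)


def seed_clusters(docs, min_docs=2, top_k=3):
    # Score each term found in at least `min_docs` documents, keep the `top_k`
    # most cohesive, and build one cluster per term: the documents containing it.
    vectors = [unit_vector(tokenize(d)) for d in docs]
    df = Counter(t for v in vectors for t in v)
    scored = sorted((cohesion(t, vectors), t) for t, n in df.items() if n >= min_docs)
    return {t: [i for i, v in enumerate(vectors) if t in v] for _, t in scored[:top_k]}


if __name__ == "__main__":
    docs = [
        "battery drains fast battery",
        "battery dies quickly",
        "screen cracked after drop",
        "screen cracked on arrival",
        "shipping was slow",
    ]
    for term, members in seed_clusters(docs).items():
        print(term, members)
```

Terms that occur in only one document are skipped via min_docs, because a lone document sits exactly on its own centroid and would always score a perfect 0. Terms that always co-occur, such as "screen" and "cracked" in the toy data, produce identical clusters; the sketch does not attempt to merge them.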


Source: Mining the Talk: Unlocking the Business Value in Unstructured Information


Key terms / method: Intuitive Clustering, cohesion, centroid