An example of a document clustering application
This application will read a set of documents and organize them using the k-means clustering algorithm. To achieve this, we will use four components:
- The Reader system: This system will read all the documents and convert every document into a list of
- The Indexer system: This system will process the documents and convert them into a list of words. At the same time, it will generate the global vocabulary of the set of documents, which contains every word that appears in them.
- The Mapper system: This system will convert each list of words into a mathematical representation using the vector space model. The value of each item will be the Tf-Idf (short for term frequency–inverse document frequency ...
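
To make the indexing and mapping steps concrete, here is a minimal Python sketch of how a word list, a global vocabulary, and Tf-Idf vectors could be built. It is an illustration under stated assumptions, not the application's actual implementation: the function names `index_documents` and `tf_idf_vectors` are hypothetical, the tokenizer is a simple regular expression, and the weighting uses one common Tf-Idf variant (term count divided by document length, multiplied by the logarithm of the number of documents over the number of documents containing the word).

```python
import math
import re
from collections import Counter


def index_documents(documents):
    """Split each document into lowercase words and build the global vocabulary."""
    # Assumption: a word is any run of ASCII letters or digits.
    word_lists = [re.findall(r"[a-z0-9]+", text.lower()) for text in documents]
    vocabulary = sorted({word for words in word_lists for word in words})
    return word_lists, vocabulary


def tf_idf_vectors(word_lists, vocabulary):
    """Map each word list to a vector with one Tf-Idf value per vocabulary word."""
    n_docs = len(word_lists)
    # Document frequency: the number of documents that contain each word.
    doc_freq = Counter(word for words in word_lists for word in set(words))
    vectors = []
    for words in word_lists:
        counts = Counter(words)
        total = len(words) or 1  # avoid dividing by zero for empty documents
        vectors.append([
            (counts[word] / total) * math.log(n_docs / doc_freq[word])
            for word in vocabulary
        ])
    return vectors


documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
word_lists, vocabulary = index_documents(documents)
vectors = tf_idf_vectors(word_lists, vocabulary)
```

Every vector has the same length as the vocabulary, so the documents can be compared position by position. A word that is absent from a document gets the value 0, and a word that appears in every document would also get 0, because its inverse document frequency is log(1).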