The steps for basic KMeans implementation (Lloyd algorithm) are:
- Randomly select K datacenters from observations as the initial centroids.
- Keep iterating till the convergence criteria is met:
- Measure the distance from a point to each centroid
- Include each data point in a cluster which is the closest centroid
- Calculate new cluster centroids based on a distance formula (proxy for dissimilarity)
- Update the algorithm with new center points
The three generations are depicted in the following figure: