KMeans (Lloyd Algorithm)

The steps for a basic KMeans implementation (Lloyd's algorithm) are:

  1. Randomly select K data points from the observations as the initial centroids.
  2. Iterate until the convergence criterion is met:
    • Measure the distance (a proxy for dissimilarity) from each data point to each centroid
    • Assign each data point to the cluster whose centroid is closest
    • Recalculate each cluster centroid as the mean of the data points assigned to it
    • Update the algorithm with the new centroids
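
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the Spark implementation used in this book: the function name lloyd_kmeans and the parameters max_iter, tol, and seed are hypothetical, Euclidean distance is assumed as the dissimilarity measure, convergence is assumed to mean that no centroid moved by more than tol, and an empty cluster simply keeps its previous centroid (one of several possible policies).

```python
import math
import random


def lloyd_kmeans(points, k, max_iter=100, tol=1e-9, seed=0):
    """Illustrative Lloyd iteration over a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 1: pick K distinct observations as the initial centroids.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)

    for _ in range(max_iter):
        # Step 2, bullets 1-2: measure Euclidean distances and assign
        # every point to the index of its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]

        # Step 2, bullet 3: recompute each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Assumed policy: an empty cluster keeps its old centroid.
                new_centroids.append(centroids[j])

        # Assumed convergence criterion: largest centroid movement <= tol.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        # Step 2, bullet 4: carry the new centroids into the next pass.
        centroids = new_centroids
        if shift <= tol:
            break

    return centroids, labels


# Usage on a small made-up data set with two visually separate groups.
data = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = lloyd_kmeans(data, k=2)
print(centroids)
print(labels)
```

Because the initial centroids are chosen at random, different seeds can lead to different final clusters, and the procedure may settle on a local rather than a global optimum. Running it with several seeds and comparing the results is a common way to mitigate this.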

Three successive generations (iterations) of the algorithm are depicted in the following figure:
