In this section, we will cover the k-means clustering algorithm in depth. k-means is a partitional clustering algorithm.

Let the set of data points (or instances) be as follows:

$D = \{x_1, x_2, \ldots, x_n\}$, where

$x_i = (x_{i1}, x_{i2}, \ldots, x_{ir})$ is a vector in a real-valued space $X \subseteq \mathbb{R}^r$, and $r$ is the number of attributes in the data.
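To make the notation concrete, here is a minimal sketch of how such a data set might be held in memory. The four points and the variable names are hypothetical and chosen only for illustration: each tuple is one data point, and each position in the tuple is one attribute.

```python
# Hypothetical toy data set D with n = 4 points and r = 2 attributes.
D = [
    (1.0, 2.0),
    (1.5, 1.8),
    (5.0, 8.0),
    (8.0, 8.0),
]

n = len(D)     # number of data points
r = len(D[0])  # number of attributes per point

# Every point must live in the same r-dimensional space.
assert all(len(x) == r for x in D)
```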

The k-means algorithm partitions the given data into *k* clusters with each cluster having a center called a centroid.

*k* is specified by the user.
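In k-means, the centroid of a cluster is conventionally the component-wise mean of the points assigned to it. The sketch below illustrates that computation; the helper name `centroid` and the sample cluster are assumptions made for this example, not part of the original text.

```python
def centroid(points):
    """Return the component-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    # zip(*points) groups the values of each attribute together.
    return tuple(sum(coords) / n for coords in zip(*points))


# Hypothetical cluster of three 2-dimensional points.
cluster = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
print(centroid(cluster))  # (3.0, 4.0)
```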

Given *k*, the k-means algorithm works as follows:

Algorithm k-means (*k*, *D*)

- Choose *k* data points as the initial centroids (cluster centers).
- Repeat the following steps:
  - For each data point *x* ∈ *D*:
    - Compute the distance from *x* to each centroid.
    - Assign *x* to the closest centroid ...
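The steps listed above can be sketched in plain Python as follows. This is an illustrative sketch under several assumptions, not the author's implementation: it uses Euclidean distance, picks the initial centroids at random with a fixed `seed`, and caps the loop at `max_iter` passes. The steps that follow the assignment are cut off in the listing, so the centroid update (move each centroid to the mean of its members) and the stopping rule (stop when no assignment changes) are filled in from the commonly used form of k-means and are labeled as assumptions in the comments.

```python
import math
import random


def k_means(k, D, max_iter=100, seed=0):
    """Cluster the points in D into k groups; return (centroids, labels)."""
    rng = random.Random(seed)
    # Choose k distinct data points as the initial centroids.
    centroids = rng.sample(D, k)
    labels = None

    for _ in range(max_iter):
        # Assign each point x to its closest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(x, centroids[j]))
            for x in D
        ]
        # Assumed stopping rule: stop once no assignment changes.
        if new_labels == labels:
            break
        labels = new_labels

        # Assumed update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [x for x, lab in zip(D, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(
                    sum(c) / len(members) for c in zip(*members)
                )

    return centroids, labels


# Hypothetical data: two well-separated groups of three points each.
D = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, labels = k_means(2, D)
print(labels)
```

Fixing the seed makes the run repeatable. In practice the result of k-means can depend on the initial centroids, so it is common to run it several times with different seeds.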
