Another important data mining technique is clustering.
Clustering is a way to find groups of similar observations in a data set;
these groups are called *clusters*.
There are several functions available for clustering in R.

To effectively use clustering algorithms, you need to
begin by measuring the distance between observations. A convenient way
to do this in R is through the function `dist` in the `stats` package:

dist(x, method = "euclidean", diag = FALSE, upper = FALSE, p = 2)
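As a minimal sketch of a call with the default arguments, the example below builds a small matrix and passes it to `dist`. The data values and the names `x` and `d` are made up for illustration; each row is treated as one observation.

```r
# Three observations (rows) measured on two variables (columns).
# The values are invented purely for illustration.
x <- matrix(c(0, 0,
              3, 4,
              6, 8),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("v1", "v2")))

d <- dist(x)    # method = "euclidean" is the default

class(d)        # the result is an object of class "dist"
as.matrix(d)    # expand to a full symmetric matrix with a zero diagonal
```

Printing `d` directly shows only the lower triangle; `as.matrix` is handy when you want to index distances by row and column name.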

The `dist` function computes the
distance between each pair of rows of an object such as a matrix or
a data frame. It returns a distance matrix (an object of class `dist`)
containing the computed distances. Here
is a description of the arguments to `dist`.

Argument | Description | Default |
---|---|---|
x | The object on which to compute distances. Must be a data frame, matrix, or `dist` object. | |
method | The method for computing distances. Specify `method="euclidean"` for Euclidean distances (2-norm), `method="maximum"` for the maximum distance between observations (supremum norm), `method="manhattan"` for the absolute distance between two vectors (1-norm), `method="canberra"` for Canberra distances (see the help file), `method="binary"` to regard nonzero values as 1 and zeros as 0, or `method="minkowski"` to use the p-norm (the pth root of the sum of the pth powers of the differences of the components). | `"euclidean"` |
diag | A logical value specifying whether the diagonal of the distance matrix should be printed by `print.dist ...` |
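
To make the `method` options more concrete, here is a small sketch that computes the distance between the same two observations under several methods. The points `p` and `q` and the variable names are hypothetical, chosen so the arithmetic is easy to check by hand.

```r
# Two made-up observations; each row is one observation.
# The component-wise differences are 3, 4, and 0.
pts <- rbind(p = c(1, 2, 0),
             q = c(4, 6, 0))

d_euc <- dist(pts, method = "euclidean")         # sqrt(3^2 + 4^2 + 0^2) = 5
d_max <- dist(pts, method = "maximum")           # max(3, 4, 0) = 4
d_man <- dist(pts, method = "manhattan")         # 3 + 4 + 0 = 7
d_min <- dist(pts, method = "minkowski", p = 3)  # (3^3 + 4^3 + 0^3)^(1/3)

# Collect the results in one named vector for comparison.
c(euclidean = as.numeric(d_euc),
  maximum   = as.numeric(d_max),
  manhattan = as.numeric(d_man),
  minkowski = as.numeric(d_min))
```

The `p` argument is only used when `method = "minkowski"`; with `p = 2` the Minkowski distance coincides with the Euclidean distance, and with `p = 1` it coincides with the Manhattan distance.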
