Euclidean distance

By definition, in order to cluster data points into groups, we require an understanding of the distance between two given data points. A common measure of distance is the Euclidean distance, which is simply the straight-line distance between two given points in k-dimensional space, where k is the number of independent variables or features. Formally, the Euclidean distance between two points, p and q, given k independent variables or dimensions is defined as follows:

Other common measures of distance include the Manhattan distance, which is the sum of the absolute values instead of squares ( ) and the maximum coordinate ...

Get Machine Learning with Apache Spark Quick Start Guide now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.