O'Reilly logo

Learning Predictive Analytics with Python by Ashish Kumar

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Mathematics behind clustering

Earlier in this chapter, we discussed how a measure of similarity or dissimilarity is needed for the purpose of clustering observations. In this section, we will see what those measures are and how they are used.

Distances between two observations

If we consider each observation as a point in an n-dimensional space, where n is the number of columns in the dataset, one can calculate the mathematical distance between the points. The lesser the distance, the more similar they are. The points that are less distant to each other will be clubbed together.

Now, there are many ways of calculating distances and different algorithms use different methods of calculating distance. Let us see the different methods with a few examples. ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required