Summary

In this chapter, we identified clusters using a range of methods, from calculating simple centroids manually to applying hierarchical clustering algorithms in SciPy. SciPy provides two clustering frameworks, vector quantization and hierarchical clustering, which lay the foundation for cluster analysis and are useful in many general data analysis problems. Python offers many more packages for clustering; we will look at one alternative, the machine learning package Scikit-learn, in Chapter 7, Supervised and Unsupervised Learning. In the next chapter, we will look at Bayesian analysis and how to use the PyMC Bayesian inference package in Python to characterize various aspects of data.
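As a compact reminder of the two SciPy frameworks, the following is a minimal sketch rather than the chapter's own code. It assumes a synthetic dataset of two well-separated two-dimensional blobs (the names `blob_a`, `blob_b`, and `points` are illustrative), runs k-means from `scipy.cluster.vq` with one starting centroid taken from each blob, and then builds a Ward linkage with `scipy.cluster.hierarchy` and cuts it into two flat clusters.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic data (an assumption for illustration): two well-separated blobs.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=(10.0, 10.0), scale=0.5, size=(50, 2))
points = np.vstack([blob_a, blob_b])

# Vector quantization: k-means starting from one point in each blob,
# which keeps the run deterministic without relying on a random seed.
initial_guess = points[[0, 50]]
codebook, distortion = kmeans(points, initial_guess)
vq_labels, _ = vq(points, codebook)  # index of the nearest centroid per point

# Hierarchical clustering: Ward linkage, cut into at most two flat clusters.
Z = linkage(points, method="ward")
hier_labels = fcluster(Z, t=2, criterion="maxclust")

print("k-means centroids:\n", codebook)
print("k-means cluster sizes:", np.bincount(vq_labels))
print("hierarchical cluster sizes:", np.bincount(hier_labels)[1:])
```

With blobs this far apart, both approaches should recover the same two groups, although the label numbers themselves are arbitrary (`vq` labels start at 0, `fcluster` labels at 1). On real data the two methods can disagree, and the choice of linkage method and number of clusters needs more care.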
