How it works...

In the previous recipe, we observed that KMeans discovers clusters and assigns each point to one and only one cluster, using an iterative method driven by a similarity measure (Euclidean distance, and so on). One can think of KMeans as a specialized version of a Gaussian mixture model trained with Expectation Maximization (EM) in which a discrete (hard) membership is enforced.
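As a minimal sketch of what "hard membership" means (the data points and cluster means here are hypothetical, not from the recipe), KMeans assigns each point entirely to the cluster whose mean is closest:

```python
import numpy as np

# Toy 1-D data and two cluster means (illustrative values only)
points = np.array([0.2, 0.5, 2.9, 3.1])
means = np.array([0.0, 3.0])

# KMeans-style hard assignment: each point belongs to exactly one cluster,
# namely the one whose mean is closest in Euclidean distance.
distances = np.abs(points[:, None] - means[None, :])
hard_membership = distances.argmin(axis=1)
print(hard_membership.tolist())  # one cluster index per point: [0, 0, 1, 1]
```

Note that the assignment is all-or-nothing: a point sitting halfway between the two means would still be forced into a single cluster.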

There are, however, cases where the clusters overlap, as is common in medicine and signal processing, as depicted in the following figure:

In such cases, we need a probability density function that can express each point's degree of membership in every sub-distribution. The Gaussian mixture model trained with Expectation Maximization provides exactly this: a soft, probabilistic membership rather than a hard assignment.
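To see how a probability density yields soft membership, here is a minimal sketch (the component means, standard deviations, and mixing weights are hypothetical): each point's membership in component k is the weighted Gaussian density of that component at the point, normalized over all components.

```python
import numpy as np

# Two overlapping 1-D Gaussian components (hypothetical parameters)
means = np.array([0.0, 3.0])
stds = np.array([1.5, 1.5])
weights = np.array([0.5, 0.5])  # mixing weights sum to 1

def gaussian_pdf(x, mu, sigma):
    """Density of a univariate Gaussian N(mu, sigma^2) at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = 1.5  # a point exactly midway between the two means
densities = weights * gaussian_pdf(x, means, stds)
responsibilities = densities / densities.sum()
print(responsibilities)  # soft membership: [0.5, 0.5]
```

With equal weights and equal spreads, a point halfway between the means belongs 50/50 to both components, which is exactly the overlap case that a hard assignment cannot express.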
