Bisecting KMeans, the new kid on the block in Spark 2.0

In this recipe, we will download the glass dataset and try to identify and label each glass sample using a bisecting KMeans algorithm. Bisecting KMeans is a hierarchical version of the KMeans algorithm, implemented in Spark through the BisectingKMeans() API. While the algorithm is conceptually similar to KMeans, it can be considerably faster for some use cases in which the data follows a hierarchical path.
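
To make the hierarchical idea concrete, here is a minimal pure-Python sketch of the bisecting strategy: start with one cluster that holds every point, then repeatedly split one cluster in two with plain 2-means until k clusters exist. It is an illustration only, not Spark's BisectingKMeans implementation. The function names, the toy points, and the rule of always splitting the cluster with the largest sum of squared errors are assumptions made for this sketch.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sse(points):
    """Sum of squared distances from each point to the cluster mean."""
    center = mean(points)
    return sum(sq_dist(p, center) for p in points)


def two_means(points, rng, iters=20):
    """Split one cluster in two with a few Lloyd iterations (k = 2)."""
    centers = rng.sample(points, 2)
    left, right = [], []
    for _ in range(iters):
        left = [p for p in points
                if sq_dist(p, centers[0]) <= sq_dist(p, centers[1])]
        right = [p for p in points
                 if sq_dist(p, centers[0]) > sq_dist(p, centers[1])]
        if not left or not right:
            break
        centers = [mean(left), mean(right)]
    return left, right


def bisecting_kmeans(points, k, seed=0):
    """Return up to k clusters by repeatedly bisecting one cluster.

    Assumption for this sketch: the cluster chosen for the next split
    is the one with the largest SSE among clusters with 2+ points.
    """
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        splittable = [c for c in clusters if len(c) > 1]
        if not splittable:
            break
        target = max(splittable, key=sse)
        left, right = two_means(target, rng)
        if not left or not right:
            break  # could not split (for example, duplicate points)
        clusters.remove(target)
        clusters.extend([left, right])
    return clusters


# Toy data made up for illustration; not the glass dataset.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
clusters = bisecting_kmeans(points, k=3)
for i, cluster in enumerate(clusters):
    print(i, cluster)
```

Each split touches only the points of one cluster, so later splits work on progressively smaller subsets. That is one intuition for why the bisecting approach can be quicker than rerunning a full KMeans over all points.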

The dataset we use for this recipe is the Glass Identification Database. The study of the classification of glass types was motivated by criminological research: glass can serve as evidence if it is correctly identified. The data can be found at NTU (Taiwan), already in LIBSVM format. ...
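
For readers who have not met LIBSVM format before, each line holds a label followed by sparse `index:value` pairs, with indices starting at 1. The short sketch below parses one such line in plain Python. The helper name `parse_libsvm_line` and the sample row are hypothetical and are not taken from the glass dataset; in Spark itself the file would normally be read with the built-in `libsvm` data source rather than parsed by hand.

```python
def parse_libsvm_line(line):
    """Parse 'label idx:val idx:val ...' into (label, {index: value})."""
    parts = line.split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        idx, val = item.split(":")
        features[int(idx)] = float(val)  # indices are 1-based in LIBSVM
    return label, features


# Made-up row for illustration; feature 3 is absent, which means zero.
sample = "1 1:0.5 2:-0.25 4:1.0"
label, features = parse_libsvm_line(sample)
print(label, features)
```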
