In this recipe, we will download the glass dataset and try to identify and label each glass sample using the bisecting KMeans algorithm. Bisecting KMeans is a hierarchical version of the KMeans algorithm, implemented in Spark through the BisectingKMeans() API. While this algorithm is conceptually similar to KMeans, it can offer a considerable speed advantage for some use cases in which a hierarchical path is present in the data.
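To make the hierarchical idea concrete, here is a minimal sketch of bisecting k-means in plain Python. It is not Spark's implementation and does not use the BisectingKMeans() API. It assumes one common variant of the technique: start with all points in one cluster, then repeatedly split the cluster with the largest sum of squared errors into two using a small 2-means loop. The function names (`two_means`, `bisecting_kmeans`), the iteration count, and the sample points are illustrative assumptions.

```python
import random


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points):
    """Sum of squared distances from each point to the cluster mean."""
    center = mean(points)
    return sum(sq_dist(p, center) for p in points)


def two_means(points, rng, iters=20):
    """Split one cluster in two with a basic 2-means loop.

    Returns the last split in which both halves were non-empty,
    or two empty lists if no such split was found.
    """
    c0, c1 = rng.sample(points, 2)  # two random points as initial centers
    left, right = [], []
    for _ in range(iters):
        new_left = [p for p in points if sq_dist(p, c0) <= sq_dist(p, c1)]
        new_right = [p for p in points if sq_dist(p, c0) > sq_dist(p, c1)]
        if not new_left or not new_right:
            break
        left, right = new_left, new_right
        c0, c1 = mean(left), mean(right)
    return left, right


def bisecting_kmeans(points, k, seed=0):
    """Top-down clustering: keep bisecting the worst cluster until k exist."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        candidates = [c for c in clusters if len(c) >= 2]
        if not candidates:
            break  # nothing left that can be split
        worst = max(candidates, key=sse)  # assumption: split by largest SSE
        left, right = two_means(worst, rng)
        if not left or not right:
            break  # split failed (e.g. duplicate points)
        clusters.remove(worst)
        clusters.extend([left, right])
    return clusters


if __name__ == "__main__":
    # Made-up 2D points, not taken from the glass dataset.
    sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
              (20, 0), (20, 1), (21, 0)]
    for cluster in bisecting_kmeans(sample, k=3):
        print(cluster)
```

Because each split only looks at the points of one cluster, the sequence of splits forms a binary tree, which is the hierarchical path the algorithm exploits.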
The dataset we use for this recipe is the Glass Identification Database. The study of the classification of glass types was motivated by criminological research: glass can serve as evidence if it is correctly identified. The data can be found at NTU (Taiwan), already in LIBSVM format. ...
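For readers unfamiliar with the LIBSVM format, each line holds a label followed by sparse `index:value` pairs, with indices starting at 1. The following plain-Python sketch parses one such line. The helper name `parse_libsvm_line` and the sample line are illustrative assumptions; the sample is not copied from the glass file.

```python
def parse_libsvm_line(line):
    """Parse one LIBSVM-formatted line into (label, {index: value}).

    Expected shape: "<label> <index>:<value> <index>:<value> ..."
    Indices are kept 1-based, as they appear in the file.
    """
    parts = line.strip().split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        index, value = item.split(":", 1)
        features[int(index)] = float(value)
    return label, features


if __name__ == "__main__":
    # Hypothetical sample line, not taken from the actual dataset.
    label, features = parse_libsvm_line("1 1:0.5 3:-2 9:0.25")
    print(label, features)
```

Features that are absent from a line are implicitly zero, which is why a dictionary keyed by index is a convenient in-memory form for this sketch.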