In this example, we are going to use the sinusoidal dataset previously shown. The first step is creating it (with 1,000 samples):
import numpy as np
from sklearn.preprocessing import StandardScaler

nb_samples = 1000
X = np.zeros(shape=(nb_samples, 2))

for i in range(nb_samples):
    X[i, 0] = float(i)
    if i % 2 == 0:
        X[i, 1] = 1.0 + (np.random.uniform(0.65, 1.0) * np.sin(float(i) / 100.0))
    else:
        X[i, 1] = 0.1 + (np.random.uniform(0.5, 0.85) * np.sin(float(i) / 100.0))

ss = StandardScaler()
Xs = ss.fit_transform(X)
At this point, we can try to cluster it using K-means (with n_clusters=2):
from sklearn.cluster import KMeans

km = KMeans(n_clusters=2, random_state=1000)
Y_km = km.fit_predict(Xs)
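The K-means assignments can also be checked numerically against the way the dataset was generated. The following is a minimal, self-contained sketch rather than part of the original example. It assumes that the two sinusoidal bands (even-indexed and odd-indexed samples) are the partition of interest, rebuilds the dataset in vectorized form with a seeded generator (rng, Y_true, ari, and sizes are names introduced only for this illustration), and sets n_init explicitly. The adjusted Rand index is close to 1 when the clusters match the bands and close to 0 when the two partitions are unrelated.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Assumption: a seeded generator is used here for repeatability;
# the original example draws from the global NumPy random state.
rng = np.random.RandomState(1000)

nb_samples = 1000
idx = np.arange(nb_samples)
even = (idx % 2 == 0)
wave = np.sin(idx / 100.0)

# Same construction as the loop-based version, written with boolean masks
X = np.zeros(shape=(nb_samples, 2))
X[:, 0] = idx
X[even, 1] = 1.0 + rng.uniform(0.65, 1.0, size=even.sum()) * wave[even]
X[~even, 1] = 0.1 + rng.uniform(0.5, 0.85, size=(~even).sum()) * wave[~even]

Xs = StandardScaler().fit_transform(X)

# n_init is set explicitly (an assumption, not in the original call)
km = KMeans(n_clusters=2, random_state=1000, n_init=10)
Y_km = km.fit_predict(Xs)

# Hypothetical reference labels: 0 for the even-indexed band, 1 for the odd one
Y_true = (idx % 2).astype(int)

ari = adjusted_rand_score(Y_true, Y_km)
sizes = np.bincount(Y_km, minlength=2)

print("Cluster sizes:", sizes)
print("Adjusted Rand index vs. generating bands:", ari)
```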
The result ...