Streaming KMeans to classify data in near real-time

Spark Streaming is a powerful facility that lets you combine near-real-time and batch processing in the same paradigm. The streaming KMeans interface lives at the intersection of ML clustering and Spark Streaming, and takes full advantage of the core facilities provided by Spark Streaming itself (for example, fault tolerance and exactly-once delivery semantics).
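
To make the idea of clustering on a stream concrete, here is a minimal sketch in plain Python (no Spark dependency) of the kind of per-batch update a streaming k-means model performs: each incoming mini-batch is assigned to the nearest centers, and every center is moved toward the mean of its new points, weighted against what it has seen so far. The function names (`nearest_center`, `update_centers`), the `decay` parameter, and the weight bookkeeping are assumptions made for illustration. They follow the general update rule described in the Spark MLlib documentation, but they are not the library's actual code.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def nearest_center(centers: List[List[float]], point: Point) -> int:
    """Return the index of the center closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((c - x) ** 2 for c, x in zip(centers[i], point)),
    )


def update_centers(
    centers: List[List[float]],
    weights: List[float],
    batch: List[Point],
    decay: float = 1.0,
) -> Tuple[List[List[float]], List[float]]:
    """Apply one streaming update for a single mini-batch.

    `weights[i]` is the (possibly discounted) number of points center i has
    absorbed so far. `decay` is a hypothetical forgetfulness knob: 1.0 keeps
    all history, 0.0 uses only the newest batch.
    """
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k

    # Assign every point in the batch to its nearest current center.
    for point in batch:
        i = nearest_center(centers, point)
        counts[i] += 1
        for d in range(dim):
            sums[i][d] += point[d]

    new_centers: List[List[float]] = []
    new_weights: List[float] = []
    for i in range(k):
        discounted = weights[i] * decay
        total = discounted + counts[i]
        if counts[i] == 0:
            # No new points for this center: keep it where it is.
            new_centers.append(list(centers[i]))
            new_weights.append(discounted)
            continue
        # Weighted average of the old center and the new points' sum.
        new_centers.append(
            [(centers[i][d] * discounted + sums[i][d]) / total for d in range(dim)]
        )
        new_weights.append(total)
    return new_centers, new_weights


# Two mini-batches arriving one after the other.
centers, weights = [[0.0, 0.0], [10.0, 10.0]], [0.0, 0.0]
centers, weights = update_centers(centers, weights, [[1, 1], [3, 3], [9, 9], [11, 11]])
centers, weights = update_centers(centers, weights, [[5, 5]])
print(centers, weights)
```

In Spark itself the same pattern is driven by a stream of mini-batches rather than an explicit loop, and the engine handles distribution and recovery; the sketch only shows why the model can keep learning as data arrives instead of being retrained from scratch.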
