ML pipelines for real-life machine learning applications

This is the first of two recipes that cover the ML pipeline in Spark 2.0. For a more advanced treatment of ML pipelines, including additional details such as API calls and parameter extraction, see later chapters in this book.

In this recipe, we build a single pipeline that tokenizes text, uses HashingTF (the long-established hashing trick) to map terms to term-frequency features, fits a regression model, and then predicts which group a new term belongs to (for example, in news filtering or gesture classification).
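
The stages described above can be sketched roughly as follows. This is a minimal illustration rather than the recipe's actual code: it assumes Spark 2.x with the `spark-sql` and `spark-mllib` modules on the classpath, and the object name `PipelineSketch`, the toy training rows, the column names, the choice of `LogisticRegression` as the regression step, and all parameter values are assumptions made for the example.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

// Hypothetical object name; not taken from the book.
object PipelineSketch {

  // Assemble the three stages: tokenize -> hash term frequencies -> fit a model.
  def buildPipeline(): Pipeline = {
    // Split the raw "text" column into lowercase, whitespace-separated words.
    val tokenizer = new Tokenizer()
      .setInputCol("text")
      .setOutputCol("words")

    // Hash each word into one of 1000 buckets and count occurrences per bucket.
    // The bucket count is an illustrative value.
    val hashingTF = new HashingTF()
      .setNumFeatures(1000)
      .setInputCol(tokenizer.getOutputCol)
      .setOutputCol("features")

    // Assumed model: logistic regression, reading the default
    // "features" and "label" columns.
    val lr = new LogisticRegression()
      .setMaxIter(10)
      .setRegParam(0.01)

    new Pipeline().setStages(Array(tokenizer, hashingTF, lr))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("PipelineSketch")
      .getOrCreate()

    // Made-up training rows: label 1.0 marks text in the group of interest.
    val training = spark.createDataFrame(Seq(
      (0L, "spark machine learning pipeline", 1.0),
      (1L, "weather report for tomorrow", 0.0),
      (2L, "spark streaming and mllib", 1.0),
      (3L, "football scores and results", 0.0)
    )).toDF("id", "text", "label")

    // Fitting runs every stage in order and returns a PipelineModel.
    val model = buildPipeline().fit(training)

    // Made-up unseen rows; the fitted model applies the same stages to them.
    val unseen = spark.createDataFrame(Seq(
      (4L, "spark pipeline example"),
      (5L, "rain expected tomorrow")
    )).toDF("id", "text")

    model.transform(unseen)
      .select("id", "text", "probability", "prediction")
      .show(false)

    spark.stop()
  }
}
```

Keeping the stages in one `Pipeline` means the same tokenization and hashing are applied at both training and prediction time, so the two cannot drift apart.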
