LabeledPoint data structure for Spark ML

LabeledPoint is a data structure that has been around since the early days of Spark for packaging a feature vector along with a label so it can be used in supervised learning algorithms. We demonstrate a short recipe that uses LabeledPoint, the Seq data structure, and DataFrame to run a logistic regression for binary classification of the data. The emphasis here is on LabeledPoint; the regression algorithms are covered in more depth in Chapter 5, Practical Machine Learning with Regression and Classification in Spark 2.0 - Part I, and Chapter 6, Practical Machine Learning with Regression and Classification in Spark 2.0 - Part II.
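As a rough illustration of the idea, the following minimal sketch wraps a handful of labeled feature vectors in a Seq, converts them to a DataFrame, and fits a logistic regression on it. It is not the book's recipe code: the object name LabeledPointSketch, the local master setting, the iteration count, and the data values are all assumptions made up for this example, and it presumes the Spark 2.x spark-sql and spark-mllib libraries are on the classpath.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.LabeledPoint
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

// Hypothetical object name; any name works.
object LabeledPointSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: run locally using all available cores.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("LabeledPointSketch")
      .getOrCreate()

    // Each LabeledPoint pairs a label (0.0 or 1.0 for binary
    // classification) with a dense feature vector.
    // The values are invented purely for illustration.
    val points = Seq(
      LabeledPoint(0.0, Vectors.dense(0.2, 1.5)),
      LabeledPoint(0.0, Vectors.dense(0.4, 1.1)),
      LabeledPoint(1.0, Vectors.dense(2.3, 0.3)),
      LabeledPoint(1.0, Vectors.dense(2.9, 0.1))
    )

    // LabeledPoint is a case class, so the resulting DataFrame has
    // the columns "label" and "features", which are the default
    // column names LogisticRegression expects.
    val training = spark.createDataFrame(points)

    // Fit a logistic regression model; maxIter = 10 is an arbitrary choice.
    val lr = new LogisticRegression().setMaxIter(10)
    val model = lr.fit(training)

    println(s"Coefficients: ${model.coefficients} Intercept: ${model.intercept}")

    // Score the training rows to inspect the predicted classes.
    model.transform(training)
      .select("label", "features", "prediction")
      .show()

    spark.stop()
  }
}
```

Note that this sketch uses the LabeledPoint from org.apache.spark.ml.feature together with org.apache.spark.ml.linalg.Vectors, which are the types the DataFrame-based API works with. The older RDD-based API has its own LabeledPoint in org.apache.spark.mllib.regression, and the two are not interchangeable.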
