Classification

In terms of workflow, classification is very similar to linear regression. The algorithms take vectors as input, and the algorithm object exposes parameters that can be tuned to fit the needs of an application. The returned model can then be used to predict the class by invoking the transform method. We will use the Titanic dataset to predict who survived. The dataset has 15 fields, including age, gender, whether the passenger had siblings or a spouse aboard, whether parents were sailing with them, the class they traveled in, and so forth.
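The fit-then-transform workflow described above might look roughly like the following sketch. It assumes Spark's DataFrame-based ML API with a DecisionTreeClassifier as the algorithm; the choice of classifier, the ClassificationSketch object, the column names, and the four made-up rows are illustrative assumptions, not the book's code or real Titanic records.

```scala
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical object name, used only for this illustration.
object ClassificationSketch {

  // Trains a classifier on a tiny made-up dataset and returns its predictions.
  def trainAndPredict(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Made-up rows (NOT real Titanic data): label, passenger class, age, gender code.
    val data = Seq(
      (1.0, 1.0, 29.0, 1.0),
      (0.0, 3.0, 40.0, 0.0),
      (1.0, 2.0, 8.0, 1.0),
      (0.0, 3.0, 35.0, 0.0)
    ).toDF("label", "pclass", "age", "gender")

    // The algorithms take vectors, so pack the feature columns into one vector column.
    val assembler = new VectorAssembler()
      .setInputCols(Array("pclass", "age", "gender"))
      .setOutputCol("features")
    val assembled = assembler.transform(data)

    // The algorithm object carries the tunable parameters.
    val algo = new DecisionTreeClassifier()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setMaxDepth(3)

    // fit() returns the model; transform() adds a "prediction" column.
    val model = algo.fit(assembled)
    model.transform(assembled)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ClassificationSketch")
      .getOrCreate()

    trainAndPredict(spark).select("label", "prediction").show()
    spark.stop()
  }
}
```

Predicting on the training rows, as done here, only demonstrates the API calls; a real evaluation would hold out a separate test set.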

Loading data

As with regression, we load the CSV data using the read.csv() method. The code is in the file ML02v2.scala; we load that file and run the ML02v2 object. Once the CSV data is loaded, we print the schema to verify it:

val filePath = "/Users/ksankar/fdps-v3/" ...
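A minimal sketch of this loading step, assuming a local SparkSession and a small stand-in CSV written to a temporary file. The LoadCsvSketch object, the file contents, and the three column names are hypothetical and are not the book's ML02v2 code or dataset; the header and inferSchema options are one plausible choice for a CSV with a header row.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical object name, used only for this illustration.
object LoadCsvSketch {

  // Reads a CSV file that has a header row and lets Spark infer the column types.
  def loadCsv(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("LoadCsvSketch")
      .getOrCreate()

    // Stand-in file so the sketch runs anywhere; substitute the real dataset path.
    val tmp = Files.createTempFile("titanic-sample", ".csv")
    Files.write(tmp, "Pclass,Survived,Age\n1,1,29.0\n3,0,40.0\n".getBytes("UTF-8"))

    val passengers = loadCsv(spark, tmp.toString)

    // Print the inferred schema to verify the load.
    passengers.printSchema()
    spark.stop()
  }
}
```

Without inferSchema, every column would be read as a string, so printing the schema is a quick way to confirm that numeric fields such as age were typed as expected.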
