O'Reilly logo

Fast Data Processing with Spark 2 - Third Edition by Krishna Sankar

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Classification

Classification is very similar to linear regression. The algorithms take vectors, and the algorithm object has various parameters to tweak the algorithm in order to fit the needs of an application. The returned model can be used to predict the class invoking the transform method. We will use the Titanic Dataset and predict who will survive. The Dataset has 15 fields, including age, gender, whether they have siblings/a spouse, parents sailing with them, the class they are in, and so forth.

Loading data

Similar to regression, we load the CSV data using the read.csv() method. The code file is ML02v2.scala. We load the code and run the ML02v2 object. The CSV data is loaded and we print the schema to verify:

val filePath = "/Users/ksankar/fdps-v3/" ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required