Classification is very similar to linear regression. The algorithms take vectors as input, and the algorithm object exposes parameters that can be tuned to fit the needs of an application. The returned model can then be used to predict the class by invoking the transform method. We will use the Titanic dataset and predict who will survive. The dataset has 15 fields, including age, gender, whether passengers had siblings or a spouse aboard, whether parents were sailing with them, the class they traveled in, and so forth.
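The general workflow described above (assemble feature vectors, configure the algorithm object, fit a model, call transform) can be sketched as follows. This is only an illustration under stated assumptions, not the code from the chapter's source file: the toy rows, the column names (label, age, pclass), the object name ClassificationSketch, and the choice of DecisionTreeClassifier with a maximum depth of 3 are all hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.classification.DecisionTreeClassifier

object ClassificationSketch {

  // Builds a tiny hypothetical dataset, fits a classifier, and returns predictions.
  def run(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Hypothetical toy rows (label, age, passenger class); NOT the Titanic data.
    val df = Seq(
      (1.0, 29.0, 1.0),
      (0.0, 40.0, 3.0),
      (1.0, 4.0, 2.0),
      (0.0, 55.0, 3.0)
    ).toDF("label", "age", "pclass")

    // The algorithms take vectors, so pack the numeric columns into one vector column.
    val assembler = new VectorAssembler()
      .setInputCols(Array("age", "pclass"))
      .setOutputCol("features")
    val assembled = assembler.transform(df)

    // The algorithm object carries the tunable parameters (values here are assumptions).
    val classifier = new DecisionTreeClassifier()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setMaxDepth(3)

    // fit() returns the model; transform() appends a "prediction" column.
    val model = classifier.fit(assembled)
    model.transform(assembled)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ClassificationSketch")
      .getOrCreate()

    run(spark).select("label", "features", "prediction").show()

    spark.stop()
  }
}
```

The sketch predicts on the same rows it was trained on purely to keep it short; a real evaluation would hold out a separate test set.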
Similar to regression, we load the CSV data using the read.csv() method. The code file is ML02v2.scala. We load the code and run the ML02v2 object. The CSV data is loaded and we print the schema to verify:
val filePath = "/Users/ksankar/fdps-v3/" ...