- Go to the LIBSVM Data: Classification (Multi-class) Repository and download the file: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/iris.scale
- Start a new project in IntelliJ or in an IDE of your choice. Make sure the necessary JAR files are included.
- Set up the package location where the program will reside:
package spark.ml.cookbook.chapter5
- Import SparkSession to gain access to the cluster, the classification and evaluation classes used in this recipe, and the Log4j classes used to reduce the amount of output produced by Spark:
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.classification.{LogisticRegression, OneVsRest}
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.log4j.{ Level, ...
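The file downloaded in the first step is in the LIBSVM text format, where each line holds a class label followed by sparse index:value feature pairs. To make that layout concrete, here is a minimal sketch in plain Scala that parses one such line. The names LibSvmLineDemo, Row, and parseLine are hypothetical and not part of the recipe, and the sample line is made up for illustration rather than copied from iris.scale. Spark's libsvm data source reads this format directly, so the recipe itself does not need a parser like this.

```scala
// Hypothetical helper for illustration only; not part of the recipe.
// Parses one line of LIBSVM text: "<label> <index>:<value> <index>:<value> ..."
object LibSvmLineDemo {
  // A parsed row: the class label plus a sparse map of feature index -> value.
  final case class Row(label: Double, features: Map[Int, Double])

  def parseLine(line: String): Row = {
    // Split on any run of whitespace; the first token is the label.
    val tokens = line.trim.split("\\s+")
    val label = tokens.head.toDouble
    // Every remaining token is an "index:value" pair.
    val features = tokens.tail.map { token =>
      val parts = token.split(":", 2)
      parts(0).toInt -> parts(1).toDouble
    }.toMap
    Row(label, features)
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample line in the same shape as the dataset (4 features).
    val row = parseLine("1 1:-0.5 2:0.25 3:-0.75 4:-1.0")
    println(s"label = ${row.label}")
    println(s"features = ${row.features.toSeq.sortBy(_._1).mkString(", ")}")
  }
}
```

Keeping the features in a Map mirrors the sparse nature of the format: any index missing from a line is simply absent and is treated as zero by consumers.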