- Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
- Set up the package location where the program will reside:
package spark.ml.cookbook.chapter6
- Import the necessary packages for SparkSession to gain access to the cluster and Log4j.Logger to reduce the amount of output produced by Spark:
import org.apache.spark.mllib.regression.{LabeledPoint, LassoWithSGD, LinearRegressionWithSGD, RidgeRegressionWithSGD} import org.apache.spark.sql.{SQLContext, SparkSession} import org.apache.spark.ml.classification.LogisticRegression import org.apache.spark.ml.tuning.{ParamGridBuilder, TrainValidationSplit} import org.apache.spark.mllib.linalg.{Vector, Vectors}import org.apache.log4j.Logger ...