- Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
- We will use a JSON data file named cars.json, which has been created for this example:
name,city
Bears,Chicago
Packers,Green Bay
Lions,Detroit
Vikings,Minnesota
- Set up the package location where the program will reside:
package spark.ml.cookbook.chapter4
- Import the packages that the Spark session needs to access the cluster, along with log4j.Logger to reduce the amount of output produced by Spark:
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession
import org.apache.log4j.{Level, ...
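As a rough illustration of what the SparkSession and log4j imports are for, the following minimal sketch lowers the log level and then builds a session. The object name SessionSetupSketch, the helper buildSession, the local[*] master, the application name, and the choice of Level.ERROR are all assumptions made for this illustration and are not taken from the recipe; it also assumes the log4j 1.x API that ships with Spark 2.x.

```scala
package spark.ml.cookbook.chapter4

import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession

// Hypothetical helper object; the name is an assumption for this sketch.
object SessionSetupSketch {

  // Quiets Spark's logging, then returns a session.
  // The master and app name below are illustrative assumptions.
  def buildSession(): SparkSession = {
    // Raise the threshold so only errors from these logger trees are printed.
    Logger.getLogger("org").setLevel(Level.ERROR)
    Logger.getLogger("akka").setLevel(Level.ERROR)

    SparkSession.builder()
      .master("local[*]")            // run locally on all available cores
      .appName("SessionSetupSketch") // hypothetical application name
      .getOrCreate()
  }

  def main(args: Array[String]): Unit = {
    val spark = buildSession()
    println(s"Spark version: ${spark.version}")
    spark.stop()
  }
}
```

Setting the log level before the session is created keeps most of Spark's startup messages off the console. Against a real cluster, the master URL would typically be supplied by the launcher rather than hard-coded.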