- Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
- Set up the package location where the program will reside:
package spark.ml.cookbook.chapter4
- Import the necessary packages for the Spark session to gain access to the cluster and log4j.Logger to reduce the amount of output produced by Spark:
import org.apache.spark.mllib.linalg.Vectorsimport org.apache.spark.mllib.stat.Statisticsimport org.apache.spark.sql.SparkSessionimport org.apache.log4j.Loggerimport org.apache.log4j.Level
- Set the output level to ERROR to reduce Spark's logging output:
Logger.getLogger("org").setLevel(Level.ERROR)Logger.getLogger("akka").setLevel(Level.ERROR)
- Initialize a Spark session ...