- Start a new project in IntelliJ or in an IDE of your choice. Make sure the necessary JAR files are included.
- Set up the package location where the program will reside:
package spark.ml.cookbook.chapter5
- Import SparkSession to gain access to the cluster, and log4j's Logger and Level to reduce the amount of output produced by Spark:
import org.apache.spark.sql.SparkSession
import scala.math._
import org.apache.log4j.Logger
import org.apache.log4j.Level
- Initialize a SparkSession, specifying its configuration with the builder pattern; this makes an entry point to the Spark cluster available:
val spark = SparkSession
  .builder
  .master("local[4]")
  .appName("myRegress01_20")
  .config("spark.sql.warehouse.dir", ".")
  ...
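The setup steps above can be sketched as one small, self-contained program. This is a minimal sketch rather than the recipe's exact code: the object name `SessionSketch` and the helper `createSession` are hypothetical, ending the builder chain with `getOrCreate()` is an assumption about the elided part of the snippet, and the logger name `"org"` with threshold `Level.ERROR` is an illustrative choice for quieting Spark's output. It assumes the Spark JARs and a log4j 1.x-compatible API are on the classpath.

```scala
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object, used only for this illustration.
object SessionSketch {

  // Builds (or reuses) a local SparkSession with the settings shown in the steps above.
  def createSession(): SparkSession = {
    // Assumption: raise the threshold of the "org" logger hierarchy to ERROR
    // so that Spark's INFO and WARN messages are suppressed.
    Logger.getLogger("org").setLevel(Level.ERROR)

    SparkSession
      .builder
      .master("local[4]")                      // run locally with 4 worker threads
      .appName("myRegress01_20")
      .config("spark.sql.warehouse.dir", ".")  // keep the SQL warehouse in the working directory
      .getOrCreate()                           // assumption: the elided chain ends with this call
  }

  def main(args: Array[String]): Unit = {
    val spark = createSession()
    println(s"Spark version: ${spark.version}")
    spark.stop() // release local resources when finished
  }
}
```

Keeping the session construction in a helper such as `createSession` makes it easy to reuse the same settings across recipes; `getOrCreate()` returns an already running session if one exists instead of starting a second one.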