- Start a new project in IntelliJ or in an IDE of your choice. Make sure the necessary JAR files are included.
- Set up the package location where the program will reside:
package spark.ml.cookbook.chapter8
- Import the necessary packages for Spark context to get access to the cluster and Log4j.Logger to reduce the amount of output produced by Spark:
import org.apache.log4j.{Level, Logger}import org.apache.spark.ml.clustering.KMeansimport org.apache.spark.sql.SparkSession
- Set the output level to ERROR to reduce Spark's logging output:
Logger.getLogger("org").setLevel(Level.ERROR)
- Create Spark's Session object:
val spark = SparkSession .builder.master("local[*]") .appName("myKMeansCluster") .config("spark.sql.warehouse.dir" ...