- Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
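If you manage dependencies with sbt rather than manually adding JARs, the required libraries can be declared in `build.sbt`. This is a minimal sketch; the Spark version shown (`2.1.0`) is an assumption and should match the version used in your cluster:

```scala
// build.sbt (sketch) -- assumes Spark 2.1.0; adjust versions to your environment
name := "SparkMLCookbook"

version := "1.0"

scalaVersion := "2.11.8"  // assumption: Scala version compatible with the chosen Spark build

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.1.0",
  "org.apache.spark" %% "spark-sql"  % "2.1.0"   // provides SparkSession and Structured Streaming
)
```

With this in place, sbt resolves the Spark JARs automatically and no manual classpath setup is needed.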
- Set up the package location where the program will reside:
package spark.ml.cookbook.chapter13
- Import the necessary packages:
import java.util.concurrent.TimeUnit
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.ProcessingTime
- Create a SparkSession as an entry point to the Spark cluster:
val spark = SparkSession.builder
  .master("local[*]")
  .appName("DataFrame Stream")
  .config("spark.sql.warehouse.dir", ".")
  .getOrCreate()
- Interleaved log messages make the output hard to read, so set the logging level to warning:
Logger.getLogger("org").setLevel(Level.WARN)