- Start a new project in IntelliJ or in an IDE of your choice. Make sure the necessary JAR files are included.
- Set up the package location where the program will reside:
package spark.ml.cookbook.chapter8
- Import the necessary packages:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.clustering.LDA
- We set up the necessary SparkSession to gain access to the cluster:
val spark = SparkSession
  .builder
  .master("local[*]")
  .appName("MyLDA")
  .config("spark.sql.warehouse.dir", ".")
  .getOrCreate()
- We have a sample LDA dataset, which is located at the following relative path (you can also use an absolute path). The sample file is provided with any Spark distribution and ...