Doing classification using Gradient Boosted Trees

Gradient Boosted Trees (GBTs) are another ensemble learning algorithm. GBTs train one tree at a time, and each new tree is fitted to correct the errors made by the trees trained before it.

Because GBTs train their trees sequentially, training can take longer than with Random Forest, which can train its trees in parallel.
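To make the sequential idea concrete, here is a toy sketch in plain Scala with no Spark dependency. It is not how MLlib implements GBTs. It assumes one-dimensional inputs, depth-1 trees (stumps), and squared-error loss for simplicity, whereas MLlib's classification default is log loss. All names (`BoostingSketch`, `Stump`, `fitStump`, `rate`) are hypothetical and chosen only for illustration.

```scala
// Toy gradient boosting on 1-D inputs with squared-error loss (illustrative only).
// Each "tree" is a depth-1 stump fitted to the residuals of the ensemble so far.
object BoostingSketch {
  // A stump predicts `left` when x <= threshold, otherwise `right`.
  final case class Stump(threshold: Double, left: Double, right: Double) {
    def predict(x: Double): Double = if (x <= threshold) left else right
  }

  private def mean(values: Seq[Double]): Double =
    if (values.isEmpty) 0.0 else values.sum / values.size

  // Try every distinct x as a threshold and keep the stump with the lowest
  // squared error against the targets. Assumes xs is non-empty.
  def fitStump(xs: Seq[Double], targets: Seq[Double]): Stump = {
    val pairs = xs.zip(targets)
    val scored = xs.distinct.map { t =>
      val (lo, hi) = pairs.partition(_._1 <= t)
      val stump = Stump(t, mean(lo.map(_._2)), mean(hi.map(_._2)))
      val error = pairs.map { case (x, y) => math.pow(y - stump.predict(x), 2) }.sum
      (stump, error)
    }
    scored.reduceLeft((a, b) => if (b._2 < a._2) b else a)._1
  }

  // Ensemble prediction: sum of stump outputs, each scaled by the learning rate.
  def predict(stumps: Seq[Stump], rate: Double, x: Double): Double =
    stumps.map(_.predict(x) * rate).sum

  def meanSquaredError(stumps: Seq[Stump], rate: Double,
                       xs: Seq[Double], ys: Seq[Double]): Double =
    mean(xs.zip(ys).map { case (x, y) => math.pow(y - predict(stumps, rate, x), 2) })

  // Train stumps one at a time; each new stump targets the current residuals,
  // that is, whatever the previously trained stumps still get wrong.
  def train(xs: Seq[Double], ys: Seq[Double], rounds: Int, rate: Double): Vector[Stump] =
    (1 to rounds).foldLeft(Vector.empty[Stump]) { (stumps, _) =>
      val residuals = xs.zip(ys).map { case (x, y) => y - predict(stumps, rate, x) }
      stumps :+ fitStump(xs, residuals)
    }
}
```

For example, with `xs = Seq(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)` and `ys = Seq(0.0, 0.0, 0.0, 1.0, 1.0, 1.0)`, calling `BoostingSketch.train(xs, ys, rounds = 10, rate = 0.5)` yields a lower training error than a single round does. Because each round depends on the residuals left by the previous one, the rounds cannot run in parallel.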

Getting ready

We will use the same data as in the previous recipe, the LIBSVM-format file rf_libsvm_data.txt.

How to do it…

  1. Start the Spark shell:
    $ spark-shell
    
  2. Perform the required imports:
    scala> import org.apache.spark.mllib.tree.GradientBoostedTrees
    scala> import org.apache.spark.mllib.tree.configuration.BoostingStrategy
    scala> import org.apache.spark.mllib.util.MLUtils
    
  3. Load and parse the data:
    scala> val data =
      MLUtils.loadLibSVMFile(sc, "rf_libsvm_data.txt")
    
  4. Split the data ...
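
For orientation, a GBT classification run with the MLlib classes imported above might look like the following sketch. This is not the recipe's own code. It follows the public `GradientBoostedTrees` and `BoostingStrategy` API and is meant to be pasted into the Spark shell, where `sc` is predefined. The 70/30 split, the iteration count, the tree depth, and the assumption of binary 0/1 labels are illustrative choices, not values taken from the recipe.

```scala
import org.apache.spark.mllib.tree.GradientBoostedTrees
import org.apache.spark.mllib.tree.configuration.BoostingStrategy
import org.apache.spark.mllib.util.MLUtils

// Load the LIBSVM-format file used in the steps above.
val data = MLUtils.loadLibSVMFile(sc, "rf_libsvm_data.txt")

// Assumed split: 70% training, 30% test.
val Array(trainingData, testData) = data.randomSplit(Array(0.7, 0.3))

// Start from the default classification settings, then adjust.
val boostingStrategy = BoostingStrategy.defaultParams("Classification")
boostingStrategy.numIterations = 3            // number of trees (assumed value)
boostingStrategy.treeStrategy.numClasses = 2  // assumes binary labels 0 and 1
boostingStrategy.treeStrategy.maxDepth = 5    // assumed value

// Train the ensemble; trees are built one after another.
val model = GradientBoostedTrees.train(trainingData, boostingStrategy)

// Pair each test label with the model's prediction and compute the error rate.
val labelAndPreds = testData.map { point =>
  (point.label, model.predict(point.features))
}
val testErr =
  labelAndPreds.filter { case (label, pred) => label != pred }.count.toDouble / testData.count
println(s"Test error = $testErr")
```

On a very small data file the random split can leave few or no test points, so the reported error is only a rough indication.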
