Documentation for more multivariate statistical summary:
- Pipeline docs are available at https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.Pipeline
- PipelineModel, which is useful when we save and load pipelines with the .save() and .load() methods, is documented at https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.PipelineModel
- Pipeline stage information is available at https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.PipelineStage
- HashingTF, a nice old trick in text analytics that maps a sequence of terms to their term frequencies, is documented at https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.feature.HashingTF
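
To make the hashing trick behind HashingTF concrete, here is a minimal plain-Scala sketch of the idea. It is not Spark's implementation: the object `HashingTrickSketch`, its helper names, the use of `String.hashCode`, and the choice of 16 buckets are all assumptions made for illustration, and Spark's own hash function and default number of features differ.

```scala
// Illustrative only: a plain-Scala sketch of the hashing trick,
// not Spark's HashingTF. All names here are hypothetical.
object HashingTrickSketch {

  // Map a term to a bucket index in [0, numFeatures).
  // Assumption: String.hashCode stands in for the hash function Spark uses.
  def bucket(term: String, numFeatures: Int): Int = {
    val raw = term.hashCode % numFeatures
    if (raw < 0) raw + numFeatures else raw // keep the index non-negative
  }

  // Count how many terms of the sequence land in each bucket.
  // Only non-empty buckets are kept, like a sparse vector.
  def termFrequencies(terms: Seq[String], numFeatures: Int): Map[Int, Double] =
    terms
      .groupBy(term => bucket(term, numFeatures))
      .map { case (index, hits) => index -> hits.size.toDouble }

  def main(args: Array[String]): Unit = {
    val doc = Seq("spark", "pipeline", "spark", "hashing")
    // Distinct terms may collide in one bucket; that is the price of the trick.
    println(termFrequencies(doc, numFeatures = 16))
  }
}
```

Because no vocabulary has to be built or stored, the mapping is cheap and stateless. The trade-off is that distinct terms can share a bucket, so a larger `numFeatures` reduces collisions at the cost of a wider feature vector.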