The final thing
As we mentioned earlier, one of the interesting additions to Spark 2.0.0 is the ML pipeline. A pipeline is nothing but a linear graph of transformers and estimators. The classes we have been using are all either transformers or estimators. Our classification example already followed a reasonable pipeline, as follows:
We started with Passengers, which was the Dataset that we read in.
- Passengers1 was after the feature extraction.
- Passengers2 was after StringIndexer.
- Passengers3 was after the na.drop() function.
- Passengers4 was after the VectorAssembler() function.
- The algTree object was the algorithm object.
We would have created a pipeline:
val treePipeline = new Pipeline().setStages(Array(indexer, assembler, algTree))
Then, we would ...
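As a rough illustration of how the stages listed above could be wired together, here is a minimal, self-contained sketch. It is not the book's own listing: the column names (Pclass, Sex, Age, Survived), the tiny in-memory dataset, and the choice of DecisionTreeClassifier for algTree are assumptions made only for this example. Because na.drop() is a Dataset operation rather than a pipeline stage, rows with missing values would need to be dropped before fitting; the sample data here has none.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.sql.SparkSession

object TreePipelineSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TreePipelineSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the Passengers Dataset (assumed columns).
    val passengers = Seq(
      (1.0, "male", 22.0, 0.0),
      (1.0, "female", 38.0, 1.0),
      (3.0, "female", 26.0, 1.0),
      (3.0, "male", 35.0, 0.0)
    ).toDF("Pclass", "Sex", "Age", "Survived")

    // Estimator: learns a string-to-index mapping for the categorical column.
    val indexer = new StringIndexer()
      .setInputCol("Sex")
      .setOutputCol("SexIndex")

    // Transformer: packs the numeric columns into a single feature vector.
    val assembler = new VectorAssembler()
      .setInputCols(Array("Pclass", "SexIndex", "Age"))
      .setOutputCol("features")

    // Estimator: the algorithm object (a decision tree is assumed here).
    val algTree = new DecisionTreeClassifier()
      .setLabelCol("Survived")
      .setFeaturesCol("features")

    // The pipeline chains the stages in order.
    val treePipeline = new Pipeline().setStages(Array(indexer, assembler, algTree))

    // fit() runs each stage in turn and returns a PipelineModel.
    val model: PipelineModel = treePipeline.fit(passengers)

    // transform() applies the fitted stages and appends a prediction column.
    model.transform(passengers).select("Survived", "prediction").show()

    spark.stop()
  }
}
```

The point of the design is that the intermediate Datasets (Passengers2, Passengers4, and so on) no longer need to be created by hand: the pipeline passes each stage's output to the next one.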