Spark ML provides a rich set of tools and models for training, scoring, evaluating, and exporting machine learning models. This video walks you through each step in the process. You’ll explore the basics of Spark’s DataFrames, Transformer, Estimator, Pipeline, and Parameter, and how to utilize the Spark API to create model uniformity and comparability. You'll learn how to create meaningful models and labels from a raw dataset; train and score a variety of models; target price predictions; compare results using MAE, MSE, and other scores; and employ the SparkML evaluator to automate the parameter-tuning process using cross validation. To complete the lesson, you'll learn to export and serialize a Spark trained model as PMML (an industry standard for model serialization), so you can deploy in applications outside the Spark cluster environment.
Hollin Wilkins is the cofounder of Combust, Inc., an ML/AI start-up in the SF Bay Area. A data scientist and software engineer formerly with True Car, Hollin has worked with machine learning, high-performance microservices, and software development since 2011.
Jason Slepicka is a senior data engineer with DataScience, where he builds pipelines and data science platform infrastructure. Jason is working on his PhD in Computer Science at the University of Southern California Information Sciences Institute.