Agile Data Science 2.0 by Russell Jurney

Chapter 9. Improving Predictions

Now that we have deployed working models predicting flight delays, it is time to ‘make believe’ that our prediction has proven useful based on user feedback, and further that it is valuable enough for prediction quality to matter. In that case, the next step is to iteratively improve the quality of our prediction. If a prediction is valuable enough, this becomes a full-time job for one or more people.

In this chapter we will tune our Spark ML classifier and also do additional feature engineering to improve prediction quality. In doing so, we will show you how to iteratively improve predictions.

Code examples for this chapter are available at https://github.com/rjurney/Agile_Data_Code_2/tree/master/ch09. Clone the repository and follow along!

git clone https://github.com/rjurney/Agile_Data_Code_2.git

Fixing our Prediction Problem

At this point we realized that our model was always predicting one class, no matter the input. We began by investigating that in a Jupyter Notebook at ch09/Debugging Prediction Problems.ipynb.

The notebook itself is very long, and we tried many things to fix our model. It turned out we had made a mistake. We were using OneHotEncoder on top of the output of StringIndexerModel when we were encoding our nominal/categorical string features. This is how you should encode features for models other than decision trees, but it turns out that for decision tree models, you are supposed to take the string indexes from StringIndexerModel ...
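
To make the two encodings concrete, here is a minimal plain-Python sketch of what string indexing and one-hot encoding each produce. It is not Spark code and is not taken from the notebook: the helper names fit_string_index and one_hot and the sample carrier codes are hypothetical, and the ordering rule (most frequent label first, ties broken alphabetically) is an assumption meant to mirror StringIndexer's default behavior.

```python
from collections import Counter


def fit_string_index(values):
    """Map each distinct string to an integer index.

    Assumption: labels are ordered by descending frequency, with ties
    broken alphabetically, to mirror StringIndexer's default ordering.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: idx for idx, label in enumerate(ordered)}


def one_hot(index, size):
    """Expand an integer index into a vector with a single 1.0 entry.

    Note: this sketch keeps every category column; Spark's OneHotEncoder
    drops the last category by default (dropLast=True).
    """
    vector = [0.0] * size
    vector[index] = 1.0
    return vector


# Hypothetical sample of a nominal feature (airline carrier codes).
carriers = ["AA", "DL", "AA", "WN", "DL", "AA"]

index = fit_string_index(carriers)

# One integer per row: the form the text says decision tree models take.
indexed = [index[c] for c in carriers]

# One vector per row: the form the text recommends for other models.
encoded = [one_hot(i, len(index)) for i in indexed]

print(index)
print(indexed)
print(encoded[3])
```

The indexed form keeps one column per feature, so a tree can split on the category directly, while the one-hot form spreads a single feature across several binary columns.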
