Unit 48. Designing a Predictive Experiment

Predictive data analysis is a real scientific experiment, and it must be organized as such. You cannot just claim that your data model predicts something: an important part of an experiment is the assessment and validation of its predictive power.

To build, assess, and validate a model, follow these four steps:

  1. Split the input data into training and testing sets (the recommended split ratio is 70:30). Then set the testing data aside and never use it for preparing the data model.

  2. Build a data model using only the training data.

  3. Apply the new model to the testing data.

  4. Evaluate the model quality with the confusion matrix or some other quality assurance tool. If the model passes the test, it is adequate. ...
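
The four steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the book's own code: the synthetic one-feature dataset, the toy midpoint-threshold "model," and the helper names (split_data, fit_threshold, predict, confusion_matrix) are all hypothetical and chosen to keep the example self-contained.

```python
import random
from collections import Counter


def split_data(rows, train_fraction=0.7, seed=0):
    """Step 1: shuffle a copy of the rows and split it 70:30."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_threshold(train):
    """Step 2: build a toy model from the training rows only.

    The "model" is the midpoint between the mean feature values
    of class 0 and class 1 (an assumption made for illustration).
    """
    zeros = [x for x, label in train if label == 0]
    ones = [x for x, label in train if label == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def predict(threshold, x):
    """Step 3 helper: predict class 1 when the feature exceeds the threshold."""
    return 1 if x > threshold else 0


def confusion_matrix(actual, predicted):
    """Step 4: count how often each (actual, predicted) pair occurs."""
    return Counter(zip(actual, predicted))


# Synthetic data (an assumption): class 0 is centered at 0, class 1 at 3.
rng = random.Random(42)
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(100)]
data += [(rng.gauss(3.0, 1.0), 1) for _ in range(100)]

# Step 1: split, then leave `test` untouched until step 3.
train, test = split_data(data)

# Step 2: fit on the training data only.
threshold = fit_threshold(train)

# Step 3: apply the model to the testing data.
actual = [label for _, label in test]
predicted = [predict(threshold, x) for x, _ in test]

# Step 4: evaluate with a confusion matrix and overall accuracy.
matrix = confusion_matrix(actual, predicted)
accuracy = (matrix[(0, 0)] + matrix[(1, 1)]) / len(test)

print("true negatives: ", matrix[(0, 0)])
print("false positives:", matrix[(0, 1)])
print("false negatives:", matrix[(1, 0)])
print("true positives: ", matrix[(1, 1)])
print("accuracy:", accuracy)
```

The key design point is that fit_threshold sees only train; the testing rows are touched for the first time in step 3, so the accuracy reflects data the model has never seen.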
