To validate the result, we must set up a Delphi technique in which the test data is absolutely unknown to the model. See Kaggle competitions for details at https://www.kaggle.com/competitions.
Three types of datasets are needed for a robust ML system:
- Training dataset: This is used to fit a model to sample
- Validation dataset: This is used to estimate the delta or prediction error for the fitted model (trained by training set)
- Test dataset: This is used to assess the model generalization error once a final model is selected