As we saw in Chapter 2, Machine Learning Definitions and Concepts, building and selecting the best model requires splitting the dataset into three parts: training, validation, and test, usually in ratios of 60%, 20%, and 20%. The training and validation sets are used to build several models and select the best one, while the test set, also called the held-out set, is used for the final performance evaluation on previously unseen data. We will use the held-out subset in Chapter 6, Predictions and Performances, to simulate batch predictions with the model we build in Chapter 5, Model Creation.
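To make the three-way split concrete, here is a minimal sketch in plain Python. It is only an illustration of the 60%, 20%, 20% ratios mentioned above and does not call Amazon ML; the function name `split_dataset`, the fixed seed, and the toy list of row indices are all assumptions made for the example.

```python
import random


def split_dataset(rows, ratios=(0.6, 0.2, 0.2), seed=42):
    """Shuffle rows, then cut them into training, validation, and held-out subsets.

    `ratios` and `seed` are illustrative defaults, not values required by any tool.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

    # Shuffle a copy so the caller's data is left untouched and any
    # ordering in the original file does not leak into the subsets.
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)

    # round() avoids losing a row to floating-point truncation.
    n = len(shuffled)
    train_end = round(n * ratios[0])
    valid_end = train_end + round(n * ratios[1])

    training = shuffled[:train_end]
    validation = shuffled[train_end:valid_end]
    held_out = shuffled[valid_end:]  # whatever remains goes to the held-out set
    return training, validation, held_out


# Toy dataset: 1,000 row indices standing in for real records.
rows = list(range(1000))
training, validation, held_out = split_dataset(rows)
print(len(training), len(validation), len(held_out))
```

Shuffling before cutting matters whenever the original file is sorted in some way, since a sorted file would otherwise give the three subsets different distributions. Fixing the seed keeps the split reproducible, so the held-out rows stay the same from one run to the next.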
Since Amazon ML itself splits the dataset used for model training and model evaluation into training and validation subsets, we only need ...