One of the key problems that need to be addressed while working on any machine learning model is: how accurate can this model be once it is implemented in production on a future dataset?
It is not possible to answer this question straight away. However, it is really important to obtain the buy-in from commercial teams that ultimately get benefited from the model build. Dividing the dataset into training and testing datasets comes in handy in such a scenario.
The training dataset is the data that is used to build the model. The testing dataset is the dataset that is not seen by the model; that is, the data points are not used in building the model. Essentially, one can think of the testing dataset as the dataset that is likely ...