Partitioning a dataset in training and test sets

We briefly introduced the concept of partitioning a dataset into separate datasets for training and testing in Chapter 1, Giving Computers the Ability to Learn from Data, and Chapter 3, A Tour of Machine Learning Classifiers Using Scikit-learn. Remember that the test set can be understood as the ultimate test of our model before we let it loose on the real world. In this section, we will prepare a new dataset, the Wine dataset. After we have preprocessed the dataset, we will explore different techniques for feature selection to reduce the dimensionality of a dataset.

The Wine dataset is another open-source dataset that is available from the UCI machine learning repository (https://archive.ics.uci.edu/ml/datasets/Wine ...

Get Python: Real-World Data Science now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.