Splitting data

Quite often, you would need to split your dataset; most commonly, you would need to split a given dataset for analysis into train and test dataset. ML Studio comes with a Split module for this purpose. It lets you split your dataset into two datasets based on a specified fraction. So, if you choose 0.8, it outputs the first dataset with 80 percent of the input dataset, and the rest 20 percent as second output. You also have an option to split the data randomly. You can specify a random seed value other than 0 if you need to get the same result in a random split every time you run it. You can find the Split module under Data Transformation | Sample, and then Split it in the module palette:

Notice that the last parameter, Stratified ...

Get Microsoft Azure Machine Learning now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.