Managing schema and recipe

Adding features to a dataset, or removing them from it, directly affects both the schema and the recipe. The schema is used when creating the datasources, while the recipe is needed to train the model, since it specifies which data transformations are applied before training.

To remove a feature from the dataset, add the variable's name to the excludedAttributeNames field of the schema. We start from the initial schema, and each time we remove a feature from the initial feature list, we add its name to the excludedAttributeNames list. The steps are as follows:

  1. Load the JSON-formatted schema into a schema dict
  2. Append the feature name to schema['excludedAttributeNames']
  3. Save the schema to a ...
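
The first two steps can be sketched in Python as follows. This is a minimal illustration, not the book's own code: the helper name exclude_feature and the sample attribute names (survived, age, fare) are assumptions made for the example, and the schema is loaded from an inline JSON string rather than a file. Because the description of the save step is cut off above, the sketch stops at serializing the updated schema back to JSON and leaves the destination to the reader.

```python
import copy
import json


def exclude_feature(schema, feature_name):
    """Return a copy of schema with feature_name listed in excludedAttributeNames.

    Hypothetical helper for illustration; the original schema dict is not modified.
    """
    updated = copy.deepcopy(schema)
    # Create the list if the schema does not have one yet.
    excluded = updated.setdefault("excludedAttributeNames", [])
    # Avoid listing the same feature twice.
    if feature_name not in excluded:
        excluded.append(feature_name)
    return updated


# Illustrative schema; the attribute names are made up for this sketch.
initial_schema_json = """
{
  "version": "1.0",
  "targetAttributeName": "survived",
  "dataFormat": "CSV",
  "dataFileContainsHeader": true,
  "attributes": [
    {"attributeName": "survived", "attributeType": "BINARY"},
    {"attributeName": "age", "attributeType": "NUMERIC"},
    {"attributeName": "fare", "attributeType": "NUMERIC"}
  ],
  "excludedAttributeNames": []
}
"""

# Step 1: load the JSON-formatted schema into a dict.
schema = json.loads(initial_schema_json)

# Step 2: add the removed feature to the exclusion list.
schema = exclude_feature(schema, "fare")

# Serialize the updated schema so it can be saved wherever it is needed.
updated_schema_json = json.dumps(schema, indent=2)
```

Returning a copy rather than mutating the input keeps the initial schema intact, which matches the approach of always starting from the initial schema when removing features.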
