Before we upload the new Titanic dataset to S3 and create a new datasource in Amazon ML, we need to split it into a training file and a held-out file:
- Open the new Titanic dataset in your favorite editor.
- Select the first 1047 rows, including the header row, and save them to a new CSV file, ext_titanic_training.csv.
- Select the next 263 rows, add the header row at the top, and save them to a new file, ext_titanic_heldout.csv.
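If you prefer not to split the file by hand, the steps above can be sketched with Python's standard csv module. This is a minimal sketch, not the book's method: the function name split_csv and the input file name ext_titanic.csv are hypothetical, and it assumes that the 1047 rows of the training file count the header row, which leaves 1,046 passenger records for training and everything after them for the held-out file.

```python
import csv


def split_csv(src_path, train_path, heldout_path, n_train_records):
    """Hypothetical helper: write the header plus the first n_train_records
    data rows to train_path, and the header plus all remaining rows to
    heldout_path. Returns the number of data rows written to each file."""
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)  # first line of the file is the header row
        rows = list(reader)    # every following line is a data record

    train_rows = rows[:n_train_records]
    heldout_rows = rows[n_train_records:]

    for path, subset in ((train_path, train_rows), (heldout_path, heldout_rows)):
        with open(path, "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(header)  # both output files keep the header row
            writer.writerows(subset)

    return len(train_rows), len(heldout_rows)


if __name__ == "__main__":
    # Assumed input file name; 1046 records + the header = the first 1047 rows.
    counts = split_csv("ext_titanic.csv", "ext_titanic_training.csv",
                       "ext_titanic_heldout.csv", n_train_records=1046)
    print(counts)
```

Using the csv module rather than splitting on raw lines keeps quoted fields that contain commas (such as passenger names) intact.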
We also need to update our schema. Open the schema file titanic_training.csv.schema and add the following entries to its "attributes" array:
{ "attributeName" : "is_age_missing", "attributeType" : "BINARY" },
{ "attributeName" : "log_fare", "attributeType" : "NUMERIC" },
{ "attributeName" : "title", "attributeType" : "CATEGORICAL" },
{ "attributeName" : "deck", "attributeType" ...
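The schema edit can also be scripted. The sketch below is an illustration, not the book's procedure: the helper add_attributes is hypothetical, and it assumes the schema file is valid JSON with a top-level "attributes" list, as Amazon ML schema files are. It includes only the three entries fully shown in the list; the remaining entries (deck and any that follow) are elided in this excerpt and would have to be copied in the same way. Because the schema's attributes are expected to follow the column order of the data file, appending at the end also assumes the new columns are the last columns of ext_titanic_training.csv.

```python
import json


def add_attributes(schema_path, new_attributes):
    """Hypothetical helper: append attribute definitions to the "attributes"
    list of a JSON schema file, skipping names that are already present,
    then rewrite the file in place and return the updated schema."""
    with open(schema_path) as f:
        schema = json.load(f)

    existing = {a["attributeName"] for a in schema["attributes"]}
    for attr in new_attributes:
        if attr["attributeName"] not in existing:
            schema["attributes"].append(attr)
            existing.add(attr["attributeName"])

    with open(schema_path, "w") as f:
        json.dump(schema, f, indent=2)
    return schema


# Entries copied from the list above; the remaining ones (deck, ...) are
# elided in this excerpt and are deliberately not filled in here.
NEW_ATTRIBUTES = [
    {"attributeName": "is_age_missing", "attributeType": "BINARY"},
    {"attributeName": "log_fare", "attributeType": "NUMERIC"},
    {"attributeName": "title", "attributeType": "CATEGORICAL"},
]

if __name__ == "__main__":
    add_attributes("titanic_training.csv.schema", NEW_ATTRIBUTES)
```

Skipping names that already exist makes the script safe to re-run without duplicating attributes.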