An Amazon ML datasource is composed of the following:
- The location of the data file: Amazon ML does not copy or clone the data file; it reads the file directly from S3
- The schema, which describes the type of each variable in the CSV file:
  - Categorical
  - Text
  - Numeric (real-valued)
  - Binary
As we will see in Chapter 4, Loading and Preparing the Dataset, it is possible to supply Amazon ML with your own schema or modify the one created by Amazon ML.
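To make the idea of a schema concrete, here is a minimal sketch of one written in the JSON layout that Amazon ML uses for schema files. The column names (`survived`, `pclass`, `name`, `age`) and the helper `build_schema` are hypothetical and chosen only for illustration; the four attribute types are the ones listed above.

```python
import json

# The four variable types named in the text, in the upper-case form
# used in Amazon ML schema files.
VALID_TYPES = {"BINARY", "CATEGORICAL", "NUMERIC", "TEXT"}


def build_schema(columns, target):
    """Build a schema dict for a CSV file that has a header row.

    columns: list of (column_name, attribute_type) pairs (hypothetical names).
    target:  name of the column the model should predict.
    """
    names = [name for name, _ in columns]
    for name, attr_type in columns:
        if attr_type not in VALID_TYPES:
            raise ValueError(f"unknown attribute type for {name!r}: {attr_type}")
    if target not in names:
        raise ValueError(f"target {target!r} is not one of the columns")
    return {
        "version": "1.0",
        "targetAttributeName": target,
        "dataFormat": "CSV",
        "dataFileContainsHeader": True,
        "attributes": [
            {"attributeName": name, "attributeType": attr_type}
            for name, attr_type in columns
        ],
    }


# Hypothetical columns, for illustration only.
columns = [
    ("survived", "BINARY"),
    ("pclass", "CATEGORICAL"),
    ("name", "TEXT"),
    ("age", "NUMERIC"),
]

schema = build_schema(columns, target="survived")
schema_json = json.dumps(schema, indent=2)
print(schema_json)
```

The resulting JSON string is the kind of document you would supply when providing your own schema instead of relying on the one Amazon ML infers. If you work with the AWS SDK rather than the console, it would typically be passed alongside the S3 location of the data file when the datasource is created.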
At this point, Amazon ML has a good idea of the type of data in your training dataset. It has identified the type of each variable and knows how many rows the dataset contains:
Move on to the next step by clicking on Continue, and see what schema Amazon ML has ...