Creating the datasource

An Amazon ML datasource is composed of the following:

  • The location of the data file: Amazon ML does not duplicate or clone the data file; it reads it directly from S3
  • The schema, which records the type of each variable in the CSV file:
    • Categorical
    • Text
    • Numeric (real-valued)
    • Binary

As we will see in Chapter 4, Loading and Preparing the Dataset, it is possible to supply Amazon ML with your own schema or modify the one created by Amazon ML.
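
To make the two parts of a datasource concrete, here is a minimal sketch in Python that assembles a schema and pairs it with an S3 location. The JSON field names (`version`, `targetAttributeName`, `dataFormat`, `dataFileContainsHeader`, `attributes`) follow the Amazon ML schema format as we understand it. The column names, the bucket path, and the helper functions `build_schema` and `build_data_spec` are hypothetical and only serve the illustration. Nothing here calls AWS; the resulting dictionary has the shape we would expect to pass as the `DataSpec` argument of boto3's `create_data_source_from_s3`.

```python
import json

# The four variable types listed above, in the upper-case spelling
# used by Amazon ML schema files.
VALID_TYPES = {"CATEGORICAL", "TEXT", "NUMERIC", "BINARY"}


def build_schema(attributes, target, has_header=True):
    """Build a schema dict from (name, type) pairs and a target column name."""
    names = [name for name, _ in attributes]
    for name, attr_type in attributes:
        if attr_type not in VALID_TYPES:
            raise ValueError(f"Unsupported type {attr_type!r} for column {name!r}")
    if target not in names:
        raise ValueError(f"Target {target!r} is not one of the columns")
    return {
        "version": "1.0",
        "targetAttributeName": target,
        "dataFormat": "CSV",
        "dataFileContainsHeader": has_header,
        "attributes": [
            {"attributeName": name, "attributeType": attr_type}
            for name, attr_type in attributes
        ],
    }


def build_data_spec(s3_uri, schema):
    """Pair the S3 location with the schema; the data itself stays in S3."""
    return {"DataLocationS3": s3_uri, "DataSchema": json.dumps(schema)}


# Hypothetical columns and bucket, for illustration only.
columns = [
    ("age", "NUMERIC"),
    ("occupation", "CATEGORICAL"),
    ("comment", "TEXT"),
    ("subscribed", "BINARY"),
]
schema = build_schema(columns, target="subscribed")
data_spec = build_data_spec("s3://example-bucket/training.csv", schema)

print(json.dumps(schema, indent=2))
```

Validating the types before building the schema catches a misspelled type name locally, rather than when the datasource is created.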

At this point, Amazon ML has a good idea of the kind of data in your training dataset: it has identified the type of each variable and counted the rows.

Move on to the next step by clicking on Continue, and see what schema Amazon ML has ...
