Formatting the data

Amazon ML works on comma-separated values (.csv) files, a very simple format in which each row is an observation and each column is a variable or attribute. There are, however, a few conditions that must be met:

  • The data must be encoded in plain text using a character set, such as ASCII, Unicode, or EBCDIC
  • All values must be separated by commas; if a value contains a comma, it should be enclosed in double quotes
  • Each observation (row) must be smaller than 100 KB
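
The conditions above can be checked before uploading a file. The following is a minimal sketch using Python's standard csv module, not an official Amazon ML tool. The helper names write_rows and oversized_rows are hypothetical, the sample data is made up, and the limit is assumed to mean 100 × 1024 bytes per row, which the text does not spell out.

```python
import csv
import io

# Assumption: "100k" is read as 100 * 1024 bytes per row.
MAX_ROW_BYTES = 100 * 1024


def write_rows(rows):
    """Serialize rows to CSV text.

    With QUOTE_MINIMAL, the csv module wraps a value in double quotes
    only when it needs to, for example when the value contains a comma.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def oversized_rows(csv_text, limit=MAX_ROW_BYTES, encoding="utf-8"):
    """Return the 1-based numbers of lines that are not smaller than the limit.

    Each physical line is treated as one observation; this is a
    simplification that ignores line breaks inside quoted values.
    """
    return [
        number
        for number, line in enumerate(csv_text.splitlines(), start=1)
        if len(line.encode(encoding)) >= limit
    ]


sample = write_rows([["id", "comment"], [1, "good, but slow"]])
print(sample)                  # the second value is quoted because it contains a comma
print(oversized_rows(sample))  # no line reaches the assumed limit
```

Letting the csv module handle the quoting avoids hand-built string concatenation, which is where unquoted commas usually slip in.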

There are also conditions on the end-of-line characters that separate rows. Special care must be taken when using Excel on OS X (Mac), as explained on this page: http://docs.aws.amazon.com/machine-learning/latest/dg/understanding-the-data-format-for-amazon-ml.html. ...
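
One way to guard against end-of-line problems is to normalize line endings before uploading. The sketch below assumes the issue is a file saved with carriage-return (CR) or CRLF line endings, which is an assumption on our part; the linked AWS page remains the authority on what Amazon ML accepts. The function name normalize_line_endings is hypothetical.

```python
def normalize_line_endings(raw: bytes) -> bytes:
    """Convert CRLF and bare CR line endings to LF.

    CRLF is replaced first so that it does not turn into two line feeds.
    Note that this also rewrites line breaks inside quoted values.
    """
    return raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# Made-up sample: a two-row file that uses bare CR as the row separator.
mac_style = b"id,comment\r1,ok\r"
print(normalize_line_endings(mac_style))
```

Working on bytes rather than decoded text keeps the character encoding of the file untouched.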
