Examining data statistics

When Amazon ML created the datasource, it carried out a basic statistical analysis of the different variables. For each variable, it estimated the following information, where applicable to the variable's type:

  • Correlation of each attribute to the target
  • Number of missing values
  • Number of invalid values
  • Distribution of numeric variables, shown as a histogram and a box plot
  • Range, mean, and median for numeric variables
  • Most and least frequent categories for categorical variables
  • Word counts for text variables
  • Percentage of true values for binary variables
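
Amazon ML computes these statistics for you, so no code is required at this step. To make a few of them concrete, here is a minimal local sketch in plain Python. It is not Amazon ML's implementation; the function names and sample values are hypothetical, missing values are assumed to be represented by None, and each column is assumed to contain at least one non-missing value.

```python
from collections import Counter
from statistics import mean, median


def summarize_numeric(values):
    """Missing count, range, mean, and median for a numeric column."""
    present = [v for v in values if v is not None]
    return {
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": mean(present),
        "median": median(present),
    }


def summarize_categorical(values):
    """Missing count plus most and least frequent categories."""
    present = [v for v in values if v is not None]
    # most_common() sorts categories from highest to lowest count
    counts = Counter(present).most_common()
    return {
        "missing": len(values) - len(present),
        "most_frequent": counts[0][0],
        "least_frequent": counts[-1][0],
    }


def percent_true(values):
    """Percentage of true values in a binary column, ignoring missing ones."""
    present = [v for v in values if v is not None]
    return 100.0 * sum(1 for v in present if v) / len(present)


# Hypothetical sample columns, for illustration only
ages = [22, 38, None, 35, 54]
ports = ["S", "C", "S", "Q", "S", "C"]
survived = [True, False, True, True]

numeric_summary = summarize_numeric(ages)
categorical_summary = summarize_categorical(ports)
survived_pct = percent_true(survived)

print(numeric_summary)
print(categorical_summary)
print(survived_pct)
```

Comparing numbers like these against the figures on the data summary page can be a quick sanity check that the datasource schema assigned the intended type to each variable.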

Go to the Datasource dashboard and click the datasource you just created to open its data summary page. The left-side menu lets you access data statistics for the target and different attributes, ...
