The dataset

We will use the sample dataset that ships with ML Studio. It is a subset of the passenger flight on-time performance data taken from the TranStats data collection of the U.S. Department of Transportation (DOT) (http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time).

The dataset has been preprocessed and filtered to include only the 70 busiest airports in the continental United States for the period from April 2013 to October 2013. It contains the following columns:

  • Carrier: This contains the IATA-assigned code that is commonly used to identify a carrier.
  • OriginAirportID (Origin Airport's Airport ID): This is an identification number assigned by DOT to uniquely identify an airport.
  • DestAirportID ...
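
To make the filtering described above concrete, here is a minimal Python sketch of how records could be restricted to a period and to the busiest airports. It is not the actual preprocessing applied to the sample dataset. The function name filter_flights, the Month column, the choice to count traffic after the period filter, and every value in the sample rows are assumptions made for illustration; only Carrier, OriginAirportID, and DestAirportID come from the column list.

```python
from collections import Counter

# Hypothetical sample rows; all values are made up for illustration only.
# "Month" is an assumed column and is not part of the list above.
flights = [
    {"Carrier": "AA", "Month": 3, "OriginAirportID": 10, "DestAirportID": 20},
    {"Carrier": "AA", "Month": 4, "OriginAirportID": 10, "DestAirportID": 20},
    {"Carrier": "DL", "Month": 5, "OriginAirportID": 20, "DestAirportID": 10},
    {"Carrier": "UA", "Month": 6, "OriginAirportID": 10, "DestAirportID": 30},
    {"Carrier": "DL", "Month": 7, "OriginAirportID": 20, "DestAirportID": 30},
    {"Carrier": "WN", "Month": 10, "OriginAirportID": 30, "DestAirportID": 40},
    {"Carrier": "UA", "Month": 11, "OriginAirportID": 10, "DestAirportID": 40},
]


def filter_flights(rows, top_n, first_month, last_month):
    """Keep rows inside the month range whose origin and destination
    are both among the top_n busiest airports in that range."""
    # Step 1: restrict to the period of interest (inclusive on both ends).
    in_period = [r for r in rows if first_month <= r["Month"] <= last_month]

    # Step 2: count how often each airport appears as origin or destination.
    traffic = Counter()
    for r in in_period:
        traffic[r["OriginAirportID"]] += 1
        traffic[r["DestAirportID"]] += 1

    # Step 3: keep only flights between the busiest airports.
    busiest = {airport for airport, _ in traffic.most_common(top_n)}
    return [
        r for r in in_period
        if r["OriginAirportID"] in busiest and r["DestAirportID"] in busiest
    ]


# Toy parameters; the real dataset uses 70 airports and April to October 2013.
selected = filter_flights(flights, top_n=3, first_month=4, last_month=10)
print(len(selected), "of", len(flights), "rows kept")
```

With the real data, top_n would be 70 and the month range 4 to 10. Whether the busiest airports were ranked before or after restricting the period is not stated in the text, so the order used here is only one plausible reading.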
