Understanding datasets

To develop the chatbot, we use two datasets:

  • Cornell Movie-Dialogs dataset
  • bAbI dataset

Cornell Movie-Dialogs dataset

The Cornell Movie-Dialogs corpus has been widely used for developing chatbots. You can download it from this link: https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html. The corpus contains a large, metadata-rich collection of fictional conversations extracted from raw movie scripts.
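
Once the corpus is downloaded and extracted, the utterances can be read with a few lines of Python. The following is a minimal sketch, not the book's own loading code. It assumes the utterance file (movie_lines.txt in the distributed archive) stores one utterance per line, with five fields separated by the string " +++$+++ ": line ID, character ID, movie ID, character name, and utterance text. The function name parse_movie_lines, the field names, and the two sample lines are illustrative choices.

```python
from typing import Dict, Iterable

# Field separator assumed for the corpus files.
SEPARATOR = " +++$+++ "

# Illustrative names for the five fields of an utterance line.
FIELDS = ("line_id", "character_id", "movie_id", "character_name", "text")


def parse_movie_lines(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Map each line ID to a record holding the five utterance fields."""
    utterances: Dict[str, Dict[str, str]] = {}
    for raw in lines:
        # Split at most four times so the utterance text stays in one piece.
        parts = raw.rstrip("\n").split(SEPARATOR, len(FIELDS) - 1)
        if len(parts) != len(FIELDS):
            continue  # skip lines that do not match the assumed layout
        record = dict(zip(FIELDS, parts))
        utterances[record["line_id"]] = record
    return utterances


# Two sample lines written in the assumed layout (for illustration only).
sample = [
    "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!\n",
    "L1044 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ They do to!\n",
]

utterances = parse_movie_lines(sample)
print(utterances["L1045"]["character_name"], "->", utterances["L1045"]["text"])
```

To parse the real file, pass an open file object instead of the sample list, for example open("movie_lines.txt", encoding="iso-8859-1"). The encoding is an assumption; adjust it if decoding errors appear.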

The corpus has 220,579 conversational exchanges between 10,292 pairs of movie characters. These exchanges involve 9,035 characters from 617 movies and add up to 304,713 utterances. The dataset also contains movie metadata of the following types:
