The Cornell Movie Dialogs dataset is a large, metadata-rich collection of fictional conversations extracted from raw movie scripts. It includes 220,579 conversational exchanges between 10,292 pairs of movie characters, involving 9,035 characters from 617 movies.
The data is stored in plain text format, and the total size of the dataset is approximately 9.5 MB.
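Because the data ships as plain text, loading it usually comes down to splitting each row into fields. The following is a minimal sketch, not a description of this page's exact files. It assumes the layout of the commonly distributed version of the corpus, where each utterance row has five fields (line ID, character ID, movie ID, character name, text) separated by ` +++$+++ `. The separator, the field order, the `Utterance` type, and the `parse_lines` helper are assumptions made for illustration, and the sample rows are invented.

```python
from typing import Iterable, Iterator, NamedTuple

# Assumed field separator; verify it against the files you actually download.
FIELD_SEPARATOR = " +++$+++ "


class Utterance(NamedTuple):
    """One utterance row (hypothetical field names, assumed field order)."""
    line_id: str
    character_id: str
    movie_id: str
    character_name: str
    text: str


def parse_lines(rows: Iterable[str]) -> Iterator[Utterance]:
    """Yield an Utterance for every row that has exactly five fields."""
    for row in rows:
        # maxsplit=4 keeps the utterance text whole even if it contains the separator.
        parts = row.rstrip("\n").split(FIELD_SEPARATOR, 4)
        if len(parts) != 5:
            continue  # skip rows that do not match the assumed layout
        yield Utterance(*(part.strip() for part in parts))


# Invented rows that only illustrate the assumed layout.
SAMPLE_ROWS = [
    "L1 +++$+++ u0 +++$+++ m0 +++$+++ ALICE +++$+++ Hello there.\n",
    "L2 +++$+++ u1 +++$+++ m0 +++$+++ BOB +++$+++ Hi, Alice.\n",
    "this row does not match the layout\n",
]

utterances = list(parse_lines(SAMPLE_ROWS))
for utterance in utterances:
    print(f"{utterance.character_name}: {utterance.text}")
```

To parse a real file, pass an open file handle to `parse_lines` in place of `SAMPLE_ROWS`. The file may not be valid UTF-8, so it can be safer to open it with an explicit encoding and an `errors` policy.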
The dataset is widely used in natural language processing (NLP) research and applications. It is particularly suitable for training dialogue systems and for conversation analysis, sentiment analysis, and character relationship modeling. Researchers and practitioners in machine learning, artificial intelligence, and linguistics can use it to develop and test algorithms and models for language understanding and generation.
The dataset is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This license allows users to share and adapt the content for any purpose, even commercially, as long as appropriate credit is given, a link to the license is provided, and any changes made are indicated.