Text, Conversational

Cornell Movie-Dialogs Corpus

Summary

The Cornell Movie-Dialogs Corpus is a large, metadata-rich collection of fictional conversations extracted from raw movie scripts. It contains 220,579 conversational exchanges between 10,292 pairs of movie characters, involving 9,035 characters from 617 movies, for a total of 304,713 utterances. The metadata covers movie genres, release year, and IMDb rating, plus gender and credit position for a subset of characters.

Size

The corpus contains 220,579 conversational exchanges stored as plain-text files, with a total size of approximately 9.5 MB.
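
A minimal Python sketch for loading the text files follows. It assumes the layout of the original Cornell release, which the Kaggle mirror is assumed to preserve: utterances in `movie_lines.txt`, conversations in `movie_conversations.txt`, fields separated by ` +++$+++ `, and Latin-1 (ISO-8859-1) encoding. The function names are hypothetical.

```python
import ast
from typing import Dict, Iterable, List, Tuple

# Field separator assumed from the original Cornell release.
SEP = " +++$+++ "


def parse_movie_lines(rows: Iterable[str]) -> Dict[str, str]:
    """Map each line ID (e.g. 'L1045') to its utterance text.

    Assumed row layout: lineID, characterID, movieID, character name, text.
    """
    lines: Dict[str, str] = {}
    for row in rows:
        fields = row.rstrip("\r\n").split(SEP)
        if len(fields) == 5:  # skip malformed rows
            lines[fields[0]] = fields[4]
    return lines


def parse_conversations(rows: Iterable[str]) -> List[List[str]]:
    """Return each conversation as an ordered list of line IDs.

    Assumed row layout: character 1 ID, character 2 ID, movieID, and a
    Python-style list literal of line IDs such as "['L194', 'L195']".
    """
    conversations: List[List[str]] = []
    for row in rows:
        fields = row.rstrip("\r\n").split(SEP)
        if len(fields) == 4:  # skip malformed rows
            conversations.append(ast.literal_eval(fields[3]))
    return conversations


def load_corpus(lines_path: str, convs_path: str) -> Tuple[Dict[str, str], List[List[str]]]:
    # Latin-1 is an assumption; it decodes any byte sequence without errors.
    with open(lines_path, encoding="iso-8859-1") as f:
        lines = parse_movie_lines(f)
    with open(convs_path, encoding="iso-8859-1") as f:
        conversations = parse_conversations(f)
    return lines, conversations
```

Keying utterances by line ID lets each conversation be rebuilt in order by looking up its IDs in the dictionary.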

Use cases

The dataset is widely used in natural language processing (NLP) research and applications, particularly for training dialogue systems and for conversation analysis, sentiment analysis, and modeling relationships between characters. Researchers and practitioners in machine learning, artificial intelligence, and linguistics use it to develop and test models of language understanding and generation.
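
For the dialogue-system use case, a common preprocessing step is to turn each conversation into consecutive (utterance, reply) training pairs. The sketch below assumes a dictionary mapping line IDs to text (for example, parsed from `movie_lines.txt`) and conversation rows in the original ` +++$+++ `-separated layout; `make_pairs` is a hypothetical helper and the toy data is invented for illustration.

```python
import ast
from typing import Dict, Iterable, List, Tuple

# Field separator assumed from the original Cornell release.
SEP = " +++$+++ "


def make_pairs(lines: Dict[str, str], conversation_rows: Iterable[str]) -> List[Tuple[str, str]]:
    """Turn each conversation into consecutive (utterance, reply) pairs.

    Each row is assumed to end with a Python-style list of line IDs
    in chronological order, e.g. "['L194', 'L195', 'L196']".
    """
    pairs: List[Tuple[str, str]] = []
    for row in conversation_rows:
        fields = row.rstrip("\r\n").split(SEP)
        if len(fields) != 4:
            continue  # skip malformed rows
        ids = ast.literal_eval(fields[3])
        # Pair every utterance with the one that immediately follows it.
        for prev_id, next_id in zip(ids, ids[1:]):
            prev_text = lines.get(prev_id, "").strip()
            next_text = lines.get(next_id, "").strip()
            # Drop pairs where either side is missing or empty.
            if prev_text and next_text:
                pairs.append((prev_text, next_text))
    return pairs


if __name__ == "__main__":
    toy_lines = {"L1": "Hi.", "L2": "Hello.", "L3": "Bye."}
    rows = ["u0 +++$+++ u1 +++$+++ m0 +++$+++ ['L1', 'L2', 'L3']\n"]
    print(make_pairs(toy_lines, rows))
    # → [('Hi.', 'Hello.'), ('Hello.', 'Bye.')]
```

A conversation of n utterances yields n - 1 pairs, so each middle utterance appears once as a reply and once as a prompt; models that need longer context would concatenate several preceding utterances instead.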

License

The dataset is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This license allows users to share and adapt the content for any purpose, even commercially, as long as appropriate credit is given, a link to the license is provided, and any changes made are indicated.

Download from source

https://www.kaggle.com/datasets/rajathmc/cornell-moviedialog-corpus


Copyright © 2023 AGIE AI Technology Pvt. Ltd. All rights reserved.