MultiWOZ v2.2 is a monolingual English dataset of dialogues covering a range of tasks and sub-tasks. It supports dialogue modeling, multi-class classification, and parsing. The dialogues involve eight services: restaurants, hotels, taxis, trains, buses, police, attractions, and hospitals. They are annotated with dialogue acts, slots, and values, which gives natural language processing models rich context to learn from.
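To make the act/slot/value structure concrete, here is a minimal sketch of how one annotated turn might be represented and how slot values could be gathered per service. The class names (`SlotValue`, `Turn`), the field names, the act labels, and the two example turns are all hypothetical, invented for illustration; they are not the dataset's actual schema or contents.

```python
from dataclasses import dataclass, field


@dataclass
class SlotValue:
    """One slot-value pair, e.g. slot="food", value="italian" (hypothetical names)."""
    slot: str
    value: str


@dataclass
class Turn:
    """A single utterance with an illustrative act label and its slot-value pairs."""
    speaker: str    # assumed labels: "USER" or "SYSTEM"
    utterance: str
    service: str    # e.g. "restaurant", "taxi"
    act: str        # hypothetical act label, e.g. "inform"
    slots: list[SlotValue] = field(default_factory=list)


def slots_by_service(turns: list[Turn]) -> dict[str, dict[str, str]]:
    """Collect the most recent value seen for each slot, grouped by service."""
    state: dict[str, dict[str, str]] = {}
    for turn in turns:
        for sv in turn.slots:
            state.setdefault(turn.service, {})[sv.slot] = sv.value
    return state


# Invented example turns, not taken from the dataset.
dialogue = [
    Turn("USER", "I need an Italian restaurant in the centre.", "restaurant",
         "inform", [SlotValue("food", "italian"), SlotValue("area", "centre")]),
    Turn("USER", "Also book a taxi to get there.", "taxi",
         "request", [SlotValue("destination", "the restaurant")]),
]

print(slots_by_service(dialogue))
```

Grouping by service keeps slots with the same name in different domains apart, which matters when one dialogue touches several services.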
The dataset comes in subsets that include v2.2 (10.4k rows) and v2.2_active_only (10.4k rows). The data is split into training (8.44k rows), validation (1k rows), and test (1k rows) sets.
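As a quick sanity check, the split sizes can be summed and turned into proportions. This is a sketch that assumes the rounded figures quoted above (8.44k, 1k, 1k) are close to the exact counts; the dictionary name `split_rows` is arbitrary.

```python
# Approximate row counts taken from the rounded figures in the text.
split_rows = {"train": 8_440, "validation": 1_000, "test": 1_000}

total = sum(split_rows.values())
shares = {name: rows / total for name, rows in split_rows.items()}

for name, rows in split_rows.items():
    print(f"{name:<10} {rows:>6} rows  {shares[name]:.1%}")
print(f"{'total':<10} {total:>6} rows")
```

The total of about 10.4k rows matches the size reported for each subset, with roughly four fifths of the dialogues in the training split.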
MultiWOZ v2.2 can be used for tasks such as text generation, fill-mask, and token classification. It is particularly well suited to dialogue modeling, where the goal is to understand and generate human-like dialogue. Typical applications include conversational AI systems, chatbots, and virtual assistants that must handle complex dialogues across domains such as restaurants, hotels, transportation, and emergency services.
MultiWOZ v2.2 is released under the Apache-2.0 license.