mmlu
This dataset is a massive multitask test consisting of multiple-choice questions from many branches of knowledge. It spans subjects in the humanities, social sciences, hard sciences, and other areas, covering 57 tasks that include elementary mathematics, US history, computer science, and law. To attain high accuracy on this test, a model must possess extensive world knowledge and problem-solving ability. The dataset is intended to bridge the gap between the wide-ranging knowledge that models absorb during pretraining and existing measures of success.
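Since the benchmark is scored by accuracy across many subjects, a minimal sketch of per-subject and overall accuracy may help. The field names `subject` and `answer` (an integer index into the answer choices) are assumptions made for illustration and are not stated in this description, and the toy records below are invented, not drawn from the dataset.

```python
from collections import defaultdict


def accuracy_by_subject(examples, predictions):
    """Return (per-subject accuracy dict, overall accuracy).

    `examples` is a list of dicts with hypothetical keys "subject" and
    "answer"; `predictions` is a list of predicted answer indices.
    """
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    if not examples:
        raise ValueError("at least one example is required")

    correct = defaultdict(int)
    total = defaultdict(int)
    for example, predicted in zip(examples, predictions):
        subject = example["subject"]
        total[subject] += 1
        if predicted == example["answer"]:
            correct[subject] += 1

    per_subject = {s: correct[s] / total[s] for s in total}
    overall = sum(correct.values()) / sum(total.values())
    return per_subject, overall


# Toy records, invented for illustration only.
toy_examples = [
    {"subject": "elementary_mathematics", "answer": 1},
    {"subject": "elementary_mathematics", "answer": 3},
    {"subject": "us_history", "answer": 0},
]
toy_predictions = [1, 2, 0]

per_subject, overall = accuracy_by_subject(toy_examples, toy_predictions)
```

Reporting per-subject scores alongside the overall figure shows whether a model is uniformly strong or only strong in a few areas.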
The dataset consists of 99842 examples in the 'auxiliary_train' split, 285 in the 'dev' split, 1531 in the 'val' split, and 14042 in the 'test' split, totalling 115700 examples. The size of the downloaded dataset files is approximately 9.47 GB, and the size of the auto-converted Parquet files is roughly 2.71 GB.
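The split arithmetic can be checked with a short sketch. The split names and counts are taken from the paragraph above; the helper names `total_examples` and `split_fractions` are hypothetical and exist only for this illustration.

```python
# Split sizes as listed in the description above.
SPLIT_SIZES = {
    "auxiliary_train": 99842,
    "dev": 285,
    "val": 1531,
    "test": 14042,
}


def total_examples(split_sizes):
    """Sum the example counts over all splits."""
    return sum(split_sizes.values())


def split_fractions(split_sizes):
    """Return each split's share of the total number of examples."""
    total = total_examples(split_sizes)
    return {name: count / total for name, count in split_sizes.items()}


if __name__ == "__main__":
    print("total examples:", total_examples(SPLIT_SIZES))
    for name, fraction in split_fractions(SPLIT_SIZES).items():
        print(f"{name}: {fraction:.2%}")
```

The fractions make it plain that 'auxiliary_train' accounts for the large majority of examples, while 'dev' is a very small split.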
This dataset is suited to training and evaluating machine learning models, particularly on tasks that involve problem solving and require extensive world knowledge. The multiple-choice format lends itself to many natural language processing tasks, including but not limited to question answering, text classification, and information retrieval.
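As one illustration of using the multiple-choice format for question answering, the sketch below renders a question and its options as a lettered prompt and maps a letter reply back to an option index. The prompt layout, the helper names, and the sample question are assumptions for illustration; the description does not prescribe any particular prompt format.

```python
import string


def format_question(question, choices):
    """Render a question and its options as a lettered prompt (A., B., ...)."""
    if not choices:
        raise ValueError("at least one choice is required")
    if len(choices) > len(string.ascii_uppercase):
        raise ValueError("too many choices to label with single letters")
    lines = [question.strip()]
    for letter, choice in zip(string.ascii_uppercase, choices):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer:")
    return "\n".join(lines)


def letter_to_index(letter):
    """Map a reply such as 'B' (or ' b ') to its zero-based option index."""
    return string.ascii_uppercase.index(letter.strip().upper())


# Invented sample question, not taken from the dataset.
prompt = format_question("What is 2 + 2?", ["3", "4", "5", "6"])
```

The same lettered rendering can serve text classification, where the option index is treated as the class label.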
MIT License