Evaluation
MMLU
Quick Answer
Massive Multitask Language Understanding: a broad benchmark covering 57 academic subjects.
MMLU is a knowledge benchmark of four-option multiple-choice questions spanning 57 subjects, including science, history, law, and medicine, at difficulty levels from high school through professional exams. Doing well requires broad knowledge and some reasoning, and scores tend to correlate with general capability, which is why MMLU is widely used to compare models. State-of-the-art models reach roughly 90% accuracy. Its main limitation is that it tests knowledge recall more than reasoning. Even so, MMLU remains a standard evaluation benchmark.
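As a rough illustration of how an MMLU-style accuracy figure can be computed, the sketch below scores a handful of made-up predictions. It assumes each question has a single correct letter among A to D. The function name `score_mmlu`, the record layout, and the toy data are hypothetical and not taken from any official evaluation harness. It reports both the overall (micro) accuracy and the unweighted mean of per-subject accuracies (macro), since published numbers may use either.

```python
from collections import defaultdict


def score_mmlu(records):
    """Score MMLU-style predictions.

    records: iterable of (subject, predicted_letter, correct_letter) tuples.
    Returns (per_subject_accuracy, micro_accuracy, macro_accuracy).
    Hypothetical helper for illustration only.
    """
    counts = defaultdict(lambda: [0, 0])  # subject -> [num_correct, num_total]
    for subject, predicted, answer in records:
        counts[subject][0] += int(predicted == answer)
        counts[subject][1] += 1

    # Accuracy within each subject.
    per_subject = {s: c / n for s, (c, n) in counts.items()}

    # Micro average: every question counts equally.
    total_correct = sum(c for c, _ in counts.values())
    total = sum(n for _, n in counts.values())
    micro = total_correct / total

    # Macro average: every subject counts equally.
    macro = sum(per_subject.values()) / len(per_subject)
    return per_subject, micro, macro


# Toy data (invented): 2 astronomy questions, 4 law questions.
records = [
    ("astronomy", "A", "A"),
    ("astronomy", "B", "C"),
    ("law", "D", "D"),
    ("law", "D", "D"),
    ("law", "A", "B"),
    ("law", "C", "C"),
]

per_subject, micro, macro = score_mmlu(records)
print(per_subject)
print(f"micro={micro:.3f} macro={macro:.3f}")
```

With four answer choices, random guessing lands near 25%, so that is the baseline to read any reported accuracy against. The two averages diverge when subjects have different numbers of questions, so it helps to check which one a given result uses.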
Last verified: 2026-04-08