LLM Benchmark Comparison 2026

Compare benchmark scores across 92 large language models on Arena ELO, Coding ELO, HumanEval, MMLU, MATH, and GPQA. The table below lists Arena ELO, Coding ELO, and MMLU, ordered by Arena ELO from highest to lowest; "--" marks a score that is not reported.

Benchmark Rankings

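| Model | Arena ELO | Coding ELO | MMLU |
|---|---|---|---|
| Claude Opus 4 | 1503 | 1503 | 91.5 |
| Gemini 2.5 Pro | 1430 | 1430 | 92 |
| o3 | 1340 | 1320 | 95 |
| DeepSeek R1 | 1310 | 1330 | 89 |
| o1 | 1310 | 1310 | 92 |
| Qwen 3 235B MoE | 1310 | 1290 | 88 |
| Gemini Experimental 1206 | 1300 | 1280 | -- |
| GPT-4.5 | 1290 | 1295 | 89 |
| DeepSeek R1 (Groq) | 1290 | 1270 | 85 |
| Llama 4 Maverick | 1290 | 1280 | 88 |
| DeepSeek R1 (Together) | 1290 | 1270 | 85 |
| Grok 3 | 1285 | 1280 | 87 |
| Claude Sonnet 4 | 1280 | 1305 | 88.7 |
| DeepSeek V3 | 1280 | 1300 | 87.5 |
| Gemini 2.0 Flash Thinking | 1280 | 1270 | -- |
| o3-mini | 1280 | 1285 | 86 |
| Claude 3.5 Sonnet | 1270 | 1300 | 88.3 |
| Gemini 2.5 Flash | 1270 | 1260 | 86 |
| o1-mini | 1270 | 1280 | 85 |
| ChatGPT-4o Latest | 1265 | 1270 | 89 |
| Gemini 2.0 Flash | 1260 | 1240 | 85.5 |
| GPT-4o | 1260 | 1265 | 88.7 |
| o4-mini | 1260 | 1270 | 85 |
| Qwen 2.5 Max | 1260 | 1250 | 86 |
| QwQ 32B | 1260 | 1250 | -- |
| GPT-4o (Aug 2024) | 1255 | 1265 | 88.7 |
| DeepSeek R1 Distill Llama 70B | 1250 | 1240 | 86 |
| Llama 4 Scout | 1250 | 1230 | 85 |
| Mistral Large | 1245 | 1240 | 86.5 |
| Command A | 1240 | 1230 | 85 |
| DeepSeek R1 Distill Qwen 32B | 1240 | 1235 | 85 |
| Llama 3.1 405B (Fireworks) | 1240 | 1200 | 85.9 |
| GPT-4 Turbo | 1240 | 1245 | 86 |
| Grok 2 | 1240 | 1225 | 85 |
| Llama 3.1 405B | 1240 | 1200 | 85.9 |
| Sonar Reasoning | 1240 | 1235 | 85 |
| Llama 3.1 405B (Together) | 1240 | 1200 | 85.9 |
| Gemini 1.5 Pro | 1230 | -- | 87 |
| Grok 2 Vision | 1230 | -- | -- |
| Pixtral Large | 1230 | -- | -- |
| Qwen 2.5 72B | 1230 | 1210 | 85 |
| Qwen 2.5 72B (Together) | 1230 | 1210 | 85 |
| Amazon Nova Pro | 1220 | 1210 | 85 |
| Claude 3.5 Haiku | 1220 | 1250 | 83 |
| Claude Haiku 4 | 1220 | 1195 | 83 |
| Llama 3.3 70B (Fireworks) | 1220 | 1180 | 86.2 |
| GPT-4o Mini | 1220 | 1200 | 82 |
| Llama 3.3 70B (Groq) | 1220 | 1180 | 86.2 |
| Llama 3.3 70B | 1220 | 1180 | 86.2 |
| Mistral Medium 3 | 1220 | -- | 84 |
| Llama 3.3 70B (Together) | 1220 | 1180 | 86.2 |
| Llama 3.2 90B Vision | 1210 | -- | 84 |
| Command R+ | 1200 | 1160 | 82 |
| DeepSeek V2.5 | 1200 | 1220 | 84 |
| Mixtral 8x22B (Fireworks) | 1200 | 1220 | 83 |
| Gemini 2.0 Flash Lite | 1200 | 1170 | 80 |
| GPT-4 1 | 1200 | 1210 | 86 |
| Sonar Pro | 1200 | 1195 | 84 |
| WizardLM-2 8x22B | 1200 | 1220 | 83 |
| Llama 3.1 70B | 1195 | -- | 85.2 |
| Phi-3.5 MoE | 1195 | 1190 | 82 |
| Gemini 1.5 Flash | 1190 | -- | 82 |
| Gemma 2 27B | 1190 | 1195 | 81 |
| Mistral Small | 1185 | 1160 | 79 |
| Yi-Large | 1185 | 1195 | 82 |
| GPT-4 1.5-mini | 1180 | 1200 | 82 |
| Grok 3-mini | 1175 | 1190 | 82 |
| Amazon Nova Lite | 1170 | 1160 | 80 |
| Gemma 2 9B (Groq) | 1170 | 1160 | 77 |
| Phi-3 Medium | 1170 | 1175 | 83 |
| Yi-Lightning | 1165 | 1170 | 80 |
| Gemma 2 9B | 1160 | 1155 | 77 |
| Mixtral 8x7B (Groq) | 1160 | 1150 | 74 |
| Llama 3.2 11B Vision | 1160 | -- | 80 |
| Phi-3.5 Mini | 1160 | 1165 | 81 |
| Qwen 2.5 7B | 1160 | -- | 79 |
| Sonar | 1160 | 1170 | 80 |
| InternLM 2.5 20B | 1155 | 1150 | 78 |
| Gemini 1.5 Flash 8B | 1150 | -- | 78 |
| GPT-4 1.5-nano | 1150 | 1160 | 78 |
| Phi-4 | 1150 | 1130 | 80.5 |
| Command R | 1140 | 1100 | 75.5 |
| Mistral Nemo 12B | 1140 | -- | 72 |
| Amazon Nova Micro | 1130 | 1100 | 76 |
| Command R7B | 1120 | 1100 | 72 |
| GPT-3.5 Turbo | 1120 | 1100 | 70 |
| Llama 3.1 8B (Groq) | 1120 | -- | 79 |
| Llama 3.1 8B | 1120 | -- | 79 |
| Mistral 7B | 1100 | -- | 62 |
| Mistral 7B (Together) | 1100 | 1090 | 62 |
| Codestral 22B | -- | 1250 | -- |
| Qwen 2.5 Coder 32B | -- | 1290 | -- |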
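
To re-rank the models by a different column offline, the sorting can be reproduced in a few lines of Python. This is a minimal sketch, not the site's own implementation: the `ROWS` sample, the `COLUMNS` mapping, and the `sort_by` helper are hypothetical names introduced for illustration, and only five rows from the table are included. It assumes that a "--" entry means the score is not reported and that such rows should go last.

```python
# A few rows copied from the table above: (model, arena_elo, coding_elo, mmlu).
# None stands in for the "--" placeholder (assumed to mean "not reported").
ROWS = [
    ("Claude Opus 4", 1503, 1503, 91.5),
    ("o3", 1340, 1320, 95.0),
    ("Gemini 1.5 Pro", 1230, None, 87.0),
    ("Qwen 2.5 Coder 32B", None, 1290, None),
    ("Claude Sonnet 4", 1280, 1305, 88.7),
]

# Hypothetical column names mapped to tuple positions.
COLUMNS = {"arena_elo": 1, "coding_elo": 2, "mmlu": 3}


def sort_by(rows, column, descending=True):
    """Return rows ordered by one score column, with missing scores last."""
    idx = COLUMNS[column]
    present = [r for r in rows if r[idx] is not None]
    missing = [r for r in rows if r[idx] is None]
    # list.sort is stable, so tied rows keep their original relative order.
    present.sort(key=lambda r: r[idx], reverse=descending)
    return present + missing


if __name__ == "__main__":
    for name, *_ in sort_by(ROWS, "coding_elo"):
        print(name)
```

Splitting the rows into scored and unscored groups before sorting keeps models without a score out of the comparison entirely, rather than treating a missing value as zero.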