ELO Rating · 90 models ranked

Reasoning ELO Leaderboard 2026

Reasoning ELO is the Chatbot Arena leaderboard filtered to hard reasoning and math problems. It measures how well models solve multi-step logic, quantitative reasoning, and complex problem-solving.

Quick Answer

The best model on Reasoning ELO in 2026 is Claude Opus 4 by Anthropic, scoring 1503 ELO. Runner-up: Gemini 2.5 Pro (1430).

| # | Model | Score (ELO) |
|---|-------|-------------|
| 🥇 | Claude Opus 4 | 1503 |
| 🥈 | Gemini 2.5 Pro | 1430 |
| 🥉 | o3 | 1350 |
| 4 | DeepSeek R1 | 1350 |
| 5 | o1 | 1330 |
| 6 | Qwen 3 235B MoE | 1320 |
| 7 | Gemini Experimental 1206 | 1310 |
| 8 | DeepSeek R1 (Groq) | 1300 |
| 9 | DeepSeek R1 (Together) | 1300 |
| 10 | Grok 3 | 1295 |
| 11 | o3-mini | 1295 |
| 12 | GPT-4.5 | 1290 |
| 13 | Gemini 2.0 Flash Thinking | 1290 |
| 14 | o1-mini | 1280 |
| 15 | Llama 4 Maverick | 1275 |
| 16 | Claude Sonnet 4 | 1275 |
| 17 | o4-mini | 1275 |
| 18 | Claude 3.5 Sonnet | 1270 |
| 19 | Gemini 2.5 Flash | 1270 |
| 20 | QwQ 32B | 1270 |
| 21 | ChatGPT-4o Latest | 1265 |
| 22 | DeepSeek V3 | 1260 |
| 23 | DeepSeek R1 Distill Llama 70B | 1260 |
| 24 | GPT-4o (Aug 2024) | 1255 |
| 25 | GPT-4o | 1250 |
| 26 | DeepSeek R1 Distill Qwen 32B | 1250 |
| 27 | Llama 3.1 405B (Fireworks) | 1250 |
| 28 | Llama 3.1 405B | 1250 |
| 29 | Sonar Reasoning | 1250 |
| 30 | Llama 3.1 405B (Together) | 1250 |
| 31 | Qwen 2.5 Max | 1240 |
| 32 | Command A | 1240 |
| 33 | GPT-4 Turbo | 1240 |
| 34 | Grok 2 | 1240 |
| 35 | Gemini 2.0 Flash | 1230 |
| 36 | Mistral Large | 1230 |
| 37 | Gemini 1.5 Pro | 1230 |
| 38 | Grok 2 Vision | 1230 |
| 39 | Pixtral Large | 1230 |
| 40 | Qwen 2.5 72B | 1230 |
| 41 | Qwen 2.5 72B (Together) | 1230 |
| 42 | Llama 4 Scout | 1220 |
| 43 | Amazon Nova Pro | 1220 |
| 44 | Claude 3.5 Haiku | 1220 |
| 45 | Llama 3.3 70B (Fireworks) | 1220 |
| 46 | Llama 3.3 70B (Groq) | 1220 |
| 47 | Llama 3.3 70B | 1220 |
| 48 | Mistral Medium 3 | 1220 |
| 49 | Llama 3.3 70B (Together) | 1220 |
| 50 | Llama 3.2 90B Vision | 1210 |
| 51 | Sonar Pro | 1210 |
| 52 | DeepSeek V2.5 | 1200 |
| 53 | Mixtral 8x22B (Fireworks) | 1200 |
| 54 | GPT-4 1 | 1200 |
| 55 | WizardLM-2 8x22B | 1200 |
| 56 | Llama 3.1 70B | 1195 |
| 57 | Phi-3.5 MoE | 1195 |
| 58 | Gemini 1.5 Flash | 1190 |
| 59 | Gemma 2 27B | 1190 |
| 60 | Claude Haiku 4 | 1185 |
| 61 | Yi-Large | 1185 |
| 62 | GPT-4o Mini | 1180 |
| 63 | GPT-4 1.5-mini | 1180 |
| 64 | Grok 3-mini | 1175 |
| 65 | Command R+ | 1170 |
| 66 | Amazon Nova Lite | 1170 |
| 67 | Gemma 2 9B (Groq) | 1170 |
| 68 | Phi-3 Medium | 1170 |
| 69 | Yi-Lightning | 1165 |
| 70 | Gemini 2.0 Flash Lite | 1160 |
| 71 | Gemma 2 9B | 1160 |
| 72 | Mixtral 8x7B (Groq) | 1160 |
| 73 | Llama 3.2 11B Vision | 1160 |
| 74 | Phi-3.5 Mini | 1160 |
| 75 | Qwen 2.5 7B | 1160 |
| 76 | Sonar | 1160 |
| 77 | InternLM 2.5 20B | 1155 |
| 78 | Mistral Small | 1150 |
| 79 | Gemini 1.5 Flash 8B | 1150 |
| 80 | GPT-4 1.5-nano | 1150 |
| 81 | Phi-4 | 1140 |
| 82 | Mistral Nemo 12B | 1140 |
| 83 | Amazon Nova Micro | 1130 |
| 84 | Command R7B | 1120 |
| 85 | GPT-3.5 Turbo | 1120 |
| 86 | Llama 3.1 8B (Groq) | 1120 |
| 87 | Llama 3.1 8B | 1120 |
| 88 | Command R | 1110 |
| 89 | Mistral 7B | 1100 |
| 90 | Mistral 7B (Together) | 1100 |

What Reasoning ELO Tests

Reasoning ELO measures human preference on reasoning-heavy tasks: math word problems, logic puzzles, and structured analysis. A higher score means human raters more often judged the model's reasoning to be sound and useful than that of the models it was compared against.
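
To give a feel for what a gap between two scores means, here is a minimal sketch that converts a rating difference into an expected head-to-head preference rate. It assumes the conventional Elo logistic curve (base 10, scale 400). The page does not state how this leaderboard fits its ratings, so the function name `expected_win_rate` and the `scale` parameter are illustrative assumptions rather than the leaderboard's actual method.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected share of comparisons in which model A is preferred over model B.

    Uses the conventional Elo logistic curve (base 10, scale 400).
    Assumption: the leaderboard follows this convention; the source does not say.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


if __name__ == "__main__":
    # Scores taken from the table above.
    claude_opus_4 = 1503
    gemini_2_5_pro = 1430

    p = expected_win_rate(claude_opus_4, gemini_2_5_pro)
    print(f"Expected preference rate for the top model over the runner-up: {p:.2f}")
```

Under these assumptions, the 73-point gap between the top two models corresponds to the top model being preferred in roughly 60% of direct comparisons, while two models with equal scores would split evenly.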

Score Range

1100–1450+ (average ~1210)
