ELO Rating: 78 models ranked

Coding ELO Leaderboard 2026

Coding ELO is a specialised Chatbot Arena leaderboard restricted to coding tasks. It measures human preference when comparing model-generated code, debugging explanations, and programming help.

Quick Answer

The best model on Coding ELO in 2026 is Claude Opus 4 by Anthropic, scoring 1503 ELO. Runner-up: Gemini 2.5 Pro (1430).

#   Model                           Score (ELO)
🥇  Claude Opus 4                   1503
🥈  Gemini 2.5 Pro                  1430
🥉  DeepSeek R1                     1330
4   o3                              1320
5   o1                              1310
6   Claude Sonnet 4                 1305
7   DeepSeek V3                     1300
8   Claude 3.5 Sonnet               1300
9   GPT-4.5                         1295
10  Qwen 3 235B MoE                 1290
11  Qwen 2.5 Coder 32B              1290
12  o3-mini                         1285
13  Gemini Experimental 1206        1280
14  Llama 4 Maverick                1280
15  Grok 3                          1280
16  o1-mini                         1280
17  DeepSeek R1 (Groq)              1270
18  DeepSeek R1 (Together)          1270
19  Gemini 2.0 Flash Thinking       1270
20  ChatGPT-4o Latest               1270
21  o4-mini                         1270
22  GPT-4o                          1265
23  GPT-4o (Aug 2024)               1265
24  Gemini 2.5 Flash                1260
25  Qwen 2.5 Max                    1250
26  QwQ 32B                         1250
27  Claude 3.5 Haiku                1250
28  Codestral 22B                   1250
29  GPT-4 Turbo                     1245
30  Gemini 2.0 Flash                1240
31  DeepSeek R1 Distill Llama 70B   1240
32  Mistral Large                   1240
33  DeepSeek R1 Distill Qwen 32B    1235
34  Sonar Reasoning                 1235
35  Llama 4 Scout                   1230
36  Command A                       1230
37  Grok 2                          1225
38  DeepSeek V2.5                   1220
39  Mixtral 8x22B (Fireworks)       1220
40  WizardLM-2 8x22B                1220
41  Qwen 2.5 72B                    1210
42  Qwen 2.5 72B (Together)         1210
43  Amazon Nova Pro                 1210
44  GPT-4 1                         1210
45  Llama 3.1 405B (Fireworks)      1200
46  Llama 3.1 405B                  1200
47  Llama 3.1 405B (Together)       1200
48  GPT-4o Mini                     1200
49  GPT-4 1.5-mini                  1200
50  Claude Haiku 4                  1195
51  Sonar Pro                       1195
52  Gemma 2 27B                     1195
53  Yi-Large                        1195
54  Phi-3.5 MoE                     1190
55  Grok 3-mini                     1190
56  Llama 3.3 70B (Fireworks)       1180
57  Llama 3.3 70B (Groq)            1180
58  Llama 3.3 70B                   1180
59  Llama 3.3 70B (Together)        1180
60  Phi-3 Medium                    1175
61  Gemini 2.0 Flash Lite           1170
62  Yi-Lightning                    1170
63  Sonar                           1170
64  Phi-3.5 Mini                    1165
65  Command R+                      1160
66  Mistral Small                   1160
67  Amazon Nova Lite                1160
68  Gemma 2 9B (Groq)               1160
69  GPT-4 1.5-nano                  1160
70  Gemma 2 9B                      1155
71  Mixtral 8x7B (Groq)             1150
72  InternLM 2.5 20B                1150
73  Phi-4                           1130
74  Command R                       1100
75  Amazon Nova Micro               1100
76  Command R7B                     1100
77  GPT-3.5 Turbo                   1100
78  Mistral 7B (Together)           1090

What Coding ELO Tests

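Coding ELO measures human preference on coding-specific tasks: writing functions, explaining bugs, and reviewing pull requests. Ratings are derived with the same Arena methodology as the general leaderboard (pairwise human votes between model responses), applied only to code conversations.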
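
To make the rating scale concrete, here is a minimal sketch of how a rating gap maps to an expected preference rate. It assumes the classic Elo logistic model with a scale of 400 and a K-factor of 32; the function names `expected_score` and `update`, and both constants, are illustrative assumptions and not the leaderboard's actual fitting procedure, which this page does not describe.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that model A is preferred over model B under classic Elo.

    The 400-point scale is the conventional Elo choice, assumed here.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def update(rating_a: float, rating_b: float, outcome_a: float, k: float = 32.0):
    """Return both ratings after one human vote.

    outcome_a is 1.0 if A was preferred, 0.5 for a tie, 0.0 if B was preferred.
    The K-factor of 32 is an illustrative assumption.
    """
    delta = k * (outcome_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Top two entries from the table: 1503 vs 1430.
    p = expected_score(1503, 1430)
    print(f"Expected preference rate for the higher-rated model: {p:.3f}")
```

Under these assumptions, the 73-point gap between the top two models corresponds to an expected preference rate of roughly 60% for the higher-rated one, which is why gaps of 5 to 10 points between neighbouring entries should be read as near-ties.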

Score Range

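Typically 1100–1450+ (average ~1220). On this leaderboard, scores run from 1090 (Mistral 7B (Together)) to 1503 (Claude Opus 4).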
