ELO Rating · 78 models ranked

Coding ELO Leaderboard 2026

Coding ELO is a specialised Chatbot Arena leaderboard restricted to coding tasks. It measures human preference when comparing model-generated code, debugging explanations, and programming help.

Quick Answer

The top-ranked model on Coding ELO in 2026 is Claude Opus 4 by Anthropic, with 1503 ELO. Runner-up: Gemini 2.5 Pro (1430 ELO).


#	Model	Provider	Score (ELO)	Percentile
🥇	Claude Opus 4	Anthropic	1503	99th
🥈	Gemini 2.5 Pro	Google	1430	97th
🥉	DeepSeek R1	DeepSeek	1330	96th
4	o3	OpenAI	1320	95th
5	o1	OpenAI	1310	94th
6	Claude Sonnet 4	Anthropic	1305	92nd
7	DeepSeek V3	DeepSeek	1300	91st
8	Claude 3.5 Sonnet	Anthropic	1300	90th
9	GPT-4.5	OpenAI	1295	88th
10	Qwen 3 235B MoE	Alibaba	1290	87th
11	Qwen 2.5 Coder 32B	Alibaba	1290	86th
12	o3-mini	OpenAI	1285	85th
13	Gemini Experimental 1206	Google	1280	83rd
14	Llama 4 Maverick	Meta	1280	82nd
15	Grok 3	xAI	1280	81st
16	o1-mini	OpenAI	1280	79th
17	DeepSeek R1 (Groq)	Groq	1270	78th
18	DeepSeek R1 (Together)	Together AI	1270	77th
19	Gemini 2.0 Flash Thinking	Google	1270	76th
20	ChatGPT-4o Latest	OpenAI	1270	74th
21	o4-mini	OpenAI	1270	73rd
22	GPT-4o	OpenAI	1265	72nd
23	GPT-4o (Aug 2024)	OpenAI	1265	71st
24	Gemini 2.5 Flash	Google	1260	69th
25	Qwen 2.5 Max	Alibaba	1250	68th
26	QwQ 32B	Alibaba	1250	67th
27	Claude 3.5 Haiku	Anthropic	1250	65th
28	Codestral 22B	Mistral AI	1250	64th
29	GPT-4 Turbo	OpenAI	1245	63rd
30	Gemini 2.0 Flash	Google	1240	62nd
31	DeepSeek R1 Distill Llama 70B	DeepSeek	1240	60th
32	Mistral Large	Mistral AI	1240	59th
33	DeepSeek R1 Distill Qwen 32B	DeepSeek	1235	58th
34	Sonar Reasoning	Perplexity	1235	56th
35	Llama 4 Scout	Meta	1230	55th
36	Command A	Cohere	1230	54th
37	Grok 2	xAI	1225	53rd
38	DeepSeek V2.5	DeepSeek	1220	51st
39	Mixtral 8x22B (Fireworks)	Fireworks AI	1220	50th
40	WizardLM-2 8x22B	Microsoft	1220	49th
41	Qwen 2.5 72B	Alibaba	1210	47th
42	Qwen 2.5 72B (Together)	Together AI	1210	46th
43	Amazon Nova Pro	Amazon	1210	45th
44	GPT-4.1	OpenAI	1210	44th
45	Llama 3.1 405B (Fireworks)	Fireworks AI	1200	42nd
46	Llama 3.1 405B	Meta	1200	41st
47	Llama 3.1 405B (Together)	Together AI	1200	40th
48	GPT-4o Mini	OpenAI	1200	38th
49	GPT-4 1.5-mini	OpenAI	1200	37th
50	Claude Haiku 4	Anthropic	1195	36th
51	Sonar Pro	Perplexity	1195	35th
52	Gemma 2 27B	Google	1195	33rd
53	Yi-Large	01.AI	1195	32nd
54	Phi-3.5 MoE	Microsoft	1190	31st
55	Grok 3-mini	xAI	1190	29th
56	Llama 3.3 70B (Fireworks)	Fireworks AI	1180	28th
57	Llama 3.3 70B (Groq)	Groq	1180	27th
58	Llama 3.3 70B	Meta	1180	26th
59	Llama 3.3 70B (Together)	Together AI	1180	24th
60	Phi-3 Medium	Microsoft	1175	23rd
61	Gemini 2.0 Flash Lite	Google	1170	22nd
62	Yi-Lightning	01.AI	1170	21st
63	Sonar	Perplexity	1170	19th
64	Phi-3.5 Mini	Microsoft	1165	18th
65	Command R+	Cohere	1160	17th
66	Mistral Small	Mistral AI	1160	15th
67	Amazon Nova Lite	Amazon	1160	14th
68	Gemma 2 9B (Groq)	Groq	1160	13th
69	GPT-4 1.5-nano	OpenAI	1160	12th
70	Gemma 2 9B	Google	1155	10th
71	Mixtral 8x7B (Groq)	Groq	1150	9th
72	InternLM 2.5 20B	Shanghai AI Lab	1150	8th
73	Phi-4	Microsoft	1130	6th
74	Command R	Cohere	1100	5th
75	Amazon Nova Micro	Amazon	1100	4th
76	Command R7B	Cohere	1100	3rd
77	GPT-3.5 Turbo	OpenAI	1100	1st
78	Mistral 7B (Together)	Together AI	1090	0th
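The page does not explain how the Percentile column is computed. The listed values appear consistent with round(100 × (N − rank) / N), where N = 78 and rank is the table position. Ties are therefore not merged: DeepSeek V3 and Claude 3.5 Sonnet both score 1300 but sit at 91st and 90th. This formula is inferred from the table, not documented by the source. The helper names `percentile_from_rank` and `ordinal` are hypothetical.

```python
def percentile_from_rank(rank: int, n: int) -> int:
    """Share of the n listed models placed below `rank`, as a whole percent.

    Inferred (not documented) rule: round(100 * (n - rank) / n), rounding half up.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    if not 1 <= rank <= n:
        raise ValueError("rank must be between 1 and n")
    return (200 * (n - rank) + n) // (2 * n)


def ordinal(k: int) -> str:
    """Attach the English ordinal suffix: 1st, 2nd, 3rd, 4th, 11th, 21st, ..."""
    if 10 <= k % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th are irregular
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(k % 10, "th")
    return f"{k}{suffix}"


if __name__ == "__main__":
    n = 78  # number of models on this leaderboard
    for rank in (1, 6, 39, 76, 78):
        print(rank, ordinal(percentile_from_rank(rank, n)))
```

Run as a script, this prints `99th`, `92nd`, `50th`, `3rd` and `0th` for ranks 1, 6, 39, 76 and 78, which matches those rows of the table.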

What Coding ELO Tests

Human preference on coding-specific tasks such as writing functions, explaining bugs, and reviewing pull requests. Scores come from the same Arena methodology (anonymous side-by-side comparisons in which users vote for the better response), applied only to coding conversations.
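In an Elo-style system, a rating gap maps to a predicted preference probability, and each vote nudges both ratings toward the observed outcome. Below is a minimal sketch of the textbook Elo formulas, for intuition only. It is not Arena's code: Chatbot Arena has described fitting a Bradley-Terry model over all votes rather than updating ratings one vote at a time. The K-factor of 32 and the helper names `expected_score` and `elo_update` are illustrative assumptions.

```python
def expected_score(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Predicted probability that model A's answer is preferred over model B's."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0) -> tuple[float, float]:
    """Classic online Elo update after a single vote (K-factor is an assumption).

    score_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    """
    delta = k * (score_a - expected_score(r_a, r_b))
    # Zero-sum: whatever A gains, B loses.
    return r_a + delta, r_b - delta


if __name__ == "__main__":
    # Ratings from the table: Claude Opus 4 (1503) vs Gemini 2.5 Pro (1430).
    p = expected_score(1503, 1430)
    print(f"P(Claude Opus 4 preferred) = {p:.2f}")  # 0.60
    # Two equally rated models; A wins one vote.
    print(elo_update(1500, 1500, 1.0))  # (1516.0, 1484.0)
```

Under this logistic model, the 73-point gap between the top two models corresponds to roughly a 60% chance that a voter prefers Claude Opus 4's answer in a head-to-head coding prompt.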

Score Range

1090–1503 across the 78 models listed (average ~1220)
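The range and average can be checked directly against the leaderboard table. Here is a small Python sketch, assuming the scores below are transcribed correctly from the table in rank order. The `summarize` helper is hypothetical.

```python
from statistics import median

# Scores transcribed from the leaderboard table, rank 1 first (78 entries).
SCORES = [
    1503, 1430, 1330, 1320, 1310, 1305, 1300, 1300, 1295, 1290,
    1290, 1285, 1280, 1280, 1280, 1280, 1270, 1270, 1270, 1270,
    1270, 1265, 1265, 1260, 1250, 1250, 1250, 1250, 1245, 1240,
    1240, 1240, 1235, 1235, 1230, 1230, 1225, 1220, 1220, 1220,
    1210, 1210, 1210, 1210, 1200, 1200, 1200, 1200, 1200, 1195,
    1195, 1195, 1195, 1190, 1190, 1180, 1180, 1180, 1180, 1175,
    1170, 1170, 1170, 1165, 1160, 1160, 1160, 1160, 1160, 1155,
    1150, 1150, 1130, 1100, 1100, 1100, 1100, 1090,
]


def summarize(scores: list[int]) -> dict[str, float]:
    """Basic descriptive statistics for a list of ELO scores."""
    return {
        "count": len(scores),
        "min": min(scores),
        "max": max(scores),
        "mean": sum(scores) / len(scores),
        "median": median(scores),
    }


if __name__ == "__main__":
    s = summarize(SCORES)
    print(f"{s['count']} models, range {s['min']}-{s['max']}, "
          f"mean {s['mean']:.1f}, median {s['median']:.0f}")
    # 78 models, range 1090-1503, mean 1223.6, median 1220
```

The mean comes out at about 1224 and the median at 1220, both consistent with the stated ~1220 average.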

Source

LMSYS Chatbot Arena (Coding category)

Other Benchmarks

Arena ELO · Reasoning ELO · HumanEval · MMLU · MATH · GPQA
