ELO Rating: 90 models ranked

Chatbot Arena ELO Leaderboard 2026

Chatbot Arena ELO ranks models by human preference in blind, side-by-side conversations. It is one of the most widely cited real-world quality signals because it reflects which responses people actually prefer, rather than scores on fixed benchmarks that models can overfit.

Quick Answer

The best model on Chatbot Arena ELO in 2026 is Claude Opus 4 by Anthropic, scoring 1503 ELO. Runner-up: Gemini 2.5 Pro (1430).

#	Model	Provider	Score (ELO)	Percentile
🥇	Claude Opus 4	Anthropic	1503	99th
🥈	Gemini 2.5 Pro	Google	1430	98th
🥉	o3	OpenAI	1340	97th
4	DeepSeek R1	DeepSeek	1310	96th
5	o1	OpenAI	1310	94th
6	Qwen 3 235B MoE	Alibaba	1310	93rd
7	Gemini Experimental 1206	Google	1300	92nd
8	GPT-4.5	OpenAI	1290	91st
9	DeepSeek R1 (Groq)	Groq	1290	90th
10	Llama 4 Maverick	Meta	1290	89th
11	DeepSeek R1 (Together)	Together AI	1290	88th
12	Grok 3	xAI	1285	87th
13	Claude Sonnet 4	Anthropic	1280	86th
14	DeepSeek V3	DeepSeek	1280	84th
15	Gemini 2.0 Flash Thinking	Google	1280	83rd
16	o3-mini	OpenAI	1280	82nd
17	Claude 3.5 Sonnet	Anthropic	1270	81st
18	Gemini 2.5 Flash	Google	1270	80th
19	o1-mini	OpenAI	1270	79th
20	ChatGPT-4o Latest	OpenAI	1265	78th
21	Gemini 2.0 Flash	Google	1260	77th
22	GPT-4o	OpenAI	1260	76th
23	o4-mini	OpenAI	1260	74th
24	Qwen 2.5 Max	Alibaba	1260	73rd
25	QwQ 32B	Alibaba	1260	72nd
26	GPT-4o (Aug 2024)	OpenAI	1255	71st
27	DeepSeek R1 Distill Llama 70B	DeepSeek	1250	70th
28	Llama 4 Scout	Meta	1250	69th
29	Mistral Large	Mistral AI	1245	68th
30	Command A	Cohere	1240	67th
31	DeepSeek R1 Distill Qwen 32B	DeepSeek	1240	66th
32	Llama 3.1 405B (Fireworks)	Fireworks AI	1240	64th
33	GPT-4 Turbo	OpenAI	1240	63rd
34	Grok 2	xAI	1240	62nd
35	Llama 3.1 405B	Meta	1240	61st
36	Sonar Reasoning	Perplexity	1240	60th
37	Llama 3.1 405B (Together)	Together AI	1240	59th
38	Gemini 1.5 Pro	Google	1230	58th
39	Grok 2 Vision	xAI	1230	57th
40	Pixtral Large	Mistral AI	1230	56th
41	Qwen 2.5 72B	Alibaba	1230	54th
42	Qwen 2.5 72B (Together)	Together AI	1230	53rd
43	Amazon Nova Pro	Amazon	1220	52nd
44	Claude 3.5 Haiku	Anthropic	1220	51st
45	Claude Haiku 4	Anthropic	1220	50th
46	Llama 3.3 70B (Fireworks)	Fireworks AI	1220	49th
47	GPT-4o Mini	OpenAI	1220	48th
48	Llama 3.3 70B (Groq)	Groq	1220	47th
49	Llama 3.3 70B	Meta	1220	46th
50	Mistral Medium 3	Mistral AI	1220	44th
51	Llama 3.3 70B (Together)	Together AI	1220	43rd
52	Llama 3.2 90B Vision	Meta	1210	42nd
53	Command R+	Cohere	1200	41st
54	DeepSeek V2.5	DeepSeek	1200	40th
55	Mixtral 8x22B (Fireworks)	Fireworks AI	1200	39th
56	Gemini 2.0 Flash Lite	Google	1200	38th
57	GPT-4 1	OpenAI	1200	37th
58	Sonar Pro	Perplexity	1200	36th
59	WizardLM-2 8x22B	Microsoft	1200	34th
60	Llama 3.1 70B	Meta	1195	33rd
61	Phi-3.5 MoE	Microsoft	1195	32nd
62	Gemini 1.5 Flash	Google	1190	31st
63	Gemma 2 27B	Google	1190	30th
64	Mistral Small	Mistral AI	1185	29th
65	Yi-Large	01.AI	1185	28th
66	GPT-4 1.5-mini	OpenAI	1180	27th
67	Grok 3-mini	xAI	1175	26th
68	Amazon Nova Lite	Amazon	1170	24th
69	Gemma 2 9B (Groq)	Groq	1170	23rd
70	Phi-3 Medium	Microsoft	1170	22nd
71	Yi-Lightning	01.AI	1165	21st
72	Gemma 2 9B	Google	1160	20th
73	Mixtral 8x7B (Groq)	Groq	1160	19th
74	Llama 3.2 11B Vision	Meta	1160	18th
75	Phi-3.5 Mini	Microsoft	1160	17th
76	Qwen 2.5 7B	Alibaba	1160	16th
77	Sonar	Perplexity	1160	14th
78	InternLM 2.5 20B	Shanghai AI Lab	1155	13th
79	Gemini 1.5 Flash 8B	Google	1150	12th
80	GPT-4 1.5-nano	OpenAI	1150	11th
81	Phi-4	Microsoft	1150	10th
82	Command R	Cohere	1140	9th
83	Mistral Nemo 12B	Mistral AI	1140	8th
84	Amazon Nova Micro	Amazon	1130	7th
85	Command R7B	Cohere	1120	6th
86	GPT-3.5 Turbo	OpenAI	1120	4th
87	Llama 3.1 8B (Groq)	Groq	1120	3rd
88	Llama 3.1 8B	Meta	1120	2nd
89	Mistral 7B	Mistral AI	1100	1st
90	Mistral 7B (Together)	Together AI	1100	0th
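
The page does not define the Percentile column. The values in the table are consistent with round(100 × (N − rank) / N) for N = 90, which would explain why models with tied scores still receive different percentiles. A minimal Python sketch under that assumption (the function names `percentile` and `ordinal` are hypothetical, not part of any leaderboard API):

```python
def percentile(rank: int, total: int) -> int:
    """Share of ranked models that sit below this rank, as a whole percent.

    Assumption: inferred from the table values, not documented by the page.
    """
    return round(100 * (total - rank) / total)


def ordinal(n: int) -> str:
    """Format an integer with its English ordinal suffix (1st, 2nd, 3rd, 4th)."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


if __name__ == "__main__":
    # Ranks 1, 5, and 90 out of 90 models
    for rank in (1, 5, 90):
        print(rank, ordinal(percentile(rank, 90)))
```

Because this sketch depends only on rank, two models with the same ELO score can land on different percentiles.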

What Chatbot Arena ELO Tests

Chatbot Arena measures human preference across open-ended conversations. Models compete head-to-head, and each win or loss updates their ELO scores. A higher ELO means humans prefer that model's output more often.
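
To make the head-to-head update concrete, here is a minimal Python sketch of the classic Elo rule. It is an illustration only: the step size `k = 32` and the function names are assumptions, and the live leaderboard may fit its ratings with a different procedure.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, outcome_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one battle.

    outcome_a is 1.0 if A wins, 0.0 if B wins, and 0.5 for a tie.
    k is an assumed step size, not a value taken from the leaderboard.
    """
    e_a = expected_score(rating_a, rating_b)
    delta = k * (outcome_a - e_a)
    # Whatever A gains, B loses, so the total rating is conserved.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Implied preference rate for a 1503-rated model against a 1430-rated one
    print(f"{expected_score(1503, 1430):.2f}")
    # Two equally rated models; A wins one battle
    print(update(1200, 1200, 1.0))
```

Under this curve, a 73-point gap such as 1503 versus 1430 corresponds to the higher-rated model being preferred in roughly 60% of battles, so even the top two entries are expected to trade wins fairly often.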

Score Range

1100–1400+ (average ~1200)

Source

LMSYS Chatbot Arena
