ELO Rating: 90 models ranked

Chatbot Arena ELO Leaderboard 2026

Chatbot Arena ELO ranks models by human preference in blind side-by-side conversations. It is one of the most widely cited real-world quality signals because it measures what people actually prefer, rather than performance on static benchmarks that models can overfit.

Quick Answer

The best model on Chatbot Arena ELO in 2026 is Claude Opus 4 by Anthropic, scoring 1503 ELO. Runner-up: Gemini 2.5 Pro (1430).

#   Model                           Score (ELO)
1   Claude Opus 4                   1503
2   Gemini 2.5 Pro                  1430
3   o3                              1340
4   DeepSeek R1                     1310
5   o1                              1310
6   Qwen 3 235B MoE                 1310
7   Gemini Experimental 1206        1300
8   GPT-4.5                         1290
9   DeepSeek R1 (Groq)              1290
10  Llama 4 Maverick                1290
11  DeepSeek R1 (Together)          1290
12  Grok 3                          1285
13  Claude Sonnet 4                 1280
14  DeepSeek V3                     1280
15  Gemini 2.0 Flash Thinking       1280
16  o3-mini                         1280
17  Claude 3.5 Sonnet               1270
18  Gemini 2.5 Flash                1270
19  o1-mini                         1270
20  ChatGPT-4o Latest               1265
21  Gemini 2.0 Flash                1260
22  GPT-4o                          1260
23  o4-mini                         1260
24  Qwen 2.5 Max                    1260
25  QwQ 32B                         1260
26  GPT-4o (Aug 2024)               1255
27  DeepSeek R1 Distill Llama 70B   1250
28  Llama 4 Scout                   1250
29  Mistral Large                   1245
30  Command A                       1240
31  DeepSeek R1 Distill Qwen 32B    1240
32  Llama 3.1 405B (Fireworks)      1240
33  GPT-4 Turbo                     1240
34  Grok 2                          1240
35  Llama 3.1 405B                  1240
36  Sonar Reasoning                 1240
37  Llama 3.1 405B (Together)       1240
38  Gemini 1.5 Pro                  1230
39  Grok 2 Vision                   1230
40  Pixtral Large                   1230
41  Qwen 2.5 72B                    1230
42  Qwen 2.5 72B (Together)         1230
43  Amazon Nova Pro                 1220
44  Claude 3.5 Haiku                1220
45  Claude Haiku 4                  1220
46  Llama 3.3 70B (Fireworks)       1220
47  GPT-4o Mini                     1220
48  Llama 3.3 70B (Groq)            1220
49  Llama 3.3 70B                   1220
50  Mistral Medium 3                1220
51  Llama 3.3 70B (Together)        1220
52  Llama 3.2 90B Vision            1210
53  Command R+                      1200
54  DeepSeek V2.5                   1200
55  Mixtral 8x22B (Fireworks)       1200
56  Gemini 2.0 Flash Lite           1200
57  GPT-4 1                         1200
58  Sonar Pro                       1200
59  WizardLM-2 8x22B                1200
60  Llama 3.1 70B                   1195
61  Phi-3.5 MoE                     1195
62  Gemini 1.5 Flash                1190
63  Gemma 2 27B                     1190
64  Mistral Small                   1185
65  Yi-Large                        1185
66  GPT-4 1.5-mini                  1180
67  Grok 3-mini                     1175
68  Amazon Nova Lite                1170
69  Gemma 2 9B (Groq)               1170
70  Phi-3 Medium                    1170
71  Yi-Lightning                    1165
72  Gemma 2 9B                      1160
73  Mixtral 8x7B (Groq)             1160
74  Llama 3.2 11B Vision            1160
75  Phi-3.5 Mini                    1160
76  Qwen 2.5 7B                     1160
77  Sonar                           1160
78  InternLM 2.5 20B                1155
79  Gemini 1.5 Flash 8B             1150
80  GPT-4 1.5-nano                  1150
81  Phi-4                           1150
82  Command R                       1140
83  Mistral Nemo 12B                1140
84  Amazon Nova Micro               1130
85  Command R7B                     1120
86  GPT-3.5 Turbo                   1120
87  Llama 3.1 8B (Groq)             1120
88  Llama 3.1 8B                    1120
89  Mistral 7B                      1100
90  Mistral 7B (Together)           1100

What Chatbot Arena ELO Tests

Chatbot Arena ELO tests human preference across open-ended conversations. Models compete head-to-head, and each win or loss updates the ELO scores of both models. A higher ELO means humans prefer that model's output more often.
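
The head-to-head update described above can be sketched with the textbook Elo formulas. This is a minimal illustration, not the leaderboard's actual pipeline: the function names, the K-factor of 32, and the 400-point scale are assumptions borrowed from the classic chess formulation, and the source does not say which update rule or constants Chatbot Arena uses.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Modelled probability that A is preferred over B (textbook Elo, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return new (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    k is an assumed K-factor; larger values make ratings move faster.
    """
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Ratings taken from the top two rows of the table on this page.
    top, runner_up = 1503.0, 1430.0
    print(f"P(top preferred) = {expected_score(top, runner_up):.2f}")

    # An upset (runner-up preferred) moves both ratings toward each other.
    new_top, new_runner_up = update_ratings(top, runner_up, score_a=0.0)
    print(f"after upset: {new_top:.1f} vs {new_runner_up:.1f}")
```

Under these assumptions, a 73-point gap such as 1503 vs. 1430 corresponds to the higher-rated model being preferred in roughly 60% of matchups, so small differences further down the table imply close to even odds.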

Score Range

1100–1400+ (average ~1200)
