GPQA Diamond Leaderboard 2026
Graduate-Level Google-Proof Q&A (GPQA) Diamond is a set of 198 expert-level science questions written by domain specialists. The 'Google-proof' design means the answers cannot be found by a simple web search; they require genuine understanding.
Quick Answer
The best model on GPQA Diamond in 2026 is o3 by OpenAI, scoring 94%. Runner-up: o1 (92%).
What GPQA Diamond Tests
Expert-level multiple-choice questions in biology, chemistry, and physics, written by PhD researchers and intentionally hard to answer with a web search. Human non-expert accuracy is ~22%; PhD expert accuracy is ~65%. A score above 50% is therefore well beyond non-expert performance and indicates strong scientific reasoning.
Score Range
0–100% (PhD expert ~65%, non-expert ~22%)
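As a rough illustration of how a score on this scale relates to the 198-question set and the two human baselines quoted above, here is a minimal Python sketch. The function names and the wording of the labels are hypothetical, chosen only for this example; they are not part of GPQA or of any leaderboard tooling.

```python
# Reference points quoted in this section (percent accuracy).
NON_EXPERT_BASELINE = 22.0
PHD_EXPERT_BASELINE = 65.0
TOTAL_QUESTIONS = 198  # size of the GPQA Diamond set


def accuracy_percent(num_correct: int, total: int = TOTAL_QUESTIONS) -> float:
    """Return accuracy as a percentage of questions answered correctly."""
    if not 0 <= num_correct <= total:
        raise ValueError("num_correct must be between 0 and total")
    return 100.0 * num_correct / total


def describe(score: float) -> str:
    """Place a score relative to the two human baselines (hypothetical labels)."""
    if score > PHD_EXPERT_BASELINE:
        return "above PhD expert baseline"
    if score > NON_EXPERT_BASELINE:
        return "between non-expert and PhD expert baselines"
    return "at or below non-expert baseline"


if __name__ == "__main__":
    # 186 of 198 correct is roughly the 94% quoted for the top model.
    score = accuracy_percent(186)
    print(f"{score:.1f}% -> {describe(score)}")
```

With only 198 questions, each question is worth about half a percentage point, so small gaps between models correspond to a handful of answers.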
Source
Rein et al., GPQA