Accuracy (%) · 90 models ranked
MATH Leaderboard 2026
The MATH benchmark tests mathematical problem-solving across 5 difficulty levels. Problems range from pre-algebra to precalculus and competition mathematics. It remains one of the hardest benchmarks: early GPT-4 scored only 42%.
Quick Answer
The best model on MATH in 2026 is Qwen 3 235B MoE by Alibaba, scoring 168%. Runner-up: Gemini Experimental 1206 (160%).
What MATH Tests
MATH tests multi-step problem solving in algebra, geometry, number theory, counting and probability, and precalculus. Models must show their reasoning and produce an exact final answer. The problems are harder than the math questions in MMLU.
Score Range
0–100% (human expert ~90%)
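To make the score range concrete, here is a minimal sketch of how an accuracy percentage could be computed from exact final answers. It is an illustration under stated assumptions, not the leaderboard's actual grader: the function names `normalize` and `accuracy_percent` are hypothetical, the normalization rule is deliberately crude, and the sample answers are made up.

```python
def normalize(answer: str) -> str:
    """Crude normalization (an assumption, not an official rule):
    trim whitespace, drop a trailing period, remove inner spaces, lowercase."""
    return answer.strip().rstrip(".").replace(" ", "").lower()


def accuracy_percent(predictions: list[str], references: list[str]) -> float:
    """Return the percentage of predictions that exactly match their reference
    after normalization. The result always lies between 0 and 100."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one problem")
    correct = sum(
        normalize(p) == normalize(r) for p, r in zip(predictions, references)
    )
    return 100.0 * correct / len(references)


# Made-up answers for illustration only, not benchmark data.
preds = ["1/2", " 42 ", "x = 3", "7"]
refs = ["1/2", "42", "x=3", "8"]
score = accuracy_percent(preds, refs)  # 3 of the 4 answers match
```

A production grader would likely also need to treat mathematically equivalent forms (for example `0.5` and `1/2`) as equal, which this string comparison does not attempt.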