Evaluation
Arena Elo
Quick Answer
A ranking system that rates models from human preference judgments in pairwise comparisons.
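Arena Elo (from Chatbot Arena) ranks models by human preference votes. Users see responses from two anonymous models to the same prompt and vote for the better one (or a tie), and the accumulated votes are converted into Elo-style ratings. Because the prompts and judgments come from real users, Arena Elo tends to reflect real-world preferences more directly than fixed benchmarks, and it captures overall perceived quality rather than the single skill a benchmark measures. The trade-off is reproducibility: ratings shift over time as new votes arrive and as models are added or updated, so a given score is a snapshot rather than a fixed measurement. Arena Elo has become a widely cited reference for tracking model progress.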
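To make "votes are converted into Elo-style ratings" concrete, here is a minimal sketch of the textbook online Elo update applied to a stream of pairwise votes. It is an illustration, not necessarily the exact method the live leaderboard uses. The constants `K`, `SCALE`, and `INIT`, the helper names, and the model names are all hypothetical choices for this example.

```python
from collections import defaultdict

# Illustrative constants (assumptions, not taken from the leaderboard):
K = 4.0        # step size applied per vote
SCALE = 400.0  # logistic scale used by classic Elo
INIT = 1000.0  # starting rating for a model with no votes yet


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / SCALE))


def update(ratings, model_a: str, model_b: str, outcome: float) -> None:
    """Apply one vote. outcome: 1.0 = A wins, 0.0 = B wins, 0.5 = tie."""
    r_a, r_b = ratings[model_a], ratings[model_b]
    e_a = expected_score(r_a, r_b)
    # Zero-sum update: A gains exactly what B loses.
    ratings[model_a] = r_a + K * (outcome - e_a)
    ratings[model_b] = r_b + K * ((1.0 - outcome) - (1.0 - e_a))


def compute_ratings(votes):
    """votes: iterable of (model_a, model_b, outcome), processed in order."""
    ratings = defaultdict(lambda: INIT)
    for model_a, model_b, outcome in votes:
        update(ratings, model_a, model_b, outcome)
    return dict(ratings)


# Hypothetical votes between three made-up models.
votes = [
    ("model-x", "model-y", 1.0),
    ("model-y", "model-z", 0.5),
    ("model-x", "model-z", 1.0),
    ("model-z", "model-y", 0.0),
]
ratings = compute_ratings(votes)
for name, r in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {r:.1f}")
```

In this online form, the result depends on the order in which votes are processed, which can contribute to the limited reproducibility noted above. Fitting a Bradley-Terry model over all votes at once is one way to remove that order dependence.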
Last verified: 2026-04-08