Evaluation
BLEU Score
Quick Answer
A metric that scores machine translation output by its n-gram overlap with one or more reference translations.
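BLEU (Bilingual Evaluation Understudy) measures translation quality by counting the n-grams (typically of length 1 to 4) that the output shares with one or more reference translations, and it applies a penalty when the output is shorter than the reference. A higher score means the output is closer to the reference wording. BLEU is simple, automated, and correlates reasonably well with human judgment, but it is imperfect: because it rewards exact word overlap, it penalizes valid paraphrases. BLEU remains a standard metric for machine translation, but it is less relevant for evaluating modern LLMs, whose open-ended outputs often have no single reference wording to match.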
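To make the n-gram comparison concrete, here is a minimal sketch of unsmoothed sentence-level BLEU. The function names `ngrams` and `bleu`, the whitespace tokenization, and the example sentences are illustrative assumptions, not part of any library. For reported results, a maintained implementation such as sacreBLEU is the usual choice.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU (illustrative; whitespace tokenization assumed)."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        # Clip each candidate n-gram count by its highest count in any reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            # Without smoothing, one empty n-gram order zeroes the whole score.
            return 0.0
        log_precisions.append(math.log(clipped / sum(cand_counts.values())))

    # Brevity penalty, using the reference length closest to the candidate length.
    c = len(cand)
    r = min((len(ref) for ref in refs), key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)

    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat"
print(bleu("the cat sat on the mat", [reference]))        # 1.0
print(bleu("a feline rested upon the rug", [reference]))  # 0.0
```

The second call illustrates the paraphrase limitation: the sentence has roughly the same meaning as the reference but shares no two-word sequence with it, so the unsmoothed score drops to zero.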
Last verified: 2026-04-08