Evals

Quick Answer

A framework and library for designing and running custom evaluations on language models.

OpenAI's evals library is a framework for building and running custom evaluations of language models. An eval can test any capability you can script, and grading ranges from simple exact-match checks to model-graded evals in which another model judges the output against a rubric. Custom evals are valuable because public benchmarks rarely cover domain-specific or proprietary behaviors; building a good eval is challenging, and the library exists to make that work more practical.
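To make the two grading styles concrete, here is a minimal sketch in plain Python rather than the evals library's own API: it queries a model under test, then grades the completion first by exact match and then by asking a second model to judge. The model names, sample format, and helper functions are illustrative assumptions, and the sketch requires the OpenAI Python SDK with an API key configured.

```python
# Illustrative sketch of exact-match vs. model-graded evaluation.
# Assumes the OpenAI Python SDK (>= 1.0) and OPENAI_API_KEY in the environment.
# Sample format, model names, and helpers are hypothetical, not the evals API.

from openai import OpenAI

client = OpenAI()

# One eval sample: a prompt plus the ideal (reference) answer.
SAMPLE = {
    "input": [{"role": "user", "content": "What is the capital of France?"}],
    "ideal": "Paris",
}


def get_completion(messages: list[dict]) -> str:
    """Query the model under evaluation and return its text output."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical choice of model under test
        messages=messages,
    )
    return response.choices[0].message.content.strip()


def exact_match_grade(completion: str, ideal: str) -> bool:
    """Simple grading: normalized exact match against the reference."""
    return completion.strip().lower() == ideal.strip().lower()


def model_graded(completion: str, ideal: str) -> bool:
    """Complex grading: ask a second model whether the answer is correct."""
    verdict = client.chat.completions.create(
        model="gpt-4o",  # hypothetical choice of grader model
        messages=[{
            "role": "user",
            "content": (
                f"Reference answer: {ideal}\n"
                f"Candidate answer: {completion}\n"
                "Does the candidate convey the same answer as the reference? "
                "Reply with exactly YES or NO."
            ),
        }],
    )
    return verdict.choices[0].message.content.strip().upper().startswith("YES")


if __name__ == "__main__":
    answer = get_completion(SAMPLE["input"])
    print("completion: ", answer)
    print("exact match:", exact_match_grade(answer, SAMPLE["ideal"]))
    print("model graded:", model_graded(answer, SAMPLE["ideal"]))
```

The trade-off the sketch illustrates: exact match is cheap and deterministic but rejects valid paraphrases, while model grading tolerates varied phrasing at the cost of a second API call and some grader noise.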

Last verified: 2026-04-08
