Evaluation
Calibration
Quick Answer
How well a model's confidence scores match actual correctness probability.
Calibration measures whether a model's stated confidence aligns with its actual correctness. A perfectly calibrated model is correct 80% of the time on predictions it assigns 80% confidence. Many models are overconfident: they report 90% confidence but are correct only 70% of the time. Poor calibration is problematic for risk-sensitive applications, where downstream decisions depend on trusting the confidence estimates. Sampling temperature affects calibration, and calibration can be improved through fine-tuning. Measuring calibration requires data annotated with both confidence scores and correctness labels.
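A common way to quantify the confidence-correctness gap described above is Expected Calibration Error (ECE): predictions are grouped into confidence bins, and the accuracy in each bin is compared to the mean confidence there. The sketch below is a minimal, from-scratch illustration; the bin count and inputs are illustrative, not from the original text.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected Calibration Error: the weighted average gap between
    mean confidence and accuracy within each confidence bin.

    confidences: list of floats in [0, 1]
    correct:     list of 0/1 correctness labels, same length
    """
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Assign predictions to the bin (lo, hi]; the first bin also
        # includes its lower edge so confidence 0.0 is not dropped.
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == lo)]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        # Weight each bin's gap by the fraction of predictions it holds.
        ece += (len(idx) / n) * abs(avg_conf - accuracy)
    return ece
```

For the overconfident model described above (90% stated confidence, 70% actual accuracy), ECE comes out to roughly 0.2; a perfectly calibrated model scores 0.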
Last verified: 2026-04-08