Training
RLHF
Quick Answer
Abbreviation for Reinforcement Learning from Human Feedback.
RLHF (Reinforcement Learning from Human Feedback) is a training procedure that optimizes models against human preference judgments, and it has been central to making models useful and aligned. The process has three stages: collect human preference judgments over pairs of model outputs, train a reward model to predict those preferences, and use reinforcement learning (commonly PPO) to optimize the policy against the reward model. RLHF is expensive because it depends on large amounts of human annotation, but it is highly effective; recent work explores cheaper alternatives, such as direct preference optimization, that aim to preserve the quality gains. RLHF remains a standard part of modern model training.
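The reward-model stage above can be sketched in miniature. This is a hedged illustration, not a production implementation: it trains a toy linear reward model on hypothetical (chosen, rejected) feature pairs using the Bradley-Terry pairwise loss that reward models typically minimize. The feature vectors, learning rate, and helper names are all illustrative assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy linear reward model: r(x) = w . x (features are hypothetical stand-ins
# for what a real model would compute from a prompt/response pair)
def reward(w, feats):
    return sum(wi * fi for wi, fi in zip(w, feats))

def pairwise_loss(w, chosen, rejected):
    # Bradley-Terry preference loss: -log P(chosen preferred over rejected)
    return -math.log(sigmoid(reward(w, chosen) - reward(w, rejected)))

def train_reward_model(pairs, dim, lr=0.1, steps=200):
    w = [0.0] * dim
    for _ in range(steps):
        for chosen, rejected in pairs:
            # Gradient of the pairwise loss w.r.t. w, then a plain SGD step
            p = sigmoid(reward(w, chosen) - reward(w, rejected))
            grad_scale = -(1.0 - p)
            for i in range(dim):
                w[i] -= lr * grad_scale * (chosen[i] - rejected[i])
    return w

# Hypothetical preference data: feature vectors for (chosen, rejected) responses
pairs = [([1.0, 0.0], [0.0, 1.0]), ([0.8, 0.1], [0.2, 0.9])]
w = train_reward_model(pairs, dim=2)
# After training, the reward model scores chosen responses higher
assert reward(w, pairs[0][0]) > reward(w, pairs[0][1])
```

In full-scale RLHF the reward model is itself a large network, and its scalar output then serves as the RL reward signal when optimizing the policy.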
Last verified: 2026-04-08