Training
RLHF
Quick Answer
Abbreviation for Reinforcement Learning from Human Feedback.
RLHF (Reinforcement Learning from Human Feedback) is a training procedure that optimizes models against human preference judgments, and it has been central to making models useful and aligned. The process has three stages: collect human preference judgments over pairs of model outputs, train a reward model to predict those preferences, and use reinforcement learning (commonly PPO) to optimize the policy against the reward model. RLHF is expensive because it depends on large amounts of human annotation, but it is highly effective; recent work explores cheaper alternatives, such as direct preference optimization, that aim to preserve the quality gains. RLHF remains a standard part of modern model training.
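The reward-model stage above can be sketched in miniature. This is a hedged illustration, not a production implementation: it trains a toy linear reward model on hypothetical (chosen, rejected) feature pairs using the Bradley-Terry pairwise loss that reward models typically minimize. The feature vectors, learning rate, and helper names are all illustrative assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy linear reward model: r(x) = w . x (features are hypothetical stand-ins
# for what a real model would compute from a prompt/response pair)
def reward(w, feats):
    return sum(wi * fi for wi, fi in zip(w, feats))

def pairwise_loss(w, chosen, rejected):
    # Bradley-Terry preference loss: -log P(chosen preferred over rejected)
    return -math.log(sigmoid(reward(w, chosen) - reward(w, rejected)))

def train_reward_model(pairs, dim, lr=0.1, steps=200):
    w = [0.0] * dim
    for _ in range(steps):
        for chosen, rejected in pairs:
            # Gradient of the pairwise loss w.r.t. w, then a plain SGD step
            p = sigmoid(reward(w, chosen) - reward(w, rejected))
            grad_scale = -(1.0 - p)
            for i in range(dim):
                w[i] -= lr * grad_scale * (chosen[i] - rejected[i])
    return w

# Hypothetical preference data: feature vectors for (chosen, rejected) responses
pairs = [([1.0, 0.0], [0.0, 1.0]), ([0.8, 0.1], [0.2, 0.9])]
w = train_reward_model(pairs, dim=2)
# After training, the reward model scores chosen responses higher
assert reward(w, pairs[0][0]) > reward(w, pairs[0][1])
```

In full-scale RLHF the reward model is itself a large network, and its scalar output then serves as the RL reward signal when optimizing the policy.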
Last verified: 2026-04-08