Training
Constitutional AI
Quick Answer
A training approach that uses an explicit set of principles to guide model behavior, reducing the need for extensive human feedback.
Constitutional AI (CAI) trains models to follow a set of explicit written principles (a "constitution") rather than relying solely on human preference judgments. Given principles such as "Prioritize safety" and "Be helpful", the model critiques and revises its own outputs against them, and those revisions supply much of the training signal. This reduces dependence on large-scale human feedback, and because the principles are stated in plain language, the approach is more interpretable than a black-box reward model. CAI can be combined with standard RLHF. Anthropic pioneered this approach, and Constitutional AI is one approach to scalable alignment.
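The critique-revision loop described above can be sketched as follows. This is a minimal illustration, not Anthropic's implementation: `call_model` is a hypothetical stand-in for a real language-model API, stubbed here with canned responses so the control flow runs end to end.

```python
# Sketch of Constitutional AI's supervised critique-revision loop.
# Assumption: `call_model` is a placeholder for a real LLM call; it is
# stubbed with fixed strings purely to make the control flow runnable.

PRINCIPLES = [
    "Prioritize safety",
    "Be helpful",
]

def call_model(prompt: str) -> str:
    """Stub standing in for a language-model API (hypothetical)."""
    if prompt.startswith("Critique"):
        return "The draft does not fully follow the principle."
    if prompt.startswith("Revise"):
        return "Here is a revised answer that follows the principle."
    return "Initial draft answer."

def critique_revise(question: str, rounds: int = 1) -> str:
    """Draft an answer, then critique and revise it against each principle."""
    draft = call_model(question)
    for _ in range(rounds):
        for principle in PRINCIPLES:
            # Ask the model to critique its own draft against one principle.
            critique = call_model(
                f"Critique this response against '{principle}':\n{draft}"
            )
            # Ask the model to revise the draft to address the critique.
            draft = call_model(
                f"Revise the response given this critique:\n{critique}\n{draft}"
            )
    return draft

print(critique_revise("How do I stay safe online?"))
```

In the full method, the revised outputs form a dataset for supervised fine-tuning, and a later reinforcement-learning stage replaces most human preference labels with AI-generated ones based on the same constitution.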
Last verified: 2026-04-08