
Alignment

Quick Answer

Training models to behave in ways consistent with human values and intentions.

Alignment is the practice of training models to behave as their developers and users intend. It covers instruction-following, honesty, and the refusal of harmful requests; a misaligned model might refuse benign requests or comply with harmful ones. Techniques such as reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO) are used to align models, though alignment remains an active research area with no complete solution. It also involves tradeoffs, most notably between helpfulness and safety, and is a central focus of responsible AI development because it is fundamental to building safe systems.
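To make "alignment technique" concrete, the sketch below shows the core of the DPO objective: the policy model is trained to assign a higher implicit reward to the human-preferred response than to the rejected one, measured relative to a frozen reference model. This is a minimal illustration, not a production implementation; the function name, argument names, and default beta value are illustrative choices, not taken from the text above.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss for a batch of preference pairs.

    Each argument is a tensor of shape (batch,) holding the summed token
    log-probabilities of a response under either the trainable policy or the
    frozen reference model. `beta` controls how strongly the policy is pushed
    away from the reference model. (Illustrative sketch.)
    """
    # Implicit reward: how much more (or less) likely the policy makes each
    # response compared with the reference model.
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps

    # Train the policy to rank the human-preferred response above the
    # rejected one; -log(sigmoid(x)) penalizes cases where it does not.
    logits = beta * (chosen_rewards - rejected_rewards)
    return -F.logsigmoid(logits).mean()
```

In practice the log-probabilities would come from scoring paired preference data with the policy and reference models; the key design point is that no separate reward model is trained, unlike in RLHF.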

Last verified: 2026-04-08
