Safety & Alignment
Harmlessness
Quick Answer
A key alignment goal: ensuring models don't cause harm through their outputs.
Harmlessness means avoiding harmful outputs, such as content that promotes violence, facilitates illegal activity, expresses hate, or deceives the user. It is a primary safety goal, and models learn it through alignment training. Harmlessness trades off against utility: a model that refuses too readily is less useful, while a model that complies too readily can cause harm. Whether an output is harmful depends on context, and judgments of harm are subjective and vary across cultures. For these reasons, measuring harmlessness is challenging.
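One way to make the utility-versus-harm tradeoff concrete is to track two rates on a labeled evaluation set: how often the model complies with harmful prompts, and how often it refuses benign ones. The sketch below is a minimal illustration, not an established benchmark. The `EvalRecord` schema, the `harm_and_refusal_rates` function, and the sample data are all hypothetical, and the sketch assumes each prompt already carries a human "harmful or not" label.

```python
from dataclasses import dataclass


@dataclass
class EvalRecord:
    """One labeled evaluation example (hypothetical schema for illustration)."""
    prompt_is_harmful: bool  # human label: would complying cause harm?
    model_refused: bool      # observed behavior: did the model decline?


def harm_and_refusal_rates(records):
    """Return (harmful_compliance_rate, benign_refusal_rate).

    A rate is reported as 0.0 when its group is empty (a simplification).
    """
    harmful = [r for r in records if r.prompt_is_harmful]
    benign = [r for r in records if not r.prompt_is_harmful]
    # Harm side: harmful prompts the model answered instead of refusing.
    harmful_compliance = (
        sum(not r.model_refused for r in harmful) / len(harmful) if harmful else 0.0
    )
    # Utility side: benign prompts the model refused unnecessarily.
    benign_refusal = (
        sum(r.model_refused for r in benign) / len(benign) if benign else 0.0
    )
    return harmful_compliance, benign_refusal


# Made-up sample data: 2 harmful prompts, 4 benign prompts.
records = [
    EvalRecord(prompt_is_harmful=True, model_refused=True),
    EvalRecord(prompt_is_harmful=True, model_refused=False),
    EvalRecord(prompt_is_harmful=False, model_refused=False),
    EvalRecord(prompt_is_harmful=False, model_refused=True),
    EvalRecord(prompt_is_harmful=False, model_refused=False),
    EvalRecord(prompt_is_harmful=False, model_refused=False),
]

harm_rate, over_refusal_rate = harm_and_refusal_rates(records)
print(f"harmful compliance: {harm_rate:.2f}, benign refusal: {over_refusal_rate:.2f}")
```

Lowering one rate tends to push the other up, which is the tradeoff described above. The labels themselves are the difficult part: because harm depends on context and culture, two annotators may label the same prompt differently, so any such metric is only as reliable as its labeling.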
Last verified: 2026-04-08