Output Validation: Ensuring LLM Outputs Are Safe and Correct (2026)
Output validation intercepts LLM responses before downstream processing and checks them against the expected format, content constraints, and safety rules. For structured outputs: validate against a JSON schema. For natural language: check for prohibited content, confidence signals, and factual grounding. For agent actions: verify the action is in the allowed set with valid parameters. Invalid outputs should trigger a retry (with feedback) or fall back to a safe default.
When to Use
- ✓ Any structured output extraction (JSON, XML) that flows into a database or downstream system
- ✓ Agent tool calls where invalid parameters could cause data corruption or security issues
- ✓ Customer-facing outputs that must comply with brand voice, content policies, and accuracy requirements
- ✓ RAG outputs that must be grounded in retrieved context to prevent hallucination
- ✓ Outputs that include numerical values, dates, or other machine-interpretable data
How It Works
1. Schema validation: for JSON outputs, use Pydantic (Python) or Zod (TypeScript) to validate against a defined schema. If validation fails, retry with explicit correction feedback: 'Your previous output was invalid. Error: [specific error]. Please output valid JSON matching [schema].'
2. Content validation: check outputs against prohibited content lists, required inclusions, length limits, and format requirements. Use regex for deterministic checks; use a fast LLM for nuanced content policy checks.
3. Grounding validation: for RAG outputs, verify that factual claims in the output are supported by the retrieved context. Use an LLM-as-judge to check: 'Is every claim in this response verifiable from the provided context?'
4. Action validation: for agent tool calls, validate that action_name is in the allowlist, parameters match the expected schema, and parameter values satisfy business rules (e.g., date in valid range, amount within limits).
5. Retry with feedback: when validation fails, don't crash or return an error to the user. Include the validation error in a retry prompt: 'Your previous response contained [specific issue]. Please regenerate the response, fixing [issue].' Allow up to 2 retries before falling back.
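The action-validation step (step 4) can be sketched as plain allowlist, type, and business-rule checks before a tool call is executed. The tool names, parameter schemas, and the refund limit below are hypothetical, stand-ins for whatever your agent actually exposes:

```python
# Hypothetical allowlist: each permitted tool maps to its expected parameter types
ALLOWED_ACTIONS = {
    'issue_refund': {'order_id': str, 'amount': float},
    'send_email': {'recipient': str, 'body': str},
}
MAX_REFUND = 500.0  # example business rule: refunds capped at $500

def validate_action(action_name: str, params: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the call is valid."""
    if action_name not in ALLOWED_ACTIONS:
        return [f'action {action_name!r} is not in the allowlist']
    errors = []
    schema = ALLOWED_ACTIONS[action_name]
    for key, expected_type in schema.items():
        if key not in params:
            errors.append(f'missing parameter {key!r}')
        elif not isinstance(params[key], expected_type):
            errors.append(f'{key!r} should be {expected_type.__name__}')
    # Business rules layered on top of the schema check
    if action_name == 'issue_refund' and isinstance(params.get('amount'), float):
        if not (0 < params['amount'] <= MAX_REFUND):
            errors.append(f'amount must be in (0, {MAX_REFUND}]')
    return errors
```

Returning a list of errors (rather than raising on the first one) lets you feed every problem back into a single retry prompt, per step 5.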
Examples
```python
from pydantic import BaseModel, field_validator
from anthropic import Anthropic
import json

client = Anthropic()

class ExtractedInvoice(BaseModel):
    invoice_number: str
    date: str
    total_amount: float
    vendor_name: str
    line_items: list[dict]

    @field_validator('total_amount')
    @classmethod
    def amount_must_be_positive(cls, v):
        if v <= 0:
            raise ValueError('Amount must be positive')
        return v

def extract_with_validation(invoice_text: str, max_retries: int = 2) -> ExtractedInvoice:
    error_context = ''
    for attempt in range(max_retries + 1):
        response = client.messages.create(
            model='claude-3-5-haiku-20241022', max_tokens=500,
            messages=[{
                'role': 'user',
                'content': f'Extract invoice data as JSON.{error_context}\n\nInvoice:\n{invoice_text}'
            }]
        )
        try:
            data = json.loads(response.content[0].text)
            return ExtractedInvoice(**data)  # Pydantic schema + business-rule validation
        except (json.JSONDecodeError, ValueError) as e:
            # Feed the specific error back into the retry prompt
            error_context = f' Previous attempt failed: {e}. Fix this in your response.'
    raise ValueError(f'Failed to extract valid invoice after {max_retries} retries')

def validate_grounding(query: str, context: str, response: str, threshold: float = 0.7) -> dict:
    validation_prompt = f'''
Query: {query}
Context provided: {context}
Response to validate: {response}
For each factual claim in the response:
1. Identify the claim
2. Find its source in the context (or note it's unsupported)
3. Rate support: supported/partially-supported/unsupported
Return JSON: {{"overall_grounding": 0-1, "unsupported_claims": ["..."]}}'''
    result = client.messages.create(
        model='claude-3-5-haiku-20241022', max_tokens=300,
        messages=[{'role': 'user', 'content': validation_prompt}]
    )
    data = json.loads(result.content[0].text)
    return {
        'valid': data['overall_grounding'] >= threshold,
        'grounding_score': data['overall_grounding'],
        'unsupported_claims': data['unsupported_claims']
    }
```
Common Mistakes
- ✗ Validating only JSON structure, not business logic — a response with a valid JSON schema can still contain semantically wrong values (negative prices, invalid dates, impossible combinations). Include business rule validation alongside schema validation.
- ✗ Infinite retry loops — don't retry indefinitely. Cap at 2-3 retries with degrading quality tolerance. After max retries, return a safe default response or escalate to human review rather than crashing.
- ✗ Over-validation that creates false positives — excessive validation rejects valid outputs that don't match overly strict patterns. Calibrate validation rules against a sample of known-good outputs to measure the false positive rate before deploying.
- ✗ No logging of validation failures — validation failures are signals about model behavior and prompt quality. Log every failure with the original output and the validation error. Review weekly to identify systematic patterns.
FAQ
Should I validate all LLM outputs?
Validate outputs that flow into: databases, API calls, user communications, or system actions. Simple conversational responses that are only read by humans need minimal validation (content safety check, length check). The validation overhead should be proportional to the risk of incorrect outputs.
What's the best way to get reliably structured JSON from LLMs?
In order of reliability: (1) Structured outputs mode (OpenAI, Anthropic tool_use with forced tool call) — model guarantees JSON matching your schema. (2) Explicit JSON instructions with schema in the prompt. (3) Output parsing with retry. Structured outputs mode is the most reliable — use it whenever available for schema-critical extractions.
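Option (1) can be sketched with the Anthropic Messages API: define your schema as a tool's `input_schema`, then force that tool with `tool_choice`. The tool name and fields below are illustrative; only the request payload is constructed here, the `client.messages.create(**request_kwargs)` call itself is omitted:

```python
# Define the extraction schema as a tool. Forcing this tool makes the model
# return structured arguments conforming to input_schema instead of free text.
invoice_tool = {
    'name': 'record_invoice',  # hypothetical tool name
    'description': 'Record extracted invoice fields.',
    'input_schema': {
        'type': 'object',
        'properties': {
            'invoice_number': {'type': 'string'},
            'total_amount': {'type': 'number', 'exclusiveMinimum': 0},
        },
        'required': ['invoice_number', 'total_amount'],
    },
}

# tool_choice={'type': 'tool', 'name': ...} forces the model to call this tool.
request_kwargs = dict(
    model='claude-3-5-haiku-20241022',
    max_tokens=500,
    tools=[invoice_tool],
    tool_choice={'type': 'tool', 'name': 'record_invoice'},
    messages=[{'role': 'user', 'content': 'Extract: Invoice #123, total $42.50'}],
)
# The response's tool_use block exposes the arguments as a dict via .input.
# Still run schema/business-rule validation on it -- forcing the tool
# guarantees structure far more reliably, but defense in depth is cheap.
```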
How do I validate code outputs?
Run the code in a sandbox (Docker container, subprocess with timeout) and check: (1) No syntax errors, (2) Runs without exceptions on test cases, (3) Produces expected output. For security-sensitive contexts, use static analysis (bandit for Python) to check for security issues before running. Never run LLM-generated code in a production environment without sandbox validation.
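A minimal sketch of checks (1) and (2): syntax-check the generated code with `compile()`, then run it in a subprocess with a timeout. Note that a bare subprocess is not a security sandbox; for untrusted code, wrap this in a container:

```python
import subprocess
import sys

def check_generated_code(code: str, timeout: float = 5.0) -> dict:
    """Syntax-check, then execute LLM-generated code in a subprocess.
    A subprocess is NOT a security boundary -- use a container for untrusted code."""
    try:
        compile(code, '<generated>', 'exec')  # (1) syntax check without running
    except SyntaxError as e:
        return {'ok': False, 'stage': 'syntax', 'error': str(e)}
    try:
        proc = subprocess.run(
            [sys.executable, '-c', code],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return {'ok': False, 'stage': 'timeout', 'error': f'exceeded {timeout}s'}
    if proc.returncode != 0:  # (2) ran but raised an exception
        return {'ok': False, 'stage': 'runtime', 'error': proc.stderr.strip()}
    return {'ok': True, 'stage': 'done', 'output': proc.stdout}
```

Check (3), comparing output against expected values for test cases, can then be layered on top by asserting on the returned `output` field.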
How do I validate LLM outputs for safety/content moderation?
Use specialized safety classifiers rather than general LLM-as-judge: OpenAI Moderation API, Llama Guard 3, or Anthropic's built-in safety features. These are specifically trained to detect harmful content and are more reliable and cheaper than a general purpose LLM judge for safety classification. Run safety validation in parallel with the main LLM call to minimize latency impact.
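The parallel pattern can be sketched with a thread pool: submit the main generation and the safety classification concurrently, then gate the response on the safety result. Both functions below are stubs standing in for real calls (an LLM request and a moderation endpoint), and the keyword heuristic is purely illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def generate_response(prompt: str) -> str:
    # Stub for the main LLM call
    return f'Answer to: {prompt}'

def safety_check(prompt: str) -> bool:
    # Stub for a safety classifier (e.g., a moderation API);
    # the keyword match here is only a placeholder
    return 'forbidden' not in prompt.lower()

def answer_with_safety(prompt: str, fallback: str = "I can't help with that.") -> str:
    # Run generation and the safety classifier concurrently, so the
    # safety check adds almost no latency on top of the main call
    with ThreadPoolExecutor(max_workers=2) as pool:
        gen_future = pool.submit(generate_response, prompt)
        safe_future = pool.submit(safety_check, prompt)
        return gen_future.result() if safe_future.result() else fallback
```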
What should I return to the user when validation fails?
Never expose validation error details to users — 'Your response failed JSON schema validation at field X' is technical and confusing. Return a friendly message: 'I had trouble generating a complete response. Let me try again.' If retries also fail, explain that you can't help with that specific request and offer an alternative.