
Structured Output from LLMs (2026)

Quick Answer

Getting reliable structured output (JSON, XML, CSV) from LLMs requires either constrained decoding (enforced at the token level by the API) or careful schema prompting combined with output validation. OpenAI's response_format with json_schema and Anthropic's tool use feature both provide near-100% schema compliance. Without these, use few-shot examples plus a validation+retry loop.

When to Use

  • Extracting structured data from unstructured text (invoices, contracts, emails)
  • Building pipelines where downstream code parses LLM output programmatically
  • Generating configuration, API payloads, or database records from natural language
  • Classification tasks where you need a consistent set of labels as output
  • Any production workflow where invalid output would cause errors or require manual correction

How It Works

  1. Constrained decoding: Modern APIs (OpenAI structured outputs, the Outlines library, Guidance) constrain the token sampling distribution at inference time so only tokens that produce valid JSON matching your schema can be sampled. This gives ~100% schema compliance.
  2. Tool use / function calling: Anthropic Claude and OpenAI GPT-4o support defining a JSON schema as a 'tool'. The model fills in the schema rather than generating free text. This is the most reliable method without constrained decoding.
  3. Schema prompting: Include your JSON schema in the prompt and use few-shot examples. Reliability is ~85–95% for simple schemas on frontier models. Add a validation+retry loop for the remaining cases.
  4. Output validation: Always validate the output against your schema before using it. If validation fails, send the invalid output plus the error message back to the model with 'Fix the JSON to match the schema:' — this self-correction step works ~90% of the time.
  5. Decomposition: For complex nested schemas, break extraction into multiple simpler calls rather than attempting a single 50-field extraction — error rates compound with schema complexity.
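The validation+retry loop in steps 3–4 can be sketched as follows. This is a minimal, runnable illustration: `call_model` is a placeholder for a real API call, and the schema check stands in for full JSON Schema validation (in practice you would use a library such as `jsonschema` or Pydantic).

```python
import json

def call_model(messages):
    """Placeholder for a real API call (OpenAI, Anthropic, etc.).
    Returns canned output here so the sketch runs on its own."""
    return '{"label": "invoice", "confidence": 0.93}'

def validate(raw):
    """Minimal structural check standing in for full schema validation."""
    data = json.loads(raw)  # raises ValueError on invalid JSON syntax
    for field in ("label", "confidence"):
        if field not in data:
            raise ValueError(f"missing field: {field}")
    return data

def extract_with_retry(messages, max_retries=2):
    raw = call_model(messages)
    for _ in range(max_retries + 1):
        try:
            return validate(raw)
        except ValueError as err:
            # Feed the invalid output and the error back for self-correction.
            messages = messages + [
                {"role": "assistant", "content": raw},
                {"role": "user",
                 "content": f"Fix the JSON to match the schema. Error: {err}"},
            ]
            raw = call_model(messages)
    raise RuntimeError("output failed validation; flag for human review")

result = extract_with_retry([{"role": "user", "content": "Classify this document."}])
```

After `max_retries` failed self-correction attempts the function raises rather than returning bad data, matching the fallback-to-human-review strategy described above.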

Examples

OpenAI structured outputs with JSON schema
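A sketch of the OpenAI structured-outputs request shape. The invoice field names are illustrative; the live API call is shown commented out because it requires the `openai` package and an API key. OpenAI's strict mode requires `"additionalProperties": false` on every object in the schema.

```python
# JSON Schema for an illustrative invoice-extraction task.
invoice_schema = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string"},
    },
    "required": ["vendor", "total", "currency"],
    "additionalProperties": False,  # required by OpenAI strict mode
}

# response_format payload for OpenAI structured outputs (strict mode).
response_format = {
    "type": "json_schema",
    "json_schema": {"name": "invoice", "strict": True, "schema": invoice_schema},
}

# The actual call would look like this (requires openai and an API key):
# from openai import OpenAI
# import json
# client = OpenAI()
# completion = client.chat.completions.create(
#     model="gpt-4o-2024-08-06",
#     messages=[{"role": "user", "content": f"Extract the invoice:\n{text}"}],
#     response_format=response_format,
# )
# invoice = json.loads(completion.choices[0].message.content)
```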
Anthropic tool use for structured extraction
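A sketch of the Anthropic tool-use approach: define an extraction tool whose `input_schema` is your target structure, then force the model to call it with `tool_choice`. Field and tool names are illustrative; the live call is commented out because it requires the `anthropic` package and an API key.

```python
# Tool definition: the model "calls" this tool, filling in input_schema.
extract_tool = {
    "name": "record_invoice",
    "description": "Record structured fields extracted from an invoice.",
    "input_schema": {
        "type": "object",
        "properties": {
            "vendor": {"type": "string"},
            "total": {"type": "number"},
        },
        "required": ["vendor", "total"],
    },
}

# The actual call (requires anthropic and an API key):
# import anthropic
# client = anthropic.Anthropic()
# message = client.messages.create(
#     model="claude-sonnet-4-20250514",  # substitute any current model
#     max_tokens=1024,
#     tools=[extract_tool],
#     tool_choice={"type": "tool", "name": "record_invoice"},  # force the tool
#     messages=[{"role": "user", "content": f"Extract fields from:\n{text}"}],
# )
# data = next(b.input for b in message.content if b.type == "tool_use")
```

Because `tool_choice` forces the named tool, the response always contains a `tool_use` block whose `input` already matches the schema — no free-text parsing needed.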

Common Mistakes

  • Asking for JSON without a schema: 'Output as JSON' produces wildly varying structures across runs. Always provide the exact schema, ideally as a TypeScript interface or JSON Schema object in the prompt.
  • Not validating output in production: Even with constrained decoding, edge cases can produce unexpected values. Always validate and handle parse errors gracefully — never assume the output is valid.
  • Using json_mode without a schema: OpenAI's json_mode guarantees valid JSON syntax but not adherence to your schema. Use response_format with a schema or tool use instead.
  • Trying to extract a 30-field schema in one call: Complex nested schemas dramatically increase error rates. Split into multiple targeted extractions and merge results in application code.

FAQ

What's the most reliable way to get JSON from Claude?

Use tool use with tool_choice forced to your extraction tool. This gives ~99% schema compliance. Alternatively, wrap the expected JSON schema in XML tags (e.g. an output_format tag) in the system prompt and add validation+retry logic to catch the remaining failures.

Should I use Pydantic or JSON Schema for defining my output structure?

Pydantic is more ergonomic for Python — OpenAI's SDK accepts Pydantic models directly via the `.parse()` method and generates the JSON Schema automatically. For Anthropic's tool use, you write JSON Schema directly. Use whichever matches your stack.
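A minimal sketch of the Pydantic route, assuming Pydantic v2 (where schema generation is `model_json_schema()`). The `Invoice` model and field names are illustrative; the SDK call is commented out since it needs the `openai` package and an API key.

```python
from pydantic import BaseModel

class Invoice(BaseModel):
    vendor: str
    total: float
    currency: str = "USD"

# Pydantic generates the JSON Schema from the model definition:
schema = Invoice.model_json_schema()

# OpenAI's SDK accepts the model directly and parses the response for you:
# from openai import OpenAI
# client = OpenAI()
# completion = client.beta.chat.completions.parse(
#     model="gpt-4o-2024-08-06",
#     messages=[{"role": "user", "content": f"Extract the invoice:\n{text}"}],
#     response_format=Invoice,
# )
# invoice = completion.choices[0].message.parsed  # an Invoice instance
```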

How does the Outlines library help?

Outlines is a Python library for constrained generation when running models locally (via vLLM, llama.cpp, etc.). It constrains token sampling to only allow tokens that produce valid output matching your regex or JSON schema — achieving near-100% compliance even without API-level support.

What should I do when the model outputs invalid JSON?

Implement a retry loop: (1) validate the output, (2) if invalid, send back 'Your output was invalid JSON. Here is the error: {error}. Please fix it and return only valid JSON.' This self-correction works ~90% of the time. After 2 failed retries, fall back to a manual parsing strategy or flag for human review.

Does structured output work for streaming responses?

Yes. OpenAI's structured output API supports streaming — you receive partial JSON that only becomes valid at completion, so buffer the full response before parsing. Anthropic's tool use also supports streaming, emitting the tool input incrementally as input_json_delta events.
