Claude Extended Thinking: Complete Guide
Claude's Extended Thinking feature lets the model "think" before responding — generating internal reasoning tokens that aren't shown in the final answer but meaningfully improve quality on hard problems. This guide covers what it actually does, when it's worth the extra cost, and how to use it.
What Is Extended Thinking?
When Extended Thinking is enabled, Claude generates a chain of reasoning before producing its final response. You can see this reasoning in the thinking blocks of the response. The model uses these tokens to:
- Break down complex problems into steps
- Consider multiple approaches before committing
- Check its own reasoning for errors
- Work through mathematical derivations carefully
Thinking tokens are not free: they are billed at the same rate as output tokens, and they count toward your max_tokens limit, so budget_tokens must always be smaller than max_tokens.
Pricing
| Token Type | Price per 1M |
| --- | --- |
| Input tokens | $3.00 |
| Output tokens | $15.00 |
| Thinking tokens | $15.00 (billed as output tokens) |
Thinking tokens are billed at the output-token rate. For a problem requiring 10,000 thinking tokens + 500 output tokens:
- Without extended thinking: 500 tokens × $15/1M = $0.0075
- With extended thinking: (10,000 + 500) tokens × $15/1M = $0.1575
Roughly a 20x cost increase for that request. Whether it's worth it depends entirely on the task.
How to Enable Extended Thinking
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # Max thinking tokens allowed
    },
    messages=[{
        "role": "user",
        "content": "Prove that there are infinitely many prime numbers."
    }]
)

# The response has thinking blocks + text blocks
for block in response.content:
    if block.type == "thinking":
        print("=== THINKING ===")
        print(block.thinking[:500])  # Show first 500 chars
        print()
    elif block.type == "text":
        print("=== RESPONSE ===")
        print(block.text)

# Check token usage (thinking tokens are counted and billed as output tokens)
print(f"\nInput tokens: {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")
```
Setting Budget Tokens
The budget_tokens parameter controls how much the model can think. Guidelines:
| Task Complexity | Suggested Budget |
| --- | --- |
| Simple logic | 1,024-2,000 |
| Medium coding | 3,000-5,000 |
| Hard math/proofs | 8,000-15,000 |
| Very complex problems | 15,000-32,000 |

Claude won't always use the full budget; it uses what it needs. Note that the API enforces a minimum budget of 1,024 tokens.
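These guidelines can be encoded as a small lookup that also builds valid request arguments. A sketch: the tier names and the answer_tokens headroom are illustrative choices, not API constants, but budget_tokens does need to stay below max_tokens.

```python
# Suggested thinking budgets by task tier (from the guidelines above).
# Tier names are made up for this sketch, not API values.
BUDGETS = {
    "simple_logic": 2_000,
    "medium_coding": 5_000,
    "hard_math": 15_000,
    "very_complex": 32_000,
}

def thinking_params(tier: str, answer_tokens: int = 4_096) -> dict:
    """Build the thinking/max_tokens kwargs for messages.create.

    budget_tokens must stay below max_tokens, so reserve headroom
    for the final answer on top of the thinking budget.
    """
    budget = BUDGETS[tier]
    return {
        "max_tokens": budget + answer_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget},
    }

print(thinking_params("hard_math")["max_tokens"])  # 19096
```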
When Extended Thinking Helps
1. Mathematics and Formal Proofs
This is where extended thinking shines most. Problems requiring:
- Multi-step algebraic manipulation
- Mathematical proofs by induction or contradiction
- Calculus problems with multiple integration steps
- Combinatorics and probability problems
Benchmark improvement (MATH-500):
- Claude Sonnet 4 without thinking: ~62%
- Claude Sonnet 4 with thinking (budget 10K): ~71%
- Claude Sonnet 4 with thinking (budget 32K): ~76%
A 14-point improvement on hard math problems is significant.
2. Complex Multi-Step Coding
Problems that benefit:
- Algorithm design with non-obvious approaches
- Debugging complex race conditions or state management issues
- Designing data structures for specific performance characteristics
- Problems with subtle edge cases (off-by-one, overflow, etc.)
For implementing a simple CRUD endpoint? Extended thinking adds cost with no benefit. For implementing a lock-free concurrent queue? It helps.
3. Logic Puzzles and Reasoning
Extended thinking significantly improves performance on:
- Constraint satisfaction problems
- Multi-step logical deductions
- Problems requiring eliminating possibilities systematically
Example where it helps:
Alice, Bob, Carol, and Dan each own exactly one pet (cat, dog, fish, rabbit).
- Alice doesn't own a dog, a fish, or a rabbit.
- Bob's pet is not a cat.
- The person with a rabbit lives next to Carol.
- Dan has a fish.
Who owns the rabbit?
Without thinking, Claude may get this wrong or give a poorly explained answer. With thinking, it systematically works through the constraints.
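For illustration, a puzzle this small can be brute-forced to check that the constraints pin down a unique answer. The sketch below encodes the clues as listed: Alice owns neither dog, fish, nor rabbit; Bob's pet is not a cat; Carol can't own the rabbit (the rabbit owner lives next to her); Dan has a fish.

```python
from itertools import permutations

PEOPLE = ["Alice", "Bob", "Carol", "Dan"]
PETS = ["cat", "dog", "fish", "rabbit"]

def consistent(assignment: dict) -> bool:
    # Clue 1: Alice owns neither a dog, a fish, nor a rabbit
    if assignment["Alice"] in ("dog", "fish", "rabbit"):
        return False
    # Clue 2: Bob's pet is not a cat
    if assignment["Bob"] == "cat":
        return False
    # Clue 3: the rabbit owner lives next to Carol, so Carol has no rabbit
    if assignment["Carol"] == "rabbit":
        return False
    # Clue 4: Dan has a fish
    if assignment["Dan"] != "fish":
        return False
    return True

solutions = [dict(zip(PEOPLE, perm)) for perm in permutations(PETS)
             if consistent(dict(zip(PEOPLE, perm)))]
print(solutions)  # exactly one assignment: Bob owns the rabbit
```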
4. Complex Code Architecture Decisions
When asked to design a system architecture, extended thinking helps Claude consider:
- Trade-offs between approaches
- Edge cases in the design
- Potential failure modes
- Whether the simpler or more complex approach is actually warranted
When Extended Thinking Is Wasteful
Don't enable extended thinking for these use cases:
Simple Q&A
```python
# Wasteful: extended thinking adds nothing here
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=6000,  # must exceed budget_tokens when thinking is enabled
    thinking={"type": "enabled", "budget_tokens": 5000},  # Unnecessary
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)
```
Creative Writing
Extended thinking doesn't improve creative writing quality. Creativity is not a reasoning task; it's a generation task. Use the tokens for actual output.

Summarization

Summarizing a document doesn't require deep reasoning. Save your budget.

Translation

Language translation is a pattern-matching task, not a reasoning task.

Conversation and Chat

For real-time chat applications, extended thinking adds latency (seconds to minutes) for no meaningful improvement on typical conversational turns.

Extraction Tasks

Extracting structured data from text (names, dates, addresses) doesn't benefit from extended thinking.

Cost Control in Practice
```python
import anthropic


class TaskClassifier:
    """Decide whether to use extended thinking based on task type."""

    COMPLEX_KEYWORDS = [
        "prove", "proof", "derive", "calculate", "solve",
        "algorithm", "optimize", "complexity", "debug",
        "why does", "what's wrong", "race condition"
    ]

    def needs_thinking(self, prompt: str) -> tuple[bool, int]:
        """Returns (use_thinking, budget_tokens)"""
        prompt_lower = prompt.lower()
        # Check for math indicators
        if any(k in prompt_lower for k in ["prove", "proof", "derive", "integral", "sum of"]):
            return True, 15000
        # Check for complex coding
        if any(k in prompt_lower for k in ["race condition", "concurrent", "deadlock", "algorithm"]):
            return True, 8000
        # Check for logic puzzles
        if "if" in prompt_lower and "then" in prompt_lower and len(prompt) > 200:
            return True, 5000
        # Fall back to the general keyword list with a modest budget
        if any(k in prompt_lower for k in self.COMPLEX_KEYWORDS):
            return True, 3000
        return False, 0


client = anthropic.Anthropic()
classifier = TaskClassifier()


def smart_complete(prompt: str) -> str:
    use_thinking, budget = classifier.needs_thinking(prompt)
    kwargs = {
        "model": "claude-sonnet-4-5",
        "max_tokens": 4096,
        "messages": [{"role": "user", "content": prompt}]
    }
    if use_thinking:
        # budget_tokens must be smaller than max_tokens, so raise the cap
        # to leave room for the final answer
        kwargs["max_tokens"] = budget + 4096
        kwargs["thinking"] = {"type": "enabled", "budget_tokens": budget}
        print(f"Using extended thinking (budget: {budget} tokens)")
    response = client.messages.create(**kwargs)
    # Return only the text blocks (skip thinking blocks)
    return "".join(b.text for b in response.content if b.type == "text")
```
Streaming with Extended Thinking
```python
with client.messages.stream(
    model="claude-sonnet-4-5",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{"role": "user", "content": "Implement a binary search tree with AVL balancing in Python"}]
) as stream:
    for event in stream:
        if event.type == "content_block_start":
            if event.content_block.type == "thinking":
                print("[Thinking...]")
            elif event.content_block.type == "text":
                print("\n[Response]")
        elif event.type == "content_block_delta":
            if hasattr(event.delta, "text"):
                print(event.delta.text, end="", flush=True)
```
Impact on Latency
Extended thinking adds meaningful latency:
| Budget Tokens | Added Latency (approx) |
| --- | --- |
| 1,000 | +2-5 seconds |
| 5,000 | +10-20 seconds |
| 10,000 | +20-40 seconds |
| 32,000 | +60-120 seconds |
For interactive applications, only use extended thinking when the quality improvement justifies the wait. For batch processing, it's usually worth enabling.
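For interactive applications, the rough figures above can gate the decision directly. A sketch using the table's approximate worst-case latencies; these numbers are illustrative estimates from this article, not API guarantees.

```python
# Approximate worst-case added latency (seconds) per thinking budget,
# taken from the table above; real figures vary by model and load.
LATENCY_BOUNDS = [(1_000, 5), (5_000, 20), (10_000, 40), (32_000, 120)]

def max_affordable_budget(latency_budget_s: float) -> int:
    """Largest thinking budget whose worst-case added latency fits.

    Returns 0 when no tier fits, i.e. disable extended thinking.
    """
    best = 0
    for budget, worst_case in LATENCY_BOUNDS:
        if worst_case <= latency_budget_s:
            best = budget
    return best

print(max_affordable_budget(30))  # 5000: the 20s tier fits, the 40s tier doesn't
print(max_affordable_budget(3))   # 0: even the smallest tier may take 5s
```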
The Key Point
Extended thinking is a targeted tool, not a universal upgrade.
Enable it for: Hard math, formal proofs, complex algorithm design, debugging tricky multi-step logic, problems where being wrong has a high cost.
Skip it for: Chat, creative tasks, summarization, translation, simple Q&A, anything where latency matters and the task doesn't require deep reasoning.
A reasonable default: disable extended thinking globally and enable it at specific call sites for known hard problems. The cost increase is only justified when the quality improvement is measurable and matters.
Methodology
All performance figures in this article are sourced from publicly available benchmarks (such as MATH-500), provider pricing pages verified on 2026-04-16, and independent speed tests conducted via provider APIs. Pricing is listed per million tokens unless noted otherwise. Figures reflect the date of publication and will change as models are updated.