OpenRouter Complete Guide 2026: Access 100+ LLMs With One API
OpenRouter lets you access 100+ language models — OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, and more — through a single OpenAI-compatible API. One key, one endpoint, every frontier model.
Here's everything you need to know.
What OpenRouter Is
OpenRouter is an API aggregator. You make a request to https://openrouter.ai/api/v1/chat/completions, specify any model you want, and OpenRouter routes the request to the appropriate provider, handles authentication, and returns the response.
From your code's perspective, you call one endpoint. OpenRouter handles:
- Authentication with each provider
- Request/response format normalization
- Fallback if a model is unavailable
- Billing (you pay OpenRouter, not each provider separately)
How It Works
Your App → OpenRouter API → Anthropic API → Response
                  ↓ (fallback)
                  → OpenAI API → Response
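Under the hood this is a single HTTP endpoint. A minimal sketch of the request your app sends — `build_request` is a hypothetical helper, but the payload shape is the standard OpenAI chat-completions format that OpenRouter accepts:

```python
import json

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model: str, prompt: str, api_key: str) -> tuple[dict, dict]:
    """Build the headers and JSON body for an OpenRouter chat completion call."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,  # e.g. "anthropic/claude-sonnet-4-5"
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, body

headers, body = build_request("anthropic/claude-sonnet-4-5", "Hello", "sk-or-v1-example")
print(json.dumps(body))
```

Swapping providers is just a different `model` string in the same body — nothing else about the request changes.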
Pricing
OpenRouter's pricing is the provider's price plus, in some cases, a small markup:
- Most models: the exact provider price, or a 0-10% markup
- Some specialized models: up to 15% markup
- Free models: a selection of models is free to use, subject to rate limits
Sample Pricing Comparison
| Model | Direct Provider Price | OpenRouter Price |
| --- | --- | --- |
| claude-sonnet-4-5 | $3.00 / $15.00 | $3.00 / $15.00 |
| gpt-4o | $2.50 / $10.00 | $2.50 / $10.00 |
| gemini-2.5-pro | $1.25 / $10.00 | $1.25 / $10.00 |
| deepseek/deepseek-r1 | $0.55 / $2.19 | $0.55 / $2.19 |
| meta-llama/llama-3.3-70b | $0.59 / $0.79 | $0.59 / $0.79 |
For the most popular models, OpenRouter charges no markup — they make money on volume and on smaller/newer models.
Free tier: New accounts get $1 credit. Some models are free with rate limits (good for testing).
Getting Started
1. Get an API Key
Sign up at openrouter.ai → API Keys → Create Key
2. Make Your First Call
from openai import OpenAI

client = OpenAI(
    api_key="sk-or-v1-your-openrouter-key",
    base_url="https://openrouter.ai/api/v1"
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-5",
    messages=[
        {"role": "user", "content": "Explain transformer attention in 2 paragraphs."}
    ]
)

print(response.choices[0].message.content)
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.OPENROUTER_API_KEY,
  baseURL: 'https://openrouter.ai/api/v1',
});

const response = await client.chat.completions.create({
  model: 'anthropic/claude-sonnet-4-5',
  messages: [{ role: 'user', content: 'Hello' }],
});

console.log(response.choices[0].message.content);
Recommended Headers
OpenRouter supports two optional headers that identify your application; they're used to attribute your traffic in the public rankings on openrouter.ai:
client = OpenAI(
    api_key="sk-or-v1-your-key",
    base_url="https://openrouter.ai/api/v1",
    default_headers={
        "HTTP-Referer": "https://yourapp.com",  # Optional: your site URL, for rankings
        "X-Title": "Your App Name"  # Optional: your app name, for rankings
    }
)
Requests work without these headers, but including them gets your app attributed on OpenRouter's model-rankings pages and makes your usage easier to audit.
Model Naming Convention
OpenRouter uses provider/model-name format:
# Major providers
"anthropic/claude-sonnet-4-5"
"anthropic/claude-opus-4"
"openai/gpt-4o"
"openai/o3-mini"
"google/gemini-2.5-pro"
"google/gemini-2.0-flash"
"deepseek/deepseek-r1"
"meta-llama/llama-3.3-70b-instruct"
"mistralai/mistral-large"
"cohere/command-r-plus"
# Free models (rate-limited)
"google/gemma-2-9b-it:free"
"mistralai/mistral-7b-instruct:free"
"meta-llama/llama-3.2-11b-vision-instruct:free"
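Because the `provider/model-name[:variant]` pattern is consistent, model IDs are easy to work with programmatically. A small hypothetical helper (not part of any OpenRouter SDK) that splits an ID into its parts:

```python
def parse_model_id(model_id: str) -> dict:
    """Split an OpenRouter model ID into provider, model, and optional variant."""
    base, _, variant = model_id.partition(":")
    provider, _, model = base.partition("/")
    return {"provider": provider, "model": model, "variant": variant or None}

print(parse_model_id("google/gemma-2-9b-it:free"))
# → {'provider': 'google', 'model': 'gemma-2-9b-it', 'variant': 'free'}
```

This kind of helper is handy when filtering a model list by provider or excluding `:free` variants in production code.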
Key Features
1. Automatic Fallbacks
If your primary model is unavailable, OpenRouter can automatically fall back:
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "models": [
            "anthropic/claude-sonnet-4-5",
            "openai/gpt-4o",
            "google/gemini-2.5-pro"
        ]
    }
)

# Find out which model actually responded
print(f"Model used: {response.model}")
2. The auto Router
Let OpenRouter automatically select the best model for cost/quality:
response = client.chat.completions.create(
    model="openrouter/auto",  # OpenRouter chooses based on your prompt
    messages=[{"role": "user", "content": "What is 2+2?"}]
)
# Will likely route to a cheap model — appropriate for simple questions
3. Provider Routing Preferences
# Prefer specific providers for a model (some models are hosted by multiple providers)
response = client.chat.completions.create(
    model="meta-llama/llama-3.3-70b-instruct",
    messages=[...],
    extra_body={
        "provider": {
            "order": ["Groq", "Together", "Fireworks"],  # Try in order
            "allow_fallbacks": True
        }
    }
)
4. Cost Tracking
OpenRouter can return the exact cost of a request when you enable usage accounting with the usage parameter:
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Write a poem"}],
    extra_body={"usage": {"include": True}}  # ask OpenRouter to include cost
)

print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")

# cost is an OpenRouter extension field, so the OpenAI SDK exposes it via
# the response's extra fields rather than as a typed attribute
cost = (response.usage.model_extra or {}).get("cost")
if cost is not None:
    print(f"Total cost: ${cost:.6f}")
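If you'd rather estimate cost client-side, the token counts plus the per-million-token prices from the sample table above are enough. A sketch — the price table here is a hand-maintained subset of the sample figures quoted earlier, and real bills can differ (prompt caching, provider-specific rates):

```python
# $ per 1M tokens (input, output), from the sample pricing table above
PRICES = {
    "anthropic/claude-sonnet-4-5": (3.00, 15.00),
    "openai/gpt-4o": (2.50, 10.00),
    "deepseek/deepseek-r1": (0.55, 2.19),
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate request cost in USD from token counts and per-million prices."""
    input_price, output_price = PRICES[model]
    return (prompt_tokens * input_price + completion_tokens * output_price) / 1_000_000

print(f"${estimate_cost('openai/gpt-4o', 1200, 400):.6f}")
# → $0.007000
```

A 1,200-token prompt with a 400-token completion on gpt-4o comes to under a cent, which is why per-request cost tracking mostly matters at volume.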
5. Streaming
stream = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Write a long essay about space exploration"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
6. Vision / Multimodal
import base64

# Read and encode image
with open("screenshot.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-5",  # or openai/gpt-4o, google/gemini-2.5-pro
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{image_data}"}
            },
            {"type": "text", "text": "What's in this image?"}
        ]
    }]
)
OpenRouter vs Direct API: When Each Wins
Use OpenRouter when:
1. You want model flexibility without code changes:
# Change one line to switch models
model = "anthropic/claude-sonnet-4-5" # Change this to try any model
model = "google/gemini-2.5-pro" # No code changes needed
2. You want automatic fallbacks without building infrastructure: Building a fallback chain yourself requires managing multiple API clients, error handling, and format normalization. OpenRouter handles all of this.
3. You want to try models from providers you don't have accounts with: OpenRouter aggregates billing — one account gives you access to 100+ models.
4. You're prototyping and want to compare models: Run the same prompt against 5 models instantly.
Use Direct API when:
1. Volume is high and markup matters: For very high volume, even a small percentage markup adds up. At $10K/month in API spend, a 5% markup is $500/month.
2. You need specific enterprise features: Azure OpenAI's provisioned throughput, Anthropic's enterprise DPA, or Google's specific compliance certifications aren't available via OpenRouter.
3. You have strict data residency requirements: OpenRouter's servers see your requests. For GDPR-sensitive data, using a provider directly (especially Azure or Vertex with EU region) is cleaner.
4. You need the lowest possible latency: OpenRouter adds a small network hop. For real-time applications, direct API is marginally faster.
Building a Model Comparison Tool with OpenRouter
import asyncio
import os
import time

from openai import AsyncOpenAI

async def compare_models(
    prompt: str,
    models: list[str] = [
        "anthropic/claude-sonnet-4-5",
        "openai/gpt-4o",
        "google/gemini-2.5-pro"
    ]
):
    """Run the same prompt across multiple models and compare."""
    client = AsyncOpenAI(
        api_key=os.environ["OPENROUTER_API_KEY"],
        base_url="https://openrouter.ai/api/v1",
        default_headers={
            "HTTP-Referer": "https://llmversus.com",
            "X-Title": "LLM Comparison"
        }
    )

    async def call_model(model: str) -> dict:
        start = time.time()
        response = await client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=1024,
            extra_body={"usage": {"include": True}}  # include cost in usage
        )
        latency = time.time() - start
        cost = (response.usage.model_extra or {}).get("cost")
        return {
            "model": model,
            "response": response.choices[0].message.content,
            "latency": f"{latency:.2f}s",
            "cost": f"${cost:.4f}" if cost is not None else "N/A"
        }

    return await asyncio.gather(*[call_model(m) for m in models])

# Usage
results = asyncio.run(compare_models(
    "Explain the difference between supervised and unsupervised learning."
))
for r in results:
    print(f"\n=== {r['model']} (latency: {r['latency']}, cost: {r['cost']}) ===")
    print(r['response'][:300])
Rate Limits
OpenRouter rate limits depend on your credit balance and usage tier:
| Tier | Requests/Minute | Notes |
| --- | --- | --- |
| Free | 20 | Limited models |
| Pay-as-you-go | 200 | All models |
| High-usage | 1,000+ | Contact for enterprise |
For production applications, you'll want to implement retry logic regardless.
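A minimal retry sketch with exponential backoff and jitter — `with_retries` is a hypothetical helper, and in production you'd also inspect the error type (retry 429s and 5xx, not 400s) and honor any Retry-After header the API returns:

```python
import random
import time

def with_retries(call, max_attempts: int = 5, base_delay: float = 1.0):
    """Call `call()` and retry on exceptions, backing off exponentially."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts — surface the error to the caller
            # 1s, 2s, 4s, ... plus random jitter to avoid thundering herds
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)

# Usage with a stand-in callable:
# result = with_retries(lambda: client.chat.completions.create(...))
```

Libraries like tenacity offer the same pattern with more control, but a dozen lines is often all a small service needs.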
The Verdict
OpenRouter is the right choice when:
- You're building and want to try multiple models without managing multiple accounts
- You want resilient fallback behavior without building infrastructure
- Your volume is moderate (under ~$5K/month in API spend)
- Model flexibility is more important than squeezing every dollar of margin
For large-scale production applications with strict data handling requirements, direct provider APIs (especially via cloud provider offerings) will serve you better. But for most developers building LLM applications, OpenRouter is the fastest path to getting started with production-grade model access.
Explore all available models at openrouter.ai/models — the catalog updates frequently as new models launch.
Methodology
All data in this article sourced from publicly available provider documentation and pricing pages, verified 2026-04-16. Performance benchmarks from LMSYS Chatbot Arena and independent API tests. Costs listed as per-million-token input/output unless noted.