OpenRouter Complete Guide 2026: Access 100+ LLMs With One API

OpenRouter lets you access 100+ language models — OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, and more — through a single OpenAI-compatible API. One key, one endpoint, every frontier model.

Here's everything you need to know.

What OpenRouter Is

OpenRouter is an API aggregator. You make a request to https://openrouter.ai/api/v1/chat/completions, specify any model you want, and OpenRouter routes the request to the appropriate provider, handles authentication, and returns the response.

From your code's perspective, you call one endpoint. OpenRouter handles:

  • Authentication with each provider
  • Request/response format normalization
  • Fallback if a model is unavailable
  • Billing (you pay OpenRouter, not each provider separately)
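Under the hood, every call is a single HTTPS POST to that endpoint. A minimal sketch of the request shape (the URL and header names are the real convention; the key, model, and prompt are placeholders):

```python
import json

def build_chat_request(api_key: str, model: str, prompt: str) -> tuple[str, dict, str]:
    """Assemble the URL, headers, and JSON body for an OpenRouter chat call."""
    url = "https://openrouter.ai/api/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body

url, headers, body = build_chat_request("sk-or-v1-example", "openai/gpt-4o", "Hello")
```

Any HTTP client can send this; the official OpenAI SDKs work with OpenRouter precisely because they produce this same shape.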

How It Works

Your App → OpenRouter API → Anthropic API → Response
                         ↓ (fallback)
                         → OpenAI API → Response

Pricing

OpenRouter's pricing is the provider's price plus, in some cases, a small markup:

  • Most models: exact provider price, or a 0-10% markup
  • Some specialized models: up to 15% markup
  • Free models: some models cost nothing, subject to rate limits

Sample Pricing Comparison

Model                       Direct Provider Price   OpenRouter Price
claude-sonnet-4-5           $3.00 / $15.00          $3.00 / $15.00
gpt-4o                      $2.50 / $10.00          $2.50 / $10.00
gemini-2.5-pro              $1.25 / $10.00          $1.25 / $10.00
deepseek/deepseek-r1        $0.55 / $2.19           $0.55 / $2.19
meta-llama/llama-3.3-70b    $0.59 / $0.79           $0.59 / $0.79

For the most popular models, OpenRouter charges no markup — they make money on volume and on smaller/newer models.
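To see what per-million-token rates mean per request, a quick cost calculation using the Claude row above:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Dollar cost of one request, given per-million-token prices."""
    return input_tokens / 1e6 * input_price + output_tokens / 1e6 * output_price

# 1,000 input + 500 output tokens at $3.00 / $15.00 per million:
cost = request_cost(1000, 500, 3.00, 15.00)
print(f"${cost:.4f}")  # → $0.0105
```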

Free tier: New accounts get $1 credit. Some models are free with rate limits (good for testing).

Getting Started

1. Get an API Key

Sign up at openrouter.ai → API Keys → Create Key

2. Make Your First Call

from openai import OpenAI

client = OpenAI(
    api_key="sk-or-v1-your-openrouter-key",
    base_url="https://openrouter.ai/api/v1"
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-5",
    messages=[
        {"role": "user", "content": "Explain transformer attention in 2 paragraphs."}
    ],
    extra_body={"usage": {"include": True}}  # ask OpenRouter to report cost
)

print(response.choices[0].message.content)
# When requested above, OpenRouter attaches the dollar cost to usage
print(f"Cost: ${getattr(response.usage, 'cost', 0):.6f}")

The same call in JavaScript/TypeScript:

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.OPENROUTER_API_KEY,
  baseURL: 'https://openrouter.ai/api/v1',
});

const response = await client.chat.completions.create({
  model: 'anthropic/claude-sonnet-4-5',
  messages: [{ role: 'user', content: 'Hello' }],
});

console.log(response.choices[0].message.content);

App Identification Headers

OpenRouter recommends two optional headers that identify your application (used for attribution and app rankings on openrouter.ai):

client = OpenAI(
    api_key="sk-or-v1-your-key",
    base_url="https://openrouter.ai/api/v1",
    default_headers={
        "HTTP-Referer": "https://yourapp.com",  # Optional: your app's URL
        "X-Title": "Your App Name"              # Optional: your app's name
    }
)

Calls work without these headers; setting them mainly affects how your app appears in OpenRouter's usage rankings.

Model Naming Convention

OpenRouter uses provider/model-name format:

# Major providers
"anthropic/claude-sonnet-4-5"
"anthropic/claude-opus-4"
"openai/gpt-4o"
"openai/o3-mini"
"google/gemini-2.5-pro"
"google/gemini-2.0-flash"
"deepseek/deepseek-r1"
"meta-llama/llama-3.3-70b-instruct"
"mistralai/mistral-large"
"cohere/command-r-plus"

# Free models (rate-limited)
"google/gemma-2-9b-it:free"
"mistralai/mistral-7b-instruct:free"
"meta-llama/llama-3.2-11b-vision-instruct:free"
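The naming scheme is easy to handle programmatically. A small helper (hypothetical, not part of any SDK) that splits a model ID into provider, model name, and the `:free`-style variant suffix:

```python
def parse_model_id(model_id: str) -> dict:
    """Split an OpenRouter model ID into provider, model name, and variant."""
    name, _, variant = model_id.partition(":")
    provider, _, model = name.partition("/")
    return {"provider": provider, "model": model, "variant": variant or None}

print(parse_model_id("mistralai/mistral-7b-instruct:free"))
# → {'provider': 'mistralai', 'model': 'mistral-7b-instruct', 'variant': 'free'}
```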

Key Features

1. Automatic Fallbacks

If your primary model is unavailable, OpenRouter can automatically fall back:

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "models": [
            "anthropic/claude-sonnet-4-5",
            "openai/gpt-4o",
            "google/gemini-2.5-pro"
        ]
    }
)

# Find out which model actually responded
print(f"Model used: {response.model}")

2. The auto Router

Let OpenRouter automatically select the best model for cost/quality:

response = client.chat.completions.create(
    model="openrouter/auto",  # OpenRouter chooses based on your prompt
    messages=[{"role": "user", "content": "What is 2+2?"}]
)
# Will likely route to a cheap model — appropriate for simple questions

3. Provider Routing Preferences

# Prefer specific providers for a model (some models are hosted by multiple providers)
response = client.chat.completions.create(
    model="meta-llama/llama-3.3-70b-instruct",
    messages=[...],
    extra_body={
        "provider": {
            "order": ["Groq", "Together", "Fireworks"],  # Try in order
            "allow_fallbacks": True
        }
    }
)

4. Cost Tracking

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Write a poem"}],
    extra_body={"usage": {"include": True}}  # ask OpenRouter to report cost
)

# When requested above, OpenRouter adds a dollar cost to usage
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total cost: ${getattr(response.usage, 'cost', 0):.6f}")

5. Streaming

stream = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Write a long essay about space exploration"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

6. Vision / Multimodal

import base64

# Read and encode image
with open("screenshot.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-5",  # or openai/gpt-4o, google/gemini-2.5-pro
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{image_data}"}
            },
            {"type": "text", "text": "What's in this image?"}
        ]
    }]
)

OpenRouter vs Direct API: When Each Wins

Use OpenRouter when:

1. You want model flexibility without code changes:

# Change one line to switch models
model = "anthropic/claude-sonnet-4-5"  # Change this to try any model
model = "google/gemini-2.5-pro"        # No code changes needed

2. You want automatic fallbacks without building infrastructure: Building a fallback chain yourself requires managing multiple API clients, error handling, and format normalization. OpenRouter handles all of this.
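For contrast, here is roughly what a DIY fallback chain looks like. This is a hedged sketch: call_model is a stand-in for whatever per-provider client wrapper you would have to build and maintain yourself.

```python
def call_with_fallback(models: list[str], prompt: str, call_model) -> tuple[str, str]:
    """Try each model in order; return (model_used, response) from the first success.

    call_model(model, prompt) is any callable that raises on failure -- in real
    code it would wrap per-provider clients and normalize request/response formats.
    """
    last_error = None
    for model in models:
        try:
            return model, call_model(model, prompt)
        except Exception as e:
            last_error = e  # record and fall through to the next model
    raise RuntimeError(f"All models failed; last error: {last_error}")
```

With OpenRouter, the `models` list in `extra_body` replaces all of this, including the per-provider clients the sketch glosses over.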

3. You want to try models from providers you don't have accounts with: OpenRouter aggregates billing — one account gives you access to 100+ models.

4. You're prototyping and want to compare models: Run the same prompt against 5 models instantly.

Use Direct API when:

1. Volume is high and markup matters: For very high volume, even a small percentage markup adds up. At $10K/month in API spend, a 5% markup is $500/month.
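The break-even arithmetic is one line (a flat 5% markup is assumed here; the thing you trade it against is the engineering cost of maintaining direct integrations):

```python
def monthly_markup(monthly_spend: float, markup_rate: float = 0.05) -> float:
    """Extra dollars per month paid to the aggregator at a given markup rate."""
    return monthly_spend * markup_rate

print(monthly_markup(10_000))  # → 500.0
```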

2. You need specific enterprise features: Azure OpenAI's provisioned throughput, Anthropic's enterprise DPA, or Google's specific compliance certifications aren't available via OpenRouter.

3. You have strict data residency requirements: OpenRouter's servers see your requests. For GDPR-sensitive data, using a provider directly (especially Azure or Vertex with EU region) is cleaner.

4. You need the lowest possible latency: OpenRouter adds a small network hop. For real-time applications, direct API is marginally faster.

Building a Model Comparison Tool with OpenRouter

import asyncio
import os
import time

from openai import AsyncOpenAI

async def compare_models(
    prompt: str,
    models: list[str] = [
        "anthropic/claude-sonnet-4-5",
        "openai/gpt-4o",
        "google/gemini-2.5-pro"
    ]
):
    """Run the same prompt across multiple models and compare."""
    client = AsyncOpenAI(
        api_key=os.environ["OPENROUTER_API_KEY"],
        base_url="https://openrouter.ai/api/v1",
        default_headers={
            "HTTP-Referer": "https://llmversus.com",
            "X-Title": "LLM Comparison"
        }
    )
    
    async def call_model(model: str) -> dict:
        start = time.time()
        response = await client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=1024,
            extra_body={"usage": {"include": True}}  # ask OpenRouter to report cost
        )
        latency = time.time() - start
        cost = getattr(response.usage, "cost", None)
        return {
            "model": model,
            "response": response.choices[0].message.content,
            "latency": f"{latency:.2f}s",
            "cost": f"${cost:.4f}" if cost is not None else "N/A"
        }
    
    return await asyncio.gather(*(call_model(m) for m in models))

# Usage
import asyncio
results = asyncio.run(compare_models(
    "Explain the difference between supervised and unsupervised learning."
))

for r in results:
    print(f"\n=== {r['model']} (latency: {r['latency']}, cost: {r['cost']}) ===")
    print(r['response'][:300])

Rate Limits

OpenRouter rate limits depend on your credit balance and usage tier:

Tier             Requests/Minute   Notes
Free             20                Limited models
Pay-as-you-go    200               All models
High-usage       1,000+            Contact for enterprise

For production applications, you'll want to implement retry logic regardless.
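A hedged sketch of that retry logic: exponential backoff with jitter around any callable. The generic `except Exception` is a stand-in; with the OpenAI SDK you would catch `openai.RateLimitError` specifically.

```python
import random
import time

def with_retries(fn, max_attempts: int = 5, base_delay: float = 1.0):
    """Call fn(); on failure, sleep base_delay * 2^attempt plus jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:  # in real code: catch the SDK's rate-limit error
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
```

Jitter matters under rate limiting: without it, many clients that were throttled together retry together and hit the limit again in lockstep.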

The Verdict

OpenRouter is the right choice when:

  • You're building and want to try multiple models without managing multiple accounts
  • You want resilient fallback behavior without building infrastructure
  • Your volume is moderate (under ~$5K/month in API spend)
  • Model flexibility is more important than squeezing every dollar of margin

For large-scale production applications with strict data handling requirements, direct provider APIs (especially via cloud provider offerings) will serve you better. But for most developers building LLM applications, OpenRouter is the fastest path to getting started with production-grade model access.

Explore all available models at openrouter.ai/models — the catalog updates frequently as new models launch.

Methodology

All data in this article sourced from publicly available provider documentation and pricing pages, verified 2026-04-16. Performance benchmarks from LMSYS Chatbot Arena and independent API tests. Costs listed as per-million-token input/output unless noted.
