Cursor Custom Models: Add Any LLM via API Key in 2026
Last updated: April 15, 2026
Cursor Pro ships with a catalog of premium models: GPT-4o, Claude Sonnet, Claude Opus, Gemini 2.5 Pro, o1, and about a dozen others. For most developers the built-in catalog is enough. Custom models matter when you want to use your own API key, run a model Cursor does not ship, meet compliance requirements, or cut cost at scale. This page covers the four custom model paths: OpenAI-compatible, Anthropic, Azure OpenAI, and local via Ollama or LM Studio.
What Cursor ships out of the box
Under Settings > Models you see the default catalog. Free users get Cursor Small and limited premium access. Pro users at $20 per month get unlimited Cursor Small, 500 fast premium requests, and unlimited slow premium requests. Business at $40 per seat adds org controls and privacy enforcement.
For 80% of developers the built-in catalog is the right choice. The premium request pool covers normal use, and you do not have to manage API keys. Custom models start to make sense when you hit one of four cases.
Why go custom
Four reasons to add a custom model:
- Bring your own key. You already pay OpenAI or Anthropic directly and want to route Cursor to the same account.
- Use a model Cursor does not ship. A niche code model, a preview snapshot, a fine-tuned variant.
- Compliance. Your org requires that API calls go through Azure OpenAI, a private Anthropic AWS endpoint, or a self-hosted proxy.
- Cost control at scale. Heavy users can sometimes beat Cursor's $20 flat fee with direct API billing, especially when most prompts are short.
If none of those apply, skip this page. Cursor's built-in pool is cheaper than managing your own billing.
Adding a custom model
Open Settings and go to Models. At the bottom of the list click Add Custom Model. A dialog asks for three fields:
- Model name: the identifier the provider expects (gpt-4o-2024-11-20, claude-sonnet-4-5-20250929, llama3.1:70b)
- Base URL: the API endpoint (https://api.openai.com/v1, http://localhost:11434/v1)
- API key: your key for that endpoint
Cursor uses the OpenAI chat completions format by default, so any provider that speaks that format works. Click Save, then Test. A successful test returns a completion in about 2 seconds. A failure returns an error code.
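If the Test button fails, it helps to hit the endpoint by hand and see the raw error. Below is a minimal sketch of the OpenAI-format chat completions request that any compatible provider accepts; the helper names and the split into a pure builder plus a sender are my own, and the key shown is a placeholder:

```python
import json
import urllib.request

def build_chat_request(base_url, model, api_key, prompt):
    """Assemble an OpenAI-format chat completions request: URL, headers, body."""
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return url, headers, json.dumps(body).encode()

def send(url, headers, body):
    """Fire the request; any provider that speaks the OpenAI format responds."""
    req = urllib.request.Request(url, data=body, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

url, headers, body = build_chat_request(
    "https://api.openai.com/v1", "gpt-4o-2024-11-20", "sk-...", "Say hi")
# send(url, headers, body)  # uncomment with a real key
```

If this round-trips but Cursor's Test still fails, the problem is the model name or base URL you typed into the dialog, not the endpoint.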
OpenAI-compatible endpoints
The OpenAI format is the de facto standard. Providers that accept it:
- OpenAI directly
- Azure OpenAI (with a small URL adjustment)
- Together.ai, Fireworks, Groq, DeepInfra, OpenRouter
- vLLM servers, LM Studio, Ollama
- Most self-hosted inference stacks
For each, the base URL and model name differ but the format is the same. OpenRouter in particular is handy because it aggregates hundreds of models under one endpoint, so one OpenRouter key gets you access to models from every major provider.
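Because only the base URL and model name change, switching providers is a configuration change, not a code change. A sketch with commonly documented base URLs (verify each against the provider's own docs before relying on it):

```python
# Commonly documented OpenAI-compatible base URLs (check each provider's docs).
OPENAI_COMPATIBLE_ENDPOINTS = {
    "openai":     "https://api.openai.com/v1",
    "openrouter": "https://openrouter.ai/api/v1",
    "groq":       "https://api.groq.com/openai/v1",
    "together":   "https://api.together.xyz/v1",
    "ollama":     "http://localhost:11434/v1",   # local
    "lmstudio":   "http://localhost:1234/v1",    # local
}

def chat_completions_url(provider):
    """Every provider hangs the same path off its own base URL."""
    return OPENAI_COMPATIBLE_ENDPOINTS[provider] + "/chat/completions"
```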
Bring your own Anthropic key
Cursor has first-class Anthropic support, so Claude does not need to go through the OpenAI-compatible layer. The cleanest path:
- Create a key at console.anthropic.com
- Settings > Models > Anthropic API Key
- Paste the key and save
Once set, Cursor routes Claude requests to your account. Billing goes to Anthropic directly rather than counting against your Cursor premium quota. This is the right setup when you already have an Anthropic bill or want finer cost tracking.
Known limitation: Cursor Tab completions still run on Cursor's infrastructure because Tab uses a custom fine-tuned model, not Claude. Your Anthropic key covers Composer and Chat only.
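To confirm a key works before pasting it into Cursor, you can call the Anthropic Messages API directly. A minimal sketch of the request shape; note that Anthropic uses an x-api-key header and a required anthropic-version header rather than the OpenAI Bearer scheme, and max_tokens is mandatory. The helper name is mine and the key below is a placeholder:

```python
import json
import urllib.request

def build_anthropic_request(api_key, prompt, model="claude-sonnet-4-5-20250929"):
    """Anthropic Messages API: x-api-key auth, version header, mandatory max_tokens."""
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "max_tokens": 64,
        "messages": [{"role": "user", "content": prompt}],
    }
    return "https://api.anthropic.com/v1/messages", headers, json.dumps(body).encode()

url, headers, body = build_anthropic_request("sk-ant-...", "Say hi")
# urllib.request.urlopen(urllib.request.Request(url, data=body, headers=headers))
```

A 200 response means the key is live; a 401 means the key is wrong; a 400 naming the model means the model identifier is stale.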
Azure OpenAI
Enterprise Cursor customers often need Azure OpenAI for data residency and compliance. Setup:
- Deploy your model in Azure (e.g. gpt-4o in East US)
- Grab the endpoint URL from the Azure portal
- In Cursor: Add Custom Model
- Model name: the deployment name, not the model name
- Base URL: https://<resource>.openai.azure.com/openai/deployments/<deployment>?api-version=2024-10-21
- API key: the Azure OpenAI resource key
The URL format is a common gotcha. Azure requires the deployment name in the path and the API version as a query string. Cursor passes through whatever URL you set, so both pieces must be correct.
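A tiny helper makes the required shape explicit. This is a sketch, not Cursor code; contoso and gpt4o-prod below are made-up placeholder values for your resource and deployment names:

```python
def azure_base_url(resource, deployment, api_version="2024-10-21"):
    """Azure puts the deployment name in the path and the API version
    in the query string; both pieces must be present."""
    return (f"https://{resource}.openai.azure.com"
            f"/openai/deployments/{deployment}?api-version={api_version}")

# Hypothetical resource "contoso" with a deployment named "gpt4o-prod":
url = azure_base_url("contoso", "gpt4o-prod")
```

Paste the result into Cursor's Base URL field, and remember the Model name field takes the deployment name (gpt4o-prod here), not gpt-4o.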
Local models via Ollama
Running models locally gives you data privacy and zero per-request cost. Trade-off: slower and weaker than frontier models.
Install Ollama from ollama.com. Pull a code model:
ollama pull qwen2.5-coder:32b
ollama pull deepseek-coder-v2:16b
ollama pull llama3.1:70b
Ollama exposes an OpenAI-compatible endpoint at http://localhost:11434/v1. In Cursor:
- Model name: qwen2.5-coder:32b
- Base URL: http://localhost:11434/v1
- API key: anything (Ollama does not check; Cursor requires a non-empty string)
Test with a quick Chat prompt. Expect 2-10 seconds per response depending on your hardware. On an M3 Max with 64GB of RAM, a 32B model runs at about 25 tokens per second.
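The same check is scriptable against the endpoint Cursor will hit. A sketch (the builder name is mine, and the dummy key can be any non-empty string, since Ollama ignores it):

```python
import json
import urllib.request

def build_ollama_request(model, prompt, base="http://localhost:11434/v1"):
    """Ollama ignores auth, but the OpenAI format still wants a non-empty key."""
    body = json.dumps({"model": model,
                       "messages": [{"role": "user", "content": prompt}]}).encode()
    return urllib.request.Request(
        base + "/chat/completions", data=body,
        headers={"Authorization": "Bearer ollama",  # any non-empty string
                 "Content-Type": "application/json"})

def run(req):
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

req = build_ollama_request("qwen2.5-coder:32b", "Write a one-line hello world.")
# print(run(req))  # requires a running Ollama server with the model pulled
```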
Local models via LM Studio
LM Studio is an alternative to Ollama with a GUI. It also exposes an OpenAI-compatible endpoint, usually at http://localhost:1234/v1.
Pick a model from the LM Studio catalog, download it, start the local server. Then add it to Cursor with the endpoint and the model name that LM Studio reports.
LM Studio is friendlier for users who do not want to touch a terminal. Ollama is friendlier for users who want to script the setup. Output quality is identical for the same model; it is the same GGUF weights under the hood.
Model selection per feature
Under Settings > Models you can pick a different model for each AI feature:
- Tab completions: Cursor Small is the default; custom OpenAI-compatible models work but the fine-tuned Cursor model is usually better at this task
- Cmd+K inline edits: any model works; Sonnet is the default
- Chat: any model
- Composer (normal mode): any model; Sonnet is the default
- Composer (agent mode): tool-calling capable models only (GPT-4o, Sonnet, Opus, Gemini 2.5 Pro)
For agent mode you need a model that supports function calling. Many open-source models do not handle tool calls well, so agent mode with a local Ollama model often breaks. Stick to frontier models for agent work.
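The "function calling" requirement is concrete: agent mode sends the model tool schemas and expects a structured call back. A sketch of what such a schema looks like in the OpenAI function-calling format; the read_file tool here is a hypothetical example, not Cursor's actual tool set:

```python
# A tool definition in the OpenAI function-calling format. Agent-capable
# models answer with a structured tool call; many local models emit free
# text instead, which is why agent mode often breaks on them.
read_file_tool = {
    "type": "function",
    "function": {
        "name": "read_file",  # hypothetical example tool
        "description": "Read a file from the workspace and return its text.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string",
                         "description": "Workspace-relative file path."},
            },
            "required": ["path"],
        },
    },
}
```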
Cost comparison
When is bringing your own key cheaper than Cursor Pro?
At API prices in April 2026:
- Sonnet 4.5: $3 per million input tokens, $15 per million output
- GPT-4o: $2.50 per million input, $10 per million output
- Gemini 2.5 Pro: $1.25 per million input, $5 per million output
A typical Composer session uses 50-100k input tokens and 5-10k output tokens. That is about $0.30 per session on Sonnet. Cursor Pro at $20 per month covers 500 fast requests, which is around $30-60 of direct API cost. So Cursor Pro is usually cheaper for individuals.
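The arithmetic behind that estimate, as a sketch (prices per million tokens from the list above; a mid-range session of 75k input and 7.5k output tokens):

```python
def session_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Direct API cost of one session in dollars, given per-million-token prices."""
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

# Mid-range Composer session on Sonnet 4.5: 75k in at $3/M, 7.5k out at $15/M.
sonnet = session_cost(75_000, 7_500, 3, 15)      # 0.3375, i.e. about $0.30
gemini = session_cost(75_000, 7_500, 1.25, 5)    # 0.13125, under half the cost
```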
The math flips for heavy users who make 2000+ requests per month. At that volume, direct API billing on Gemini or a batch-discounted Azure deployment can run under $30 per user per month.
When custom models make sense
Summary matrix:
- Individual developer, moderate use: Cursor Pro, no custom models
- Individual developer, heavy use: Cursor Pro, optionally add an Anthropic key for overflow
- Enterprise with compliance needs: Azure OpenAI or self-hosted proxy
- Split privacy needs: local via Ollama for privacy-sensitive personal projects, Cursor's hosted models for work
- Cost-conscious team: Cursor Business with shared pool usually beats per-seat custom keys
Limitations of local models
Local models have real limits as of April 2026:
- Context window: most local models top out at 32k or 128k, well below Sonnet's 200k
- Code quality: Qwen 2.5 Coder 32B is the current best open model and trails Sonnet by 10-15% on real coding benchmarks
- Tool calling: hit-or-miss; agent mode often fails
- Speed: 10-40 tokens per second on consumer hardware, vs 60-100 on the hosted frontier models
For code review, explaining existing code, and small edits, local models are adequate. For multi-file Composer work on a production codebase, the frontier models still win.
Switching models mid-session
Press Cmd+/ on macOS or Ctrl+/ on Windows to open the model picker. Select a different model; the session continues with the new model active. The conversation history carries over, so the new model picks up where the old one left off.
Use this when a task changes character. Start a Composer session with Sonnet for speed, switch to Opus when the remaining work needs better reasoning, switch back to Sonnet to finish the cleanup.
Frequently asked questions
Can I use my own OpenAI API key in Cursor?
Yes. Settings > Models > Add Custom Model. Set the base URL to https://api.openai.com/v1 and paste your key. Cursor routes requests to your account and billing goes to OpenAI directly.
Does Cursor support Ollama?
Yes. Ollama exposes an OpenAI-compatible endpoint at http://localhost:11434/v1. Add a custom model with that URL, the model name you pulled, and any non-empty string as the API key.
Can I run Cursor entirely offline?
Partial. The editor runs offline. Tab, Cmd+K, Composer, and Chat need an AI endpoint. Point them at a local Ollama or LM Studio server to stay fully offline for AI calls too, but tool-calling agent mode often breaks on local models.
Does my Anthropic API key replace Cursor's Claude access?
For Composer and Chat, yes. Cursor Tab completions still run on Cursor's infrastructure because Tab uses a custom fine-tuned model, not Claude, so the key does not affect that feature.
Is using my own API key cheaper than Cursor Pro?
Usually not for individuals. Pro at $20 per month covers more than $30 of direct API cost at typical Sonnet usage. For teams running 2000+ requests per user per month, direct billing on Gemini or Azure can beat the flat fee.
How do I pick different models per feature in Cursor?
Settings > Models shows each feature (Tab, Cmd+K, Chat, Composer) with its own model dropdown. Tab usually stays on Cursor Small; Composer on Sonnet or Opus; Chat on whichever you prefer.