Anthropic API MCP Server
Call different Claude models from inside a Claude Code workflow for model comparison, batch generation, or cheap sub-tasks.
Updated: April 15, 2026
Install
```json
{
  "mcpServers": {
    "anthropic-mcp": {
      "command": "npx",
      "args": ["-y", "mcp-anthropic"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-xxx"
      }
    }
  }
}
```

Capabilities
- Call any Claude model available to the API key from within a Claude Code workflow
- Generate text with different Claude models (Opus, Sonnet, Haiku)
- Compare outputs from multiple models on the same prompt
- Batch process text generation tasks without manual chaining
Limitations
- Calling Claude from Claude adds latency and API cost; only useful for specific sub-tasks
- No streaming support; responses buffer fully before returning
- Model selection is required; the server does not auto-pick a model
- Context window limits still apply per call; 200K tokens is the current ceiling
Anthropic Claude MCP server setup for Claude Code and Cursor
Quick answer: The Anthropic Claude MCP server wraps the Anthropic Claude API as MCP tools for Claude Code and Cursor. Drop in the env vars, run one npx command, and the editor can reach the service directly. Setup takes about 2 minutes, tested on mcp-anthropic@0.3.4 on April 15, 2026.
The Anthropic API serves Claude Opus, Sonnet, and Haiku under a single key. Each model has different cost, latency, and capability tradeoffs useful for different sub-tasks. Without an MCP connection, working with Anthropic Claude means flipping between the editor and the web UI - copying IDs, pasting results, losing context. The MCP server removes that loop. Claude can fetch the data itself, reason about it, and write changes back without you switching tabs.
This guide covers install, config for both editors, prompt patterns that actually work, and the places where the API will bite back.
What this server does
The server speaks MCP over stdio and wraps the standard Anthropic Claude SDK. The tool surface is grouped into these sets:
- Messages: `create_message`, `create_message_stream` (buffered), `list_models`
- Batch: `create_batch`, `get_batch_status`, `list_batches`
- Count: `count_tokens`
Authentication uses ANTHROPIC_API_KEY. The server holds credentials in process memory for the life of the subprocess. Nothing is written to disk by the server itself. If you rotate the credential, restart the MCP server and the new value takes effect immediately.
The server does not implement a local cache. Every tool call is a fresh round trip. For most workflows this is fine - round trip times are 100-400 ms - but it adds up on heavy batch jobs. For those, prefer the native SDK in a script.
Installing the server
The package ships on npm as mcp-anthropic. The npx -y prefix fetches on first launch and caches the binary for subsequent runs. Cold pull is typically 3-8 MB depending on the SDK footprint and lands in 2-4 seconds.
Before touching editor config, get your credentials ready:
- Sign in at console.anthropic.com
- Open the API Keys page under Settings
- Click Create Key, name it `claude-mcp-dev`, and pick a workspace
- Copy the key; it starts with `sk-ant-` and is shown once
Keep the credential values out of any file you commit. The rest of this guide assumes they live in your shell profile or a .envrc managed by direnv.
Configuring for Claude Code
Claude Code reads MCP servers from ~/.claude/mcp.json or a per-project .mcp.json. Add an `anthropic` entry:
```json
{
  "mcpServers": {
    "anthropic": {
      "command": "npx",
      "args": ["-y", "mcp-anthropic"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-xxx"
      }
    }
  }
}
```
Restart Claude Code, then run /mcp in a session to confirm Anthropic Claude is attached. Call a read-only tool as a smoke test before any write operations. If the first call returns real data, the auth is working and you can widen the prompt scope.
For team projects, commit a placeholder version of .mcp.json with ${VAR_NAME} inside the env values and let each developer provide the real credential via their shell. Claude Code expands env vars when it spawns the subprocess.
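A committed placeholder version of the config might look like the following sketch; the env var name is whatever your team standardizes on in shell profiles:

```json
{
  "mcpServers": {
    "anthropic": {
      "command": "npx",
      "args": ["-y", "mcp-anthropic"],
      "env": {
        "ANTHROPIC_API_KEY": "${ANTHROPIC_API_KEY}"
      }
    }
  }
}
```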
Configuring for Cursor
Cursor uses the same MCP spec and reads from ~/.cursor/mcp.json. The config is identical to Claude Code:
```json
{
  "mcpServers": {
    "anthropic": {
      "command": "npx",
      "args": ["-y", "mcp-anthropic"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-xxx"
      }
    }
  }
}
```
Open Cursor settings, navigate to the MCP tab, and toggle the server on. Cursor spawns the subprocess lazily on the first tool call. Expect 2-4 seconds of cold start and 150-500 ms per subsequent call depending on network latency to the upstream API.
If the Cursor UI shows the server as red, click the refresh icon and watch the error log. Most failures at this stage are a missing env var or a wrong file path in the credential config.
Example prompts and workflows
A few prompts that work reliably once the server is attached:
- "Use Claude Opus to review this design doc and flag the top 3 concerns."
- "Call Haiku on each of these 20 support tickets and classify by category."
- "Run the same prompt on Opus and Sonnet and show both answers side by side."
- "Count the tokens in this file before I send it as context."
- "Create a batch job to generate a short abstract for every blog post in this folder."
The model will chain calls. A classification pipeline usually runs create_message with Haiku on each row, then aggregates results in the parent Claude session. Use Haiku for high-volume classification, Opus for hard reasoning, and Sonnet as the default middle ground. Batch API is cheapest when you can wait 24 hours for results.
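That routing rule can be sketched as a small helper in the parent session's tooling. This is illustrative only: `chooseModel` is a hypothetical function, and the model names are placeholders, not pinned API IDs (use `list_models` for real ones):

```typescript
// Illustrative model router: pick the cheapest model that fits the task.
// Model names are placeholders — substitute current IDs from list_models.
type Task = { kind: "classify" | "extract" | "hard-reasoning" | "default" };

function chooseModel(task: Task): string {
  switch (task.kind) {
    case "classify":
    case "extract":
      return "claude-haiku"; // high volume, low cost, low latency
    case "hard-reasoning":
      return "claude-opus"; // reserve for measurably harder problems
    default:
      return "claude-sonnet"; // sensible middle ground
  }
}
```

The point of centralizing the choice is cost control: every inner call defaults down, and only tasks that justify it escalate.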
One pattern that saves calls: narrow the scope up front. Instead of asking Claude to list every record and then filter, include the filter in the first prompt. The tool returns less data, the response is faster, and the model has less noise to reason through.
Troubleshooting
Tool call returns 401. API key is wrong or disabled. Check at console.anthropic.com > API Keys and rotate if needed.
Tool call returns 429. Rate limit on your tier. Build tier starts at 50 req/min for Sonnet; production tier is higher. Stagger calls or request a tier bump.
Overloaded error during peak. Anthropic occasionally returns 529 overloaded during peak demand. Retry after 10-30 seconds with exponential backoff. The error is transient and clears within minutes.
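A retry wrapper for those transient errors might look like this sketch; `withRetry` is a hypothetical helper, and the assumption that the thrown error carries a numeric `status` field is an assumption about the SDK's error shape, not a documented guarantee of this server:

```typescript
// Exponential backoff for transient 529 "overloaded" errors.
// Delay schedule with the defaults: 10s, 20s, 40s, capped at 60s.
function backoffDelayMs(attempt: number, baseMs = 10_000, capMs = 60_000): number {
  return Math.min(baseMs * 2 ** attempt, capMs);
}

async function withRetry<T>(call: () => Promise<T>, maxAttempts = 4): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err: any) {
      // Assumed error shape: retry only transient overloads, while attempts remain.
      if (err?.status !== 529 || attempt + 1 >= maxAttempts) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```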
Batch job stuck in processing. Batch jobs have a 24-hour SLA but often finish in minutes. For stuck jobs, check get_batch_status periodically; there is no way to cancel a running batch, so start small until you trust the pipeline.
Token count differs between your count and the API response. Local tokenizers approximate; the server counts against the actual API tokenizer. Budget 5-10% headroom on context-window calculations to avoid surprise 413 errors.
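The headroom rule is a one-line check; `fitsWithHeadroom` is a hypothetical helper, not a server tool:

```typescript
// Local tokenizers approximate the API tokenizer, so budget headroom
// (5-10%) when sizing content against the context window.
function fitsWithHeadroom(
  localTokenCount: number,
  contextWindow = 200_000,
  headroom = 0.1, // 10% safety margin
): boolean {
  return localTokenCount <= Math.floor(contextWindow * (1 - headroom));
}
```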
Server fails with ENOENT. npx is not on PATH in the env the editor inherits. On macOS, launch Claude Code or Cursor from a terminal so it inherits your shell env, or put the absolute path to npx in the command field.
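For example, with an absolute path (find yours with `which npx`; the path below is illustrative, not universal):

```json
{
  "mcpServers": {
    "anthropic": {
      "command": "/usr/local/bin/npx",
      "args": ["-y", "mcp-anthropic"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-xxx"
      }
    }
  }
}
```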
Subprocess keeps restarting. The MCP transport is strict about newlines on stdio. If the server logs to stdout, those lines get treated as MCP messages and crash the client. Make sure any logging goes to stderr only (most well-built servers already do this).
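If you are writing or patching a Node-based MCP server yourself, the safe pattern is a few lines; `logLine` is an illustrative helper, not part of this package:

```typescript
// stdio-transport MCP servers reserve stdout for protocol frames.
// All diagnostics must go to stderr — never console.log.
function logLine(message: string): string {
  const line = `[mcp-anthropic] ${message}`;
  process.stderr.write(line + "\n"); // stderr only; stdout would corrupt the transport
  return line;
}
```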
Alternatives
A few options if the Anthropic Claude server does not fit your setup:
- `openai-mcp` wraps the OpenAI API; useful for cross-model comparison against GPT-4o.
- `openrouter-mcp` fans out across Claude, OpenAI, and open-source models through one key.
- `mcp-anthropic-aws` covers Claude hosted on AWS for teams already inside AWS IAM.
The MCP server pays off for multi-model workflows: classification with Haiku, heavy reasoning with Opus, and cross-model comparison on a design doc - all from one Claude Code session.
Performance notes and hardening
Steady-state call latency lands in the 150-500 ms range for most tools. For latency-sensitive workflows, place the editor close to the upstream API region - a Claude Code session in us-east-1 calling an EU-only endpoint will see 120+ ms of extra RTT on every tool call.
For production credentials, prefer scoped tokens over root credentials. Most services expose fine-grained permission models; use them. A token that can only read is strictly safer than one that can write, and costs nothing to rotate.
Log review is easier if you redirect MCP subprocess stderr to a file. Most editors do this by default, but not all surface the log path. On macOS, check ~/Library/Logs/Claude/ or the Cursor equivalent.
The Anthropic Claude MCP server is the right default for any workflow that already touches Anthropic Claude regularly. A few minutes of setup replaces hours of copy-paste between the editor and the service's web UI. Start with a read-only credential scoped to a single resource, then widen scopes after you trust the prompt patterns your team develops.
Frequently asked questions
Why call Claude from Claude?
A few good reasons: use Haiku for cheap high-volume classification while keeping Opus as the outer agent, compare outputs across models without leaving the editor, and batch-process long jobs overnight through the Batch API for 50% off list price.
Is there double billing?
Yes, in the sense that the outer Claude Code session bills normally and the inner API calls bill separately. For cost control, use Haiku for inner calls whenever possible and keep token budgets tight.
Does it support prompt caching?
Yes, set `cache_control` in the messages array. Cached prompts cost 10% of the normal input rate. The MCP tool accepts the same parameters as the raw API, so cache-friendly workflows map cleanly.
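A sketch of a cache-friendly request body, assuming the tool forwards the `system` and `cache_control` fields to the API unchanged:

```json
{
  "system": [
    {
      "type": "text",
      "text": "Long shared context reused across many calls...",
      "cache_control": { "type": "ephemeral" }
    }
  ],
  "messages": [
    { "role": "user", "content": "Question about the shared context" }
  ]
}
```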
What about tool use and agents?
The tool surface passes the `tools` parameter through. For full agent loops, keep the orchestration in the parent Claude session and use the inner call for single-shot reasoning. Nested agent loops get confusing fast.
Can I use the Batch API?
Yes. The `create_batch` tool submits a batch job; `get_batch_status` polls for completion. Batch results are JSONL; download and parse in the outer Claude session. The 50% discount on batches makes large corpus processing cost-effective.
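Parsing a downloaded results payload is a few lines in a helper script; `parseJsonl` is a hypothetical helper, and the per-row result schema is not shown here:

```typescript
// Batch results arrive as JSONL: one JSON object per line.
// Split on newlines, skip blanks, parse each row.
function parseJsonl(payload: string): unknown[] {
  return payload
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line));
}
```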
How do I pick the right model?
Start with Sonnet for most tasks. Switch to Opus only when you measurably need harder reasoning. Use Haiku for high-volume classification, short extraction tasks, and anywhere latency matters more than depth. Always measure on your own data before optimizing.