OpenAI API MCP Server
Call OpenAI chat, embeddings, DALL-E, and Whisper from inside a Claude Code workflow when a task needs a different model.
Updated: April 15, 2026
Install
{
"mcpServers": {
"openai-mcp": {
"command": "npx",
"args": [
"-y",
"mcp-openai"
],
"env": {
"OPENAI_API_KEY": "sk-proj-xxx"
}
}
}
}
Capabilities
- Call OpenAI chat completions with any available model (gpt-4o, gpt-4-turbo, gpt-3.5-turbo)
- Generate embeddings with text-embedding-3-small or text-embedding-3-large
- Create images with DALL-E 3 given a text prompt and size
- Transcribe audio files using the Whisper endpoint
- List all models available to the API key
- Check current token usage and billing state
Limitations
- Adds an extra API hop, so latency is higher than calling OpenAI directly from your app
- No streaming support; responses come back as a single chunk
- Fine-tuned model lifecycle management (create, cancel, list jobs) is not exposed
- Costs stack on top of your existing Claude usage since both APIs bill separately
OpenAI MCP server setup for Claude Code and Cursor
Quick answer: The OpenAI MCP server wraps the OpenAI API as MCP tools for Claude Code and Cursor. Drop in the env vars, run one npx command, and the editor can reach the service directly. Setup takes about 2 minutes, tested on mcp-openai@0.6.0 on April 15, 2026.
OpenAI offers chat models (GPT-4o and GPT-3.5-turbo), embeddings, image generation (DALL-E 3), and speech models (Whisper, TTS) behind a single API key. Without an MCP connection, working with OpenAI means flipping between the editor and the web UI - copying IDs, pasting results, losing context. The MCP server removes that loop. Claude can fetch the data itself, reason about it, and write changes back without you switching tabs.
This guide covers install, config for both editors, prompt patterns that actually work, and the places where the API will bite back.
What this server does
The server speaks MCP over stdio and wraps the standard OpenAI SDK. The tool surface is grouped into these sets:
- Chat: chat_completion, list_models
- Embeddings: create_embedding, create_batch_embeddings
- Images: create_image, create_image_variation
- Audio: transcribe_audio, translate_audio, generate_speech
- Usage: get_usage
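Under the hood, each tool is invoked with a standard MCP tools/call request over stdio. A sketch of what the client sends for a chat call follows; the argument names here are assumptions based on typical wrappers, not confirmed fields of mcp-openai:

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "chat_completion",
    "arguments": {
      "model": "gpt-4o",
      "messages": [{ "role": "user", "content": "Summarize this diff." }]
    }
  }
}
```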
Authentication uses OPENAI_API_KEY. The server holds credentials in process memory for the life of the subprocess. Nothing is written to disk by the server itself. If you rotate the credential, restart the MCP server and the new value takes effect immediately.
The server does not implement a local cache. Every tool call is a fresh round trip. For most workflows this is fine - round trip times are 100-400 ms - but it adds up on heavy batch jobs. For those, prefer the native SDK in a script.
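When a batch job does go through the server, grouping inputs client-side cuts the number of round trips. A minimal sketch, independent of the server itself:

```python
def chunk(items, size):
    """Split a list into consecutive chunks of at most `size` items.

    Sending one create_batch_embeddings call per chunk instead of one
    call per item turns N round trips into ceil(N / size).
    """
    if size < 1:
        raise ValueError("size must be >= 1")
    return [items[i:i + size] for i in range(0, len(items), size)]

# 250 documents at 100 per request -> 3 round trips instead of 250
batches = chunk([f"doc-{i}" for i in range(250)], 100)
```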
Installing the server
The package ships on npm as mcp-openai. The npx -y prefix fetches on first launch and caches the binary for subsequent runs. Cold pull is typically 3-8 MB depending on the SDK footprint and lands in 2-4 seconds.
Before touching editor config, get your credentials ready:
- Sign in at platform.openai.com
- Navigate to API keys and click Create new secret key
- Scope the key to a project and name it claude-mcp-dev
- Copy the key; it starts with sk-proj- and is shown only once
Keep the credential values out of any file you commit. The rest of this guide assumes they live in your shell profile or a .envrc managed by direnv.
Configuring for Claude Code
Claude Code reads MCP servers from ~/.claude/mcp.json or a per-project .mcp.json. Add an openai entry:
{
"mcpServers": {
"openai": {
"command": "npx",
"args": ["-y", "mcp-openai"],
"env": {
"OPENAI_API_KEY": "sk-proj-xxx"
}
}
}
}
Restart Claude Code, then run /mcp in a session to confirm OpenAI is attached. Call a read-only tool as a smoke test before any write operations. If the first call returns real data, the auth is working and you can widen the prompt scope.
For team projects, commit a placeholder version of .mcp.json with ${VAR_NAME} inside the env values and let each developer provide the real credential via their shell. Claude Code expands env vars when it spawns the subprocess.
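A placeholder .mcp.json that is safe to commit might look like this; Claude Code substitutes ${OPENAI_API_KEY} from the developer's shell when it spawns the subprocess:

```json
{
  "mcpServers": {
    "openai": {
      "command": "npx",
      "args": ["-y", "mcp-openai"],
      "env": {
        "OPENAI_API_KEY": "${OPENAI_API_KEY}"
      }
    }
  }
}
```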
Configuring for Cursor
Cursor uses the same MCP spec and reads from ~/.cursor/mcp.json. The config is identical to Claude Code:
{
"mcpServers": {
"openai": {
"command": "npx",
"args": ["-y", "mcp-openai"],
"env": {
"OPENAI_API_KEY": "sk-proj-xxx"
}
}
}
}
Open Cursor settings, navigate to the MCP tab, and toggle the server on. Cursor spawns the subprocess lazily on the first tool call. Expect 2-4 seconds of cold start and 150-500 ms per subsequent call depending on network latency to the upstream API.
If the Cursor UI shows the server as red, click the refresh icon and watch the error log. Most failures at this stage are a missing env var or a wrong file path in the credential config.
Example prompts and workflows
A few prompts that work reliably once the server is attached:
- "Use gpt-4o to rewrite the README I just pasted into three bullet points."
- "Generate an embedding for the phrase machine learning tutorial using text-embedding-3-small."
- "Create a DALL-E image at 1024x1024 of a teal robot holding a clipboard in a cozy cafe."
- "Transcribe the audio file at /tmp/call.mp3 and return the plain text."
- "List every GPT-4 model available to my key."
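Embedding vectors come back as plain float arrays, so downstream comparison can stay local. A small cosine-similarity helper needs nothing beyond standard math:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Parallel vectors score 1.0; orthogonal vectors score 0.0
score = cosine_similarity([1.0, 0.0], [2.0, 0.0])  # -> 1.0
```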
The model will chain calls. A comparison flow might run chat_completion with gpt-4o on the same prompt Claude just answered, then compare both outputs. Tell Claude which model to use up front - generic prompts land on gpt-3.5-turbo (cheaper, weaker) by default.
One pattern that saves calls: narrow the scope up front. Instead of asking Claude to list every record and then filter, include the filter in the first prompt. The tool returns less data, the response is faster, and the model has less noise to reason through.
Troubleshooting
Tool call returns 401. Key is wrong or disabled. Regenerate at platform.openai.com/api-keys.
Tool call returns 429. Rate limit hit. Free tier is 3 req/min on chat. Paid tier depends on usage; add a short delay or upgrade the tier in the OpenAI billing settings.
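A simple exponential-backoff wrapper absorbs intermittent 429s without hand-tuned delays. This sketch is generic retry logic, not part of the MCP server:

```python
import time

def with_backoff(call, retries=4, base_delay=1.0, sleep=time.sleep):
    """Retry `call` on exception, doubling the delay each attempt.

    `sleep` is injectable so tests can observe delays without waiting.
    """
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            if attempt == retries - 1:
                raise
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Wrap the actual API call in a zero-argument function and pass it in.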
Image generation returns a safety rejection. DALL-E blocked the prompt. Rephrase to avoid people, brands, or violent themes. Real-person names and copyrighted characters are auto-blocked.
Audio transcription times out. File is too large (limit 25 MB per request). Split the audio with ffmpeg into 10-minute chunks and transcribe each separately, then concatenate.
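The split step can be scripted. This sketch only builds the ffmpeg command list (segment muxer with stream copy, so no re-encoding); ffmpeg must be installed separately and the command is not executed here:

```python
def ffmpeg_split_cmd(src, out_pattern="chunk_%03d.mp3", seconds=600):
    """Build an ffmpeg command that splits `src` into fixed-length chunks."""
    return [
        "ffmpeg", "-i", src,
        "-f", "segment",
        "-segment_time", str(seconds),
        "-c", "copy",
        out_pattern,
    ]

# Pass the list to subprocess.run(...) to execute it
cmd = ffmpeg_split_cmd("/tmp/call.mp3")
```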
Costs higher than expected. Embeddings against large corpora and DALL-E calls stack fast. Check get_usage daily during development and set a hard spending limit in the OpenAI billing dashboard.
Server fails with ENOENT. npx is not on PATH in the env the editor inherits. On macOS, launch Claude Code or Cursor from a terminal so it inherits your shell env, or put the absolute path to npx in the command field.
Subprocess keeps restarting. The MCP transport is strict about newlines on stdio. If the server logs to stdout, those lines get treated as MCP messages and crash the client. Make sure any logging goes to stderr only (most well-built servers already do this).
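The stderr-only rule is easy to honor in any language; in Python it is one keyword argument:

```python
import sys

def log(msg, stream=sys.stderr):
    """Write a diagnostic line to stderr, keeping stdout clean for MCP frames.

    Anything printed to stdout would be parsed as an MCP message and can
    crash the client; stderr is reserved for human-readable logs.
    """
    print(f"[mcp-server] {msg}", file=stream)
```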
Alternatives
A few options if the OpenAI server does not fit your setup:
- anthropic-mcp wraps the Claude API for cross-model work when you want to compare Claude outputs.
- openrouter-mcp lets you call OpenAI, Anthropic, and open-source models through one key.
- replicate-mcp runs open-source models for teams that need local-friendly or fine-tuned variants.
The MCP server pays off when a Claude Code workflow needs a second model for cross-verification, DALL-E images, or Whisper transcription inside the same agent session.
Performance notes and hardening
Steady-state call latency lands in the 150-500 ms range for most tools. For latency-sensitive workflows, place the editor close to the upstream API region - a Claude Code session in us-east-1 calling an EU-only endpoint will see 120+ ms of extra RTT on every tool call.
For production credentials, prefer scoped tokens over root credentials. Most services expose fine-grained permission models; use them. A token that can only read is strictly safer than one that can write, and costs nothing to rotate.
Log review is easier if you redirect MCP subprocess stderr to a file. Most editors do this by default, but not all surface the log path. On macOS, check ~/Library/Logs/Claude/ or the Cursor equivalent.
The OpenAI MCP server is the right default for any workflow that already touches OpenAI regularly. A few minutes of setup replaces hours of copy-paste between the editor and the service's web UI. Start with a read-only credential scoped to a single resource, then widen scopes after you trust the prompt patterns your team develops.
Frequently asked questions
Why call OpenAI from inside Claude Code?
A few real cases: cross-verify a Claude answer with a second model, use DALL-E for image generation (Claude does not make images), transcribe audio with Whisper (no Claude equivalent), and run embeddings if your existing RAG pipeline is built on OpenAI embeddings.
Does it support streaming responses?
Not yet. The server buffers the full response before returning. For long generations, that adds latency. Streaming support is on the roadmap but requires changes in how the MCP transport handles partial messages.
Can I use a project-scoped key?
Yes, and you should. Create keys at the project level under platform.openai.com/settings and scope to a specific project. That way usage reporting and cost limits apply cleanly per use case.
How do costs compare to Claude?
Roughly comparable for text. gpt-4o is $2.50 per 1M input tokens; Claude 3.7 Sonnet is $3 per 1M input. DALL-E and Whisper have no Claude equivalent so costs are additive. Set a monthly spending cap in the OpenAI billing dashboard to avoid surprises.
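The back-of-envelope math from those prices fits in a one-liner; the rates are hardcoded as of this writing, so check the pricing pages before relying on them:

```python
def input_cost_usd(tokens, price_per_million):
    """Estimate input-token cost for a given per-1M-token rate."""
    return tokens * price_per_million / 1_000_000

# 200k input tokens through gpt-4o at $2.50/1M
gpt4o = input_cost_usd(200_000, 2.50)   # -> 0.5 USD
sonnet = input_cost_usd(200_000, 3.00)  # at $3/1M input
```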
What about function calling or tools?
Chat completions support the OpenAI `tools` parameter but the MCP tool surface does not map tool-call loops back to Claude. For agent-style workflows, keep that logic in Claude and use this server for one-shot completions.
Can I use Azure OpenAI through this server?
Not directly. Azure has a different endpoint and auth model. Use `azure-openai-mcp` or set `OPENAI_API_BASE` and `OPENAI_API_VERSION` env vars if the server supports overrides in your version.