community · stdio

Pinecone MCP Server

Run semantic vector search, upsert vectors, and inspect index stats from Claude Code or Cursor with a Pinecone API key.

Updated: April 15, 2026

Install

npx mcp-pinecone
~/.claude/mcp.json
{
  "mcpServers": {
    "pinecone-mcp": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-pinecone"
      ],
      "env": {
        "PINECONE_API_KEY": "your_api_key",
        "PINECONE_INDEX": "your-index-name"
      }
    }
  }
}

Capabilities

  • + Semantic vector search with configurable top-K and score filtering
  • + Upsert vectors with metadata fields for later filtering
  • + Delete vectors by ID or by metadata query
  • + Describe index statistics including vector count and namespace list
  • + Query with metadata filters to narrow results by structured attributes

Limitations

  • - Requires a pre-built index; the server does not create new indexes
  • - No built-in embedding generation; you pass vectors already computed
  • - Serverless indexes have cold start latency of 500-1500 ms after idle
  • - Pod-based indexes cost more but offer consistent low-latency reads

Pinecone MCP server setup for Claude Code and Cursor

Quick answer: The Pinecone MCP server wraps the Pinecone API as MCP tools for Claude Code and Cursor. Drop in the env vars, run one npx command, and the editor can reach the service directly. Setup takes about 3 minutes, tested on mcp-pinecone@0.4.2 on April 15, 2026.

Pinecone is the vector database for production semantic search and retrieval-augmented generation. A single API key and index name cover the day-to-day surface. Without an MCP connection, working with Pinecone means flipping between the editor and the web UI - copying IDs, pasting results, losing context. The MCP server removes that loop. Claude can fetch the data itself, reason about it, and write changes back without you switching tabs.

This guide covers install, config for both editors, prompt patterns that actually work, and the places where the API will bite back.

What this server does

The server speaks MCP over stdio and wraps the standard Pinecone SDK. The tool surface is grouped into these sets:

  • Query: search, search_by_id, fetch_by_ids
  • Mutation: upsert, update, delete, delete_all_in_namespace
  • Index: describe_index, describe_index_stats, list_namespaces

Authentication uses PINECONE_API_KEY and PINECONE_INDEX. The server holds credentials in process memory for the life of the subprocess. Nothing is written to disk by the server itself. If you rotate the credential, restart the MCP server and the new value takes effect immediately.

The server does not implement a local cache. Every tool call is a fresh round trip. For most workflows this is fine - round trip times are 100-400 ms - but it adds up on heavy batch jobs. For those, prefer the native SDK in a script.
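For a heavy batch job, a script against the native SDK is the better tool. A minimal sketch of the batching half, assuming the Pinecone Python SDK (`pinecone` package) and the tuple shape `(id, values, metadata)`; the actual upsert call is shown commented out since it needs live credentials:

```python
from itertools import islice

def chunked(iterable, size=100):
    """Yield successive batches; Pinecone docs suggest upserting ~100 vectors per call."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

# Toy payload: 250 vectors at dim=1536, all zeros, with one metadata field each.
vectors = [(f"doc-{i}", [0.0] * 1536, {"topic": "intro"}) for i in range(250)]
batches = list(chunked(vectors, 100))
print([len(b) for b in batches])  # → [100, 100, 50]

# With credentials set, each batch would go through the SDK directly:
# from pinecone import Pinecone
# index = Pinecone(api_key="your_api_key").Index("your-index-name")
# for batch in batches:
#     index.upsert(vectors=batch)
```

Batching keeps each request small enough to avoid payload-size errors and makes retries cheap: a failed batch re-sends 100 vectors, not the whole corpus.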

Installing the server

The package ships on npm as mcp-pinecone. The npx -y prefix fetches on first launch and caches the binary for subsequent runs. Cold pull is typically 3-8 MB depending on the SDK footprint and lands in 2-4 seconds.

Before touching editor config, get your credentials ready:

  1. Sign in at app.pinecone.io
  2. Create an index (serverless is cheapest for small workloads, start there)
  3. Navigate to API Keys and create a new key scoped to the project
  4. Copy the key; combined with the index name, that is all the server needs

Keep the credential values out of any file you commit. The rest of this guide assumes they live in your shell profile or a .envrc managed by direnv.
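If you go the direnv route, a project-local `.envrc` might look like this (values are placeholders; keep the file itself out of version control):

```shell
# .envrc — loaded by direnv on cd into the project; add .envrc to .gitignore
export PINECONE_API_KEY="your_api_key"
export PINECONE_INDEX="your-index-name"
```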

Configuring for Claude Code

Claude Code reads MCP servers from ~/.claude/mcp.json or a per-project .mcp.json. Add a pinecone entry:

{
  "mcpServers": {
    "pinecone": {
      "command": "npx",
      "args": ["-y", "mcp-pinecone"],
      "env": {
        "PINECONE_API_KEY": "your_api_key",
        "PINECONE_INDEX": "your-index-name"
      }
    }
  }
}

Restart Claude Code, then run /mcp in a session to confirm Pinecone is attached. Call a read-only tool as a smoke test before any write operations. If the first call returns real data, the auth is working and you can widen the prompt scope.

For team projects, commit a placeholder version of .mcp.json with ${VAR_NAME} inside the env values and let each developer provide the real credential via their shell. Claude Code expands env vars when it spawns the subprocess.
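A committed placeholder version of `.mcp.json` could look like the following, reusing the `pinecone` entry name from the config above; each developer exports the real values in their own shell:

```json
{
  "mcpServers": {
    "pinecone": {
      "command": "npx",
      "args": ["-y", "mcp-pinecone"],
      "env": {
        "PINECONE_API_KEY": "${PINECONE_API_KEY}",
        "PINECONE_INDEX": "${PINECONE_INDEX}"
      }
    }
  }
}
```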

Configuring for Cursor

Cursor uses the same MCP spec and reads from ~/.cursor/mcp.json. The config is identical to Claude Code:

{
  "mcpServers": {
    "pinecone": {
      "command": "npx",
      "args": ["-y", "mcp-pinecone"],
      "env": {
        "PINECONE_API_KEY": "your_api_key",
        "PINECONE_INDEX": "your-index-name"
      }
    }
  }
}

Open Cursor settings, navigate to the MCP tab, and toggle the server on. Cursor spawns the subprocess lazily on the first tool call. Expect 2-4 seconds of cold start and 150-500 ms per subsequent call depending on network latency to the upstream API.

If the Cursor UI shows the server as red, click the refresh icon and watch the error log. Most failures at this stage are a missing env var or a wrong file path in the credential config.

Example prompts and workflows

A few prompts that work reliably once the server is attached:

  • "Search my index for the top 5 vectors closest to the embedding I pass below and show metadata."
  • "Upsert a vector with ID doc-42, the embedding I pass, and metadata {title: 'example', topic: 'intro'}."
  • "Describe index my-index and tell me the dimension, metric, and current vector count."
  • "Delete every vector in namespace stale-data."
  • "Fetch vectors with IDs doc-1, doc-2, doc-3 and return their metadata."

The model will chain calls. A RAG pipeline usually runs search with the query embedding, then fetch_by_ids to pull the full metadata, then passes the combined context back into Claude. Generate embeddings with OpenAI, Cohere, or Voyage before the call - the server does not embed on your behalf.
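The glue step in that chain — joining search hits with fetched metadata into prompt-ready context — can be sketched offline. The result shapes below are made up for illustration; real tool responses will differ:

```python
# Hypothetical shapes: search hits (id + score) and fetched records keyed by id.
search_hits = [
    {"id": "doc-2", "score": 0.91},
    {"id": "doc-7", "score": 0.84},
]
fetched = {
    "doc-2": {"title": "Intro to RAG", "text": "Retrieval-augmented generation..."},
    "doc-7": {"title": "Chunking", "text": "Split documents before embedding..."},
}

def build_context(hits, records, min_score=0.5):
    """Join search hits with fetched metadata into one context line per hit."""
    lines = []
    for hit in hits:
        if hit["score"] < min_score:
            continue  # drop weak matches before they reach the prompt
        meta = records.get(hit["id"], {})
        lines.append(
            f"[{hit['id']} score={hit['score']:.2f}] "
            f"{meta.get('title', '?')}: {meta.get('text', '')}"
        )
    return "\n".join(lines)

print(build_context(search_hits, fetched))
```

The score threshold does double duty: it trims noise from the context and shortens the payload the model has to reason over.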

One pattern that saves calls: narrow the scope up front. Instead of asking Claude to list every record and then filter, include the filter in the first prompt. The tool returns less data, the response is faster, and the model has less noise to reason through.

Troubleshooting

Tool call returns 401. API key is wrong or the project was deleted. Regenerate at app.pinecone.io > API Keys and restart the server.

Search returns no results. Index is empty or the namespace is wrong. Run describe_index_stats and confirm the vector count and namespace map match what you expect.

Upsert fails with dimension mismatch. The vector dimension does not match what the index expects. If the index is dim=1536 (OpenAI ada-002), pass exactly 1536 floats per vector. Rebuild the index if you switched embedding models.
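A cheap guard catches the mismatch before the API does, with the offending ID in the error. A sketch assuming dim=1536 as in the ada-002 example and the `(id, values, metadata)` tuple shape:

```python
EXPECTED_DIM = 1536  # must match the dimension reported by describe_index

def validate_vectors(vectors, dim=EXPECTED_DIM):
    """Fail fast with the bad vector's ID instead of a generic API error."""
    for vec_id, values, _meta in vectors:
        if len(values) != dim:
            raise ValueError(f"{vec_id}: got {len(values)} floats, index expects {dim}")
    return True

ok = [("doc-1", [0.1] * 1536, {})]
bad = [("doc-2", [0.1] * 768, {})]  # e.g. a vector from a smaller embedding model
print(validate_vectors(ok))  # → True
try:
    validate_vectors(bad)
except ValueError as e:
    print(e)  # → doc-2: got 768 floats, index expects 1536
```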

Serverless index slow on first query. Cold start latency. After 10+ minutes idle, the first query adds 500-1500 ms. For latency-sensitive workloads, pick a pod-based index or send a warmup query every 5 minutes.
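If you script the warmup, a cron entry is enough; `warmup.sh` here is a hypothetical script of your own that sends one trivial query:

```shell
# crontab -e: ping the index every 5 minutes during working hours, weekdays only
*/5 9-18 * * 1-5 /usr/local/bin/warmup.sh >> /tmp/pinecone-warmup.log 2>&1
```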

Metadata filter ignored. Metadata fields need to be indexed explicitly when the index is created. Check the metadata config in describe_index. Unindexed fields work for storage but cannot filter search results.

Server fails with ENOENT. npx is not on PATH in the env the editor inherits. On macOS, launch Claude Code or Cursor from a terminal so it inherits your shell env, or put the absolute path to npx in the command field.

Subprocess keeps restarting. The MCP transport is strict about newlines on stdio. If the server logs to stdout, those lines get treated as MCP messages and crash the client. Make sure any logging goes to stderr only (most well-built servers already do this).

Alternatives

A few options if the Pinecone server does not fit your setup:

  • weaviate-mcp targets Weaviate which bundles embedding generation with vector search.
  • qdrant-mcp covers Qdrant, the open-source alternative you can self-host.
  • chroma-mcp works with Chroma for local-first or notebook-based workflows.

The MCP server pays off for RAG work and semantic search debugging where Claude can run a query, inspect what came back, and refine the prompt - all without switching to a notebook.

Performance notes and hardening

Steady-state call latency lands in the 150-500 ms range for most tools. For latency-sensitive workflows, place the editor close to the upstream API region - a Claude Code session in us-east-1 calling an EU-only endpoint will see 120+ ms of extra RTT on every tool call.

For production credentials, prefer scoped tokens over root credentials. Most services expose fine-grained permission models; use them. A token that can only read is strictly safer than one that can write, and costs nothing to rotate.

Log review is easier if you redirect MCP subprocess stderr to a file. Most editors do this by default, but not all surface the log path. On macOS, check ~/Library/Logs/Claude/ or the Cursor equivalent.

The Pinecone MCP server is the right default for any workflow that already touches Pinecone regularly. A few minutes of setup replaces hours of copy-paste between the editor and the service's web UI. Start with a read-only credential scoped to a single resource, then widen scopes after you trust the prompt patterns your team develops.

Frequently asked questions

Does the server generate embeddings?

No. You pass vectors already computed. Generate them with OpenAI (`text-embedding-3-small`), Cohere, or Voyage AI first. Pinecone has an optional Inference API that embeds server-side, but that is not wired into this MCP tool set yet.

Can it switch between indexes at runtime?

Not without a restart. `PINECONE_INDEX` is read at startup. For multi-index workflows, run multiple MCP entries pointing at different indexes, each with a distinct name.

How much does a typical usage cost?

Serverless indexes charge per million vectors plus per million query reads. A small RAG index with 100k vectors and 1k queries a day costs about $3-5 per month. Pod-based starts at $70 per month for the smallest pod tier.

What is the max vector dimension?

20,480 dimensions on serverless indexes. Most embeddings are under 4,096. If you are using higher-dimensional experimental embeddings, confirm the max at index creation - you cannot change it later.

Does it support metadata filters in search?

Yes. Pass a `filter` object with operators like `$eq`, `$in`, and `$gte`. Metadata fields must be declared indexable at index creation time for the filter to work.
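For example, a filter restricting results to two topics with a recent year (field names here are made up; use whatever your index declared as indexable):

```json
{
  "topic": { "$in": ["intro", "rag"] },
  "year": { "$gte": 2024 }
}
```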

Can I use it with LangChain or LlamaIndex?

Those frameworks have their own Pinecone adapters that already run in-process. Use this MCP server when Claude itself (not your app) needs to talk to Pinecone - for interactive debugging, prompt iteration, or ad-hoc retrieval during a coding session.