Best Embedding Models 2026: Voyage vs Cohere vs OpenAI Benchmarked
Choosing an embedding model is one of the most consequential decisions in a RAG pipeline. Get it wrong and your retrieval will underperform regardless of how good your LLM is. Get it right and you get better recall, faster queries, and lower costs.
This post breaks down the top embedding models in 2026 across the dimensions that actually matter: MTEB benchmark scores, cost per million tokens, context window, multilingual support, and real-world RAG performance.
The MTEB Leaderboard in 2026
MTEB (Massive Text Embedding Benchmark) is the standard benchmark for embedding models. It covers 56 tasks across 8 task types: classification, clustering, pair classification, reranking, retrieval, semantic textual similarity (STS), summarization, and bitext mining.
Top models as of April 2026 (overall MTEB average):
| Model | MTEB Score | Retrieval Score | Dims | Max Tokens | Provider |
|---|---|---|---|---|---|
| voyage-3-large | 70.7 | 67.2 | 1024-2048 | 32K | Voyage AI |
| voyage-code-3 | 68.2 | 71.4 (code) | 1024 | 32K | Voyage AI |
| voyage-3 | 67.5 | 64.1 | 1024 | 32K | Voyage AI |
| e5-mistral-7b-instruct | 66.6 | 64.8 | 4096 | 32K | Microsoft (OSS) |
| text-embedding-3-large | 64.6 | 62.3 | 256-3072 | 8K | OpenAI |
| cohere-embed-v3-english | 64.5 | 62.1 | 1024 | 512 | Cohere |
| cohere-embed-v3-multilingual | 62.8 | 59.8 | 1024 | 512 | Cohere |
| text-embedding-3-small | 62.3 | 59.1 | 512-1536 | 8K | OpenAI |
| mistral-embed | 55.3 | 54.9 | 1024 | 8K | Mistral |
Note: MTEB scores evolve. Always check the live leaderboard at huggingface.co/spaces/mteb/leaderboard before making a final decision.
Cost Per Million Tokens (2026 Pricing)
| Model | Input Cost (per 1M tokens) | Notes |
|---|---|---|
| voyage-3-large | $0.12 | Best MTEB, premium price |
| voyage-3 | $0.06 | Great balance |
| voyage-code-3 | $0.12 | Best for code retrieval |
| text-embedding-3-large | $0.13 | Most widely used |
| text-embedding-3-small | $0.02 | Budget option, solid quality |
| cohere-embed-v3-english | $0.10 | Includes classification |
| cohere-embed-v3-multilingual | $0.10 | Best multilingual coverage |
| mistral-embed | $0.04 | Budget, lower quality |
Cost Calculation at Scale
For a typical RAG setup with 10M documents (average 500 tokens each):
Embedding cost = 10,000,000 docs * 500 tokens / 1,000,000 * price_per_1M
voyage-3-large: 5,000 * $0.12 = $600 one-time
text-embedding-3-large: 5,000 * $0.13 = $650 one-time
text-embedding-3-small: 5,000 * $0.02 = $100 one-time
cohere-embed-v3: 5,000 * $0.10 = $500 one-time
One-time embedding cost is usually not the deciding factor. Re-embedding costs matter more — if you update your corpus frequently, the delta embedding cost adds up.
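The arithmetic above is easy to wrap in a small budgeting helper. A minimal sketch (the function name is illustrative; the prices are the ones from the table above):

```python
def embedding_cost_usd(num_docs: int, avg_tokens: int, price_per_1m: float) -> float:
    """One-time cost to embed a corpus: (total tokens / 1M) * price per 1M tokens."""
    return num_docs * avg_tokens / 1_000_000 * price_per_1m

# 10M docs * 500 tokens = 5,000M tokens
print(embedding_cost_usd(10_000_000, 500, 0.12))  # voyage-3-large: 600.0
print(embedding_cost_usd(10_000_000, 500, 0.02))  # text-embedding-3-small: 100.0
```

The same helper works for estimating re-embedding deltas: plug in the number of changed documents instead of the full corpus.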
Head-to-Head: Voyage-3-Large vs Text-Embedding-3-Large vs Cohere Embed v3
Voyage-3-Large
Strengths:
- Highest MTEB overall score in 2026 for non-open-source models
- 32K token context window — handles long documents without chunking
- Flexible output dimensions (1024, 1536, or 2048) for size/quality tradeoffs
- Excellent on legal, medical, and technical document retrieval
- Instruction-following variant available for asymmetric retrieval
Weaknesses:
- Voyage AI is a smaller company — less proven at enterprise scale
- API rate limits lower than OpenAI at default tier
- No native multilingual support (voyage-multilingual-2 is separate)
Best for: RAG applications where retrieval quality is the top priority and you're willing to pay a slight premium.
```python
import voyageai

client = voyageai.Client(api_key="your-api-key")

# Embed documents
doc_embeddings = client.embed(
    texts=["document text here"],
    model="voyage-3-large",
    input_type="document",
    output_dimension=1024,  # 1024, 1536, or 2048
).embeddings

# Embed query (use input_type="query" for asymmetric retrieval)
query_embedding = client.embed(
    texts=["what is the refund policy?"],
    model="voyage-3-large",
    input_type="query",
).embeddings[0]
```
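Once documents and a query are embedded, retrieval is a nearest-neighbor search over the vectors. A minimal brute-force sketch using cosine similarity (in production you would hand this to a vector database or FAISS; `cosine` and `top_k` are illustrative helpers, not part of any SDK):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_emb, doc_embs, k=3):
    """Return (doc_index, score) pairs for the k most similar documents."""
    scored = [(i, cosine(query_emb, e)) for i, e in enumerate(doc_embs)]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]
```

Brute force is fine up to a few hundred thousand vectors; beyond that, an ANN index is the usual move.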
Text-Embedding-3-Large (OpenAI)
Strengths:
- Industry-standard — the most used embedding model by deployment volume
- Flexible dimensions (256, 512, 1024, 1536, 3072) via Matryoshka Representation Learning
- Tight integration with OpenAI ecosystem (fine-tuning, batch API)
- Excellent documentation and community support
- 50% cheaper batch API option
Weaknesses:
- 8K token context window — requires chunking for long documents
- Not the best on MTEB retrieval tasks compared to Voyage
- Higher price than text-embedding-3-small for marginal quality gain
Best for: Teams already using OpenAI who want solid, reliable embeddings with minimal integration work.
```python
from openai import OpenAI

client = OpenAI()

# Flexible dimensions with Matryoshka
response = client.embeddings.create(
    input=["document text here"],
    model="text-embedding-3-large",
    dimensions=1024,  # reduce dimensions without re-training
)
embedding = response.data[0].embedding

# Batch API for a 50% discount on large datasets
# (file_id is the ID of a previously uploaded JSONL file of embedding requests)
batch_job = client.batches.create(
    input_file_id=file_id,
    endpoint="/v1/embeddings",
    completion_window="24h",
)
```
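Matryoshka embeddings can also be shortened client-side after the fact: truncate the vector to its first N dimensions and L2-renormalize so cosine similarity stays meaningful. A sketch (`shorten` is an illustrative helper, not part of the OpenAI SDK):

```python
import math

def shorten(embedding, dims):
    """Truncate a Matryoshka embedding to `dims` and L2-renormalize to unit length."""
    cut = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in cut))
    return [x / norm for x in cut]
```

This is useful when you store full-size vectors but want a cheaper first-pass search over truncated copies.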
Cohere Embed v3
Strengths:
- Built-in compression types (float, int8, uint8, binary, ubinary) — binary embeddings are 32x smaller
- Top-tier multilingual support (100+ languages, single model)
- input_type parameter for query/document asymmetry
- Tight integration with the Cohere Rerank pipeline
Weaknesses:
- Short 512-token context window — requires aggressive chunking
- The English-only model is not meaningfully better than OpenAI's at a comparable price
- Less flexible dimensionality than OpenAI or Voyage
Best for: Multilingual applications, or teams already using Cohere Rerank who want a unified vendor.
```python
import cohere

co = cohere.Client(api_key="your-api-key")

# Embed with compression for storage savings
response = co.embed(
    texts=["document text here"],
    model="embed-english-v3.0",
    input_type="search_document",
    embedding_types=["float", "int8"],  # get multiple formats
)
float_embedding = response.embeddings.float[0]
int8_embedding = response.embeddings.int8[0]  # 4x smaller, <2% quality loss
```
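Binary embeddings pack one dimension per bit (a 1024-dim vector fits in 128 bytes), and candidates are compared with Hamming distance rather than cosine. A minimal sketch over packed bytes such as Cohere's ubinary format (`hamming` is an illustrative helper):

```python
def hamming(a: bytes, b: bytes) -> int:
    """Hamming distance between two packed binary embeddings: count of differing bits."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))
```

A common pattern is binary Hamming search for a fast candidate pass, then reranking the top candidates with full float vectors.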
Multilingual Comparison
| Model | Languages | MTEB Multilingual | Best For |
|---|---|---|---|
| cohere-embed-v3-multilingual | 100+ | 62.8 | Broad multilingual coverage |
| voyage-multilingual-2 | 100+ | 65.1 | Best multilingual quality |
| text-embedding-3-large | ~50 (via training) | 59.4 | English-heavy workloads |
| e5-mistral-7b-instruct | 94 | 66.8 | Open-source multilingual |
If you're building a multilingual application, Voyage Multilingual 2 leads on benchmarks but Cohere Embed v3 Multilingual has broader language support and proven enterprise scale.
Code-Specific Embedding
For code search and retrieval (code RAG, semantic code search), general-purpose embedding models underperform. Use code-specific models:
| Model | Code Retrieval Score | Cost per 1M |
|---|---|---|
| voyage-code-3 | 71.4 | $0.12 |
| text-embedding-3-large | 59.2 | $0.13 |
| cohere-embed-v3 | 57.8 | $0.10 |
| jina-embeddings-v3 (code) | 68.3 | $0.02 |
voyage-code-3 is the clear winner for code retrieval. If you're building a code assistant or code search, it's worth the premium.
Choosing the Right Dimension Count
Higher dimensions = higher quality but higher storage and compute cost.
```
# Rule of thumb: start with 1024 dims.
# Only go to 3072 if you have measured quality issues.
#
# Storage cost at 10M vectors:
# 1024 dims (float32) = 10M * 1024 * 4 bytes ≈ 41GB
# 1536 dims (float32) = 10M * 1536 * 4 bytes ≈ 61GB
# 3072 dims (float32) = 10M * 3072 * 4 bytes ≈ 123GB
#
# With int8 quantization (Cohere embedding_types=["int8"]):
# 1024 dims (int8) = 10M * 1024 * 1 byte ≈ 10GB (4x reduction)
# With binary quantization (ubinary):
# 1024 dims (1 bit each) = 10M * 128 bytes ≈ 1.3GB (32x reduction)
```
For most RAG applications, 1024 dimensions is the right balance. Only use 1536+ if you have measured recall issues that additional dimensions fix.
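The storage figures above follow from a one-line formula, which is worth keeping handy when comparing dimension counts. A sketch for budgeting raw vector storage (decimal GB; excludes index overhead such as HNSW graph links; `index_size_gb` is an illustrative helper):

```python
def index_size_gb(num_vectors: int, dims: int, bytes_per_dim: int = 4) -> float:
    """Raw vector storage in decimal GB: vectors * dims * bytes per dimension."""
    return num_vectors * dims * bytes_per_dim / 1e9

# 10M float32 vectors at 1024 dims:
print(index_size_gb(10_000_000, 1024))  # ~41 GB
# Same vectors quantized to int8 (1 byte per dim):
print(index_size_gb(10_000_000, 1024, bytes_per_dim=1))  # ~10 GB
```

Real indexes add overhead on top of this (graph links, metadata, replicas), so treat the result as a floor, not an estimate of total footprint.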
The Recommendation
For most RAG applications: Start with text-embedding-3-small ($0.02/1M) and test if recall meets your quality bar. It often does. If not, upgrade to voyage-3 ($0.06/1M) for the best quality-per-dollar on MTEB retrieval tasks.
For code: Use voyage-code-3 without question.
For multilingual: Use voyage-multilingual-2 or cohere-embed-v3-multilingual based on which vendor you prefer.
For budget-constrained: text-embedding-3-small at $0.02/1M is remarkably good for its price. Self-hosted e5-mistral-7b-instruct is free at the cost of GPU compute.
The biggest mistake teams make is choosing the "best" model by MTEB without testing on their actual data. Domain-specific data can shift rankings significantly. Always run evals on your corpus before committing.
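A corpus-specific eval can be as simple as recall@k over a held-out set of (query, relevant-document) pairs. A minimal sketch, assuming you already have ranked retrieval results per query (`recall_at_k` is an illustrative helper):

```python
def recall_at_k(retrieved: list, relevant: list, k: int = 5) -> float:
    """Fraction of queries with at least one gold document in the top-k retrieved IDs.

    retrieved: per-query ranked lists of document IDs.
    relevant:  per-query sets of gold document IDs.
    """
    hits = sum(1 for got, gold in zip(retrieved, relevant) if set(got[:k]) & gold)
    return hits / len(retrieved)
```

Run this per candidate model on a few hundred real queries from your domain; the model ranking you get frequently differs from the MTEB ordering.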
Methodology
All performance figures in this article are sourced from publicly available benchmarks (primarily the MTEB leaderboard), provider pricing pages verified on 2026-04-16, and independent tests conducted via provider APIs. Pricing is listed as input cost per million tokens (embedding APIs have no output-token pricing). Rankings reflect the date of publication and will change as models are updated.