Tags: embeddings, voyage-ai, cohere, openai, rag, benchmarks

Best Embedding Models 2026: Voyage vs Cohere vs OpenAI Benchmarked

Choosing an embedding model is one of the most consequential decisions in a RAG pipeline. Get it wrong and your retrieval will underperform regardless of how good your LLM is. Get it right and you get better recall, faster queries, and lower costs.

This post breaks down the top embedding models in 2026 across the dimensions that actually matter: MTEB benchmark scores, cost per million tokens, context window, multilingual support, and real-world RAG performance.

The MTEB Leaderboard in 2026

MTEB (Massive Text Embedding Benchmark) is the standard benchmark for embedding models. It covers 56 tasks across 8 task types: classification, clustering, pair classification, reranking, retrieval, semantic textual similarity (STS), summarization, and bitext mining.

Top models as of April 2026 (overall MTEB average):

| Model | MTEB Score | Retrieval Score | Dims | Max Tokens | Provider |
|---|---|---|---|---|---|
| voyage-3-large | 70.7 | 67.2 | 1024-2048 | 32K | Voyage AI |
| text-embedding-3-large | 64.6 | 62.3 | 256-3072 | 8K | OpenAI |
| cohere-embed-v3-english | 64.5 | 62.1 | 1024 | 512 | Cohere |
| cohere-embed-v3-multilingual | 62.8 | 59.8 | 1024 | 512 | Cohere |
| text-embedding-3-small | 62.3 | 59.1 | 512-1536 | 8K | OpenAI |
| voyage-3 | 67.5 | 64.1 | 1024 | 32K | Voyage AI |
| voyage-code-3 | 68.2 | 71.4 (code) | 1024 | 32K | Voyage AI |
| mistral-embed | 55.3 | 54.9 | 1024 | 8K | Mistral |
| e5-mistral-7b-instruct | 66.6 | 64.8 | 4096 | 32K | Microsoft (OSS) |

Note: MTEB scores evolve. Always check the live leaderboard at huggingface.co/spaces/mteb/leaderboard before making a final decision.
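
If you want to reproduce scores for an open-source model yourself, the mteb Python package runs the benchmark locally. Below is a minimal sketch, assuming the mteb and sentence-transformers packages are installed; SciFact is used here as a stand-in retrieval task, not a full leaderboard run.

from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Any sentence-transformers-compatible model can be scored locally
model = SentenceTransformer("intfloat/e5-mistral-7b-instruct")

# Run a single retrieval task rather than the full suite (a full run takes hours)
evaluation = MTEB(tasks=["SciFact"])
results = evaluation.run(model, output_folder="mteb_results")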

Cost Per Million Tokens (2026 Pricing)

| Model | Input Cost (per 1M tokens) | Notes |
|---|---|---|
| voyage-3-large | $0.12 | Best MTEB, premium price |
| voyage-3 | $0.06 | Great balance |
| voyage-code-3 | $0.12 | Best for code retrieval |
| text-embedding-3-large | $0.13 | Most widely used |
| text-embedding-3-small | $0.02 | Budget option, solid quality |
| cohere-embed-v3-english | $0.10 | Includes classification |
| cohere-embed-v3-multilingual | $0.10 | Best multilingual coverage |
| mistral-embed | $0.04 | Budget, lower quality |

Cost Calculation at Scale

For a typical RAG setup with 10M documents (average 500 tokens each):

Embedding cost = (10,000,000 docs * 500 tokens) / 1,000,000 * price_per_1M = 5,000 * price_per_1M

voyage-3-large: 5,000 * $0.12 = $600 one-time
text-embedding-3-large: 5,000 * $0.13 = $650 one-time
text-embedding-3-small: 5,000 * $0.02 = $100 one-time
cohere-embed-v3: 5,000 * $0.10 = $500 one-time

One-time embedding cost is usually not the deciding factor. Re-embedding costs matter more — if you update your corpus frequently, the delta embedding cost adds up.
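
A quick way to sanity-check these numbers is a small cost function. The sketch below simply restates the arithmetic above; document count, average tokens, and price are the only inputs.

def embedding_cost_usd(num_docs: int, avg_tokens_per_doc: int, price_per_1m: float) -> float:
    """One-time cost to embed a corpus (ignores chunk overlap and retries)."""
    total_tokens = num_docs * avg_tokens_per_doc
    return total_tokens / 1_000_000 * price_per_1m

# 10M docs at 500 tokens each = 5,000M tokens:
print(embedding_cost_usd(10_000_000, 500, 0.12))  # 600.0 (voyage-3-large)
print(embedding_cost_usd(10_000_000, 500, 0.02))  # 100.0 (text-embedding-3-small)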

Head-to-Head: Voyage-3-Large vs Text-Embedding-3-Large vs Cohere Embed v3

Voyage-3-Large

Strengths:

  • Highest MTEB overall score in 2026 for non-open-source models
  • 32K token context window — handles long documents without chunking
  • Flexible output dimensions (1024, 1536, or 2048) for size/quality tradeoffs
  • Excellent on legal, medical, and technical document retrieval
  • Instruction-following variant available for asymmetric retrieval

Weaknesses:

  • Voyage AI is a smaller company — less proven at enterprise scale
  • API rate limits lower than OpenAI at default tier
  • No native multilingual support (voyage-multilingual-2 is separate)

Best for: RAG applications where retrieval quality is the top priority and you're willing to pay a slight premium.

import voyageai

client = voyageai.Client(api_key="your-api-key")

# Embed documents
doc_embeddings = client.embed(
    texts=["document text here"],
    model="voyage-3-large",
    input_type="document",
    output_dimension=1024  # 1024 or 1536 or 2048
).embeddings

# Embed query (use input_type="query" for asymmetric retrieval)
query_embedding = client.embed(
    texts=["what is the refund policy?"],
    model="voyage-3-large",
    input_type="query"
).embeddings[0]

Text-Embedding-3-Large (OpenAI)

Strengths:

  • Industry-standard — the most used embedding model by deployment volume
  • Flexible dimensions (256, 512, 1024, 1536, 3072) via Matryoshka Representation Learning
  • Tight integration with OpenAI ecosystem (fine-tuning, batch API)
  • Excellent documentation and community support
  • 50% cheaper batch API option

Weaknesses:

  • 8K token context window — requires chunking for long documents
  • Not the best on MTEB retrieval tasks compared to Voyage
  • Higher price than text-embedding-3-small for marginal quality gain

Best for: Teams already using OpenAI who want solid, reliable embeddings with minimal integration work.

from openai import OpenAI

client = OpenAI()

# Flexible dimensions with Matryoshka
response = client.embeddings.create(
    input=["document text here"],
    model="text-embedding-3-large",
    dimensions=1024  # reduce dimensions without re-training
)

embedding = response.data[0].embedding

# Batch API for 50% discount on large datasets
# (requests are uploaded as a JSONL file via the Files API first)
batch_file = client.files.create(
    file=open("embedding_requests.jsonl", "rb"),
    purpose="batch"
)

batch_job = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/embeddings",
    completion_window="24h"
)

Cohere Embed v3

Strengths:

  • Built-in compression type (float, int8, uint8, binary, ubinary) — binary embeddings are 32x smaller
  • Top-tier multilingual support (100+ languages, single model)
  • Includes input_type parameter for query/document asymmetry
  • Tight integration with Cohere Rerank pipeline

Weaknesses:

  • Short 512-token context window — requires aggressive chunking
  • The English-only model is not meaningfully better than OpenAI's text-embedding-3-large at a similar price
  • Less flexible dimensionality than OpenAI or Voyage

Best for: Multilingual applications, or teams already using Cohere Rerank who want a unified vendor.

import cohere

co = cohere.Client(api_key="your-api-key")

# Embed with compression for storage savings
response = co.embed(
    texts=["document text here"],
    model="embed-english-v3.0",
    input_type="search_document",
    embedding_types=["float", "int8"]  # get multiple formats
)

float_embedding = response.embeddings.float[0]
int8_embedding = response.embeddings.int8[0]  # 4x smaller, <2% quality loss
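
To get the 32x figure mentioned above, request one of the binary embedding types instead. A minimal sketch, assuming Cohere's ubinary type returns bit-packed uint8 values as documented:

import cohere
import numpy as np

co = cohere.Client(api_key="your-api-key")

resp = co.embed(
    texts=["document text here"],
    model="embed-english-v3.0",
    input_type="search_document",
    embedding_types=["ubinary"]  # 1 bit per dimension, packed into uint8
)

# 1024 dims collapse to 128 bytes (32x smaller than float32)
packed = np.array(resp.embeddings.ubinary[0], dtype=np.uint8)

# Binary vectors are compared with Hamming distance, not cosine similarity
def hamming(a: np.ndarray, b: np.ndarray) -> int:
    return int(np.unpackbits(a ^ b).sum())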

Multilingual Comparison

| Model | Languages | MTEB Multilingual | Best For |
|---|---|---|---|
| cohere-embed-v3-multilingual | 100+ | 62.8 | Broad multilingual coverage |
| voyage-multilingual-2 | 100+ | 65.1 | Best multilingual quality |
| text-embedding-3-large | ~50 (via training) | 59.4 | English-heavy workloads |
| e5-mistral-7b-instruct | 94 | 66.8 | Open-source multilingual |

If you're building a multilingual application, Voyage Multilingual 2 leads on benchmarks among hosted API models (e5-mistral-7b-instruct scores higher but must be self-hosted), while Cohere Embed v3 Multilingual has broader language support and proven enterprise scale.

Code-Specific Embedding

For code search and retrieval (code RAG, semantic code search), general-purpose embedding models underperform. Use code-specific models:

| Model | Code Retrieval Score | Cost per 1M tokens |
|---|---|---|
| voyage-code-3 | 71.4 | $0.12 |
| text-embedding-3-large | 59.2 | $0.13 |
| cohere-embed-v3 | 57.8 | $0.10 |
| jina-embeddings-v3 (code) | 68.3 | $0.02 |

voyage-code-3 is the clear winner for code retrieval. If you're building a code assistant or code search, it's worth the premium.
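
Usage mirrors the voyage-3-large example above; only the model name changes. A minimal sketch (the snippet text is illustrative):

import voyageai

client = voyageai.Client(api_key="your-api-key")

# Embed code snippets as documents; queries can be plain natural language
code_embeddings = client.embed(
    texts=["def refund(order): ..."],
    model="voyage-code-3",
    input_type="document"
).embeddings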

Choosing the Right Dimension Count

Higher dimensions = higher quality but higher storage and compute cost.

# Rule of thumb: start with 1024 dims.
# Only go to 3072 if you have measured quality issues.

def storage_gb(num_vectors: int, dims: int, bytes_per_dim: int = 4) -> float:
    """Raw vector storage only; index overhead (HNSW graphs etc.) comes on top."""
    return num_vectors * dims * bytes_per_dim / 1e9

# Storage cost at 10M vectors (float32 = 4 bytes per dim):
print(storage_gb(10_000_000, 1024))      # 40.96 GB
print(storage_gb(10_000_000, 1536))      # 61.44 GB
print(storage_gb(10_000_000, 3072))      # 122.88 GB

# With int8 quantization (e.g. Cohere's int8 type, 1 byte per dim):
print(storage_gb(10_000_000, 1024, 1))   # 10.24 GB (4x reduction)

For most RAG applications, 1024 dimensions is the right balance. Only use 1536+ if you have measured recall issues that additional dimensions fix.

The Recommendation

For most RAG applications: Start with text-embedding-3-small ($0.02/1M) and test if recall meets your quality bar. It often does. If not, upgrade to voyage-3 ($0.06/1M) for the best quality-per-dollar on MTEB retrieval tasks.

For code: Use voyage-code-3 without question.

For multilingual: Use voyage-multilingual-2 or cohere-embed-v3-multilingual based on which vendor you prefer.

For budget-constrained: text-embedding-3-small at $0.02/1M is remarkably good for its price. Self-hosted e5-mistral-7b-instruct is free at the cost of GPU compute.

The biggest mistake teams make is choosing the "best" model by MTEB without testing on their actual data. Domain-specific data can shift rankings significantly. Always run evals on your corpus before committing.
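
A minimal recall@k eval over your own corpus looks like the sketch below. It assumes you already have query embeddings, document embeddings, and a labeled mapping from each query to its relevant document indices; all names here are illustrative.

import numpy as np

def recall_at_k(query_embs: np.ndarray, doc_embs: np.ndarray,
                relevant_doc_ids: list[set[int]], k: int = 10) -> float:
    """Fraction of queries whose top-k nearest documents contain a relevant one."""
    # Normalize so the dot product equals cosine similarity
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    sims = q @ d.T                            # (num_queries, num_docs)
    topk = np.argsort(-sims, axis=1)[:, :k]   # indices of the k nearest docs
    hits = sum(bool(set(row) & rel) for row, rel in zip(topk, relevant_doc_ids))
    return hits / len(relevant_doc_ids)

# Run this once per candidate model on a few hundred labeled queries,
# then compare recall@10 before committing to a vendor.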

Methodology

All performance figures in this article are sourced from the public MTEB leaderboard and provider pricing pages verified on 2026-04-16. Pricing is listed per million input tokens (embedding APIs have no output-token charge). Rankings reflect the date of publication and will change as models are updated.
