Voyage AI Review 2026: Best Embedding Models for RAG and Code Search
Voyage AI has established itself as the specialist embedding provider in 2026, with domain-specific models for code, finance, law, and multilingual content that consistently outperform general-purpose embeddings from OpenAI and Cohere on relevant benchmarks. This review covers whether the performance premium justifies the cost.
What is Voyage AI?
Voyage AI is a company focused exclusively on embedding models; it offers no generative models. Their product line includes:
- voyage-3-large: flagship general-purpose embedding model (1024 dimensions)
- voyage-3: balanced performance/cost general model (1024 dimensions)
- voyage-3-lite: fastest, cheapest option (512 dimensions)
- voyage-code-3: specialized for code retrieval
- voyage-finance-2: specialized for financial documents
- voyage-law-2: specialized for legal documents
- voyage-multilingual-2: 100+ language support
- voyage-multimodal-3: text + image embeddings
Benchmarks vs Competitors
On MTEB (Massive Text Embedding Benchmark), as of April 2026:
General retrieval (BEIR benchmark):
| Model | NDCG@10 | Dimensions | Cost per 1M tokens |
|---|---|---|---|
| voyage-3-large | 57.1 | 1024 | $0.18 |
| voyage-3 | 55.2 | 1024 | $0.06 |
| embed-english-v3.0 (Cohere) | 55.0 | 1024 | $0.10 |
| text-embedding-3-large (OpenAI) | 54.9 | 3072 | $0.13 |
| BGE-M3 (open-source) | 54.2 | 1024 | Free (self-hosted) |
| voyage-3-lite | 51.8 | 512 | $0.02 |
Code retrieval (CodeSearchNet + BEIR code tasks):
| Model | NDCG@10 | Cost per 1M tokens |
|---|---|---|
| voyage-code-3 | 71.4 | $0.18 |
| voyage-3-large | 64.2 | $0.18 |
| text-embedding-3-large | 63.1 | $0.13 |
| embed-english-v3.0 | 61.8 | $0.10 |
The advantage of voyage-code-3 on code retrieval is significant: +8.3 NDCG@10 over text-embedding-3-large. For code search, that is not a marginal improvement.
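For readers unfamiliar with the metric: NDCG@10 rewards rankings that place highly relevant results near the top of the first ten hits. A minimal sketch of the computation (benchmark harnesses use their own implementations):

```python
import math

def ndcg_at_k(relevances: list[float], k: int = 10) -> float:
    """NDCG@k for a ranked list of graded relevance labels
    (index 0 is the top-ranked result)."""
    def dcg(rels):
        # Each result's gain is discounted by its log2 rank position
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels))
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal if ideal > 0 else 0.0

print(ndcg_at_k([2, 1, 0]))  # ideal ordering -> 1.0
print(ndcg_at_k([0, 2, 1]))  # relevant docs ranked too low -> below 1.0
```

A gap of 8 points on this scale means the specialized model consistently surfaces the right code higher in the result list.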
Financial document retrieval (FinanceBench):
| Model | Accuracy |
|---|---|
| voyage-finance-2 | 81.3% |
| voyage-3-large | 76.1% |
| text-embedding-3-large | 74.2% |
Again, the domain-specific model shows meaningful advantage.
Getting Started
```shell
pip install voyageai
```

```python
import os

import voyageai

vo = voyageai.Client(api_key=os.environ["VOYAGE_API_KEY"])

# Single embedding
embedding = vo.embed(
    ["What is prompt caching in Anthropic's API?"],
    model="voyage-3",
    input_type="query",  # "query" or "document"
).embeddings[0]

# Batch embeddings
texts = ["Document 1 content", "Document 2 content", "Document 3 content"]
result = vo.embed(
    texts,
    model="voyage-3",
    input_type="document",
)

embeddings = result.embeddings  # list of 1024-dim vectors
print(f"Total tokens used: {result.total_tokens}")
```
Important: Voyage distinguishes between query and document embeddings. Use input_type="query" for search queries and input_type="document" for indexed content. This matters: using the wrong type degrades retrieval quality.
RAG Pipeline with Voyage
```python
import os

import lancedb
import voyageai
from anthropic import Anthropic

vo = voyageai.Client(api_key=os.environ["VOYAGE_API_KEY"])
client = Anthropic()
db = lancedb.connect("/data/rag-db")


def index_documents(texts: list[str], sources: list[str]):
    """Embed and store documents."""
    result = vo.embed(texts, model="voyage-3", input_type="document")
    table = db.open_table("documents")  # assumes the table already exists
    table.add([
        {"content": text, "vector": emb, "source": src}
        for text, emb, src in zip(texts, result.embeddings, sources)
    ])


def retrieve(query: str, k: int = 5):
    """Retrieve relevant chunks."""
    result = vo.embed([query], model="voyage-3", input_type="query")
    query_vector = result.embeddings[0]
    table = db.open_table("documents")
    return table.search(query_vector).limit(k).to_pandas()


def answer(question: str) -> str:
    """Full RAG pipeline: retrieve, then generate."""
    chunks = retrieve(question)
    context = "\n\n".join(
        f"[Source: {row['source']}]\n{row['content']}"
        for _, row in chunks.iterrows()
    )
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1000,
        messages=[{
            "role": "user",
            "content": (
                f"Context:\n{context}\n\n"
                f"Question: {question}\n\n"
                "Answer based only on the context provided."
            ),
        }],
    )
    return response.content[0].text
```
Code Search with voyage-code-3
```python
# voyage-code-3 is optimized for code retrieval: it understands
# function signatures, docstrings, and comments.
codebase_snippets = [
    "def calculate_embedding(text: str) -> list[float]: ...",
    "class VectorStore:\n    def upsert(self, vectors): ...",
    "async def rate_limited_request(client, prompt): ...",
]

# Index code with document type
code_embeddings = vo.embed(
    codebase_snippets,
    model="voyage-code-3",
    input_type="document",
).embeddings

# Search code with query type
query_emb = vo.embed(
    ["function to handle rate limiting"],
    model="voyage-code-3",
    input_type="query",
).embeddings[0]

# voyage-code-3 captures the semantic relationship between
# "function to handle rate limiting" and "async def rate_limited_request"
```
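To close the loop, a small NumPy helper can rank the indexed snippets against the query embedding. It normalizes vectors explicitly rather than assuming the API returns unit-length embeddings:

```python
import numpy as np

def rank_by_similarity(query_emb, doc_embs, docs, top_k=3):
    """Rank documents by cosine similarity to the query embedding."""
    q = np.asarray(query_emb, dtype=float)
    D = np.asarray(doc_embs, dtype=float)
    # Normalize so dot products are cosine similarities
    q = q / np.linalg.norm(q)
    D = D / np.linalg.norm(D, axis=1, keepdims=True)
    scores = D @ q
    order = np.argsort(scores)[::-1][:top_k]
    return [(docs[i], float(scores[i])) for i in order]

# With the embeddings above, the rate-limiting snippet should rank first:
# rank_by_similarity(query_emb, code_embeddings, codebase_snippets, top_k=1)
```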
Reranking
Voyage also offers a reranking model:
```python
# After initial retrieval, rerank with voyage-rerank-2.
# `retrieved_chunks` is the candidate list from the first-pass search.
result = vo.rerank(
    query="How does prompt caching work?",
    documents=[chunk.content for chunk in retrieved_chunks],
    model="voyage-rerank-2",
    top_k=3,
)

for item in result.results:
    print(f"Relevance: {item.relevance_score:.3f}")
    print(retrieved_chunks[item.index].content[:200])
```
Pricing for voyage-rerank-2: $0.05 per 1K searches (1 search = 1 query + N documents)
Pricing Comparison
| Model | Price per 1M tokens | Dimensions | Best for |
|---|---|---|---|
| voyage-3-lite | $0.02 | 512 | High-volume, cost-sensitive |
| voyage-3 | $0.06 | 1024 | General RAG (best value) |
| voyage-3-large | $0.18 | 1024 | Max quality general retrieval |
| voyage-code-3 | $0.18 | 1024 | Code search |
| voyage-finance-2 | $0.12 | 1024 | Financial docs |
| voyage-law-2 | $0.12 | 1024 | Legal docs |
| voyage-multimodal-3 | $0.12 text / $0.18 img | 1024 | Mixed media |
For general RAG, voyage-3 at $0.06/1M is the sweet spot: it outperforms text-embedding-3-large on most benchmarks at roughly half the price.
Cost at Scale
For a 10M-document corpus averaging 512 tokens per chunk (≈5B tokens total):
- voyage-3-lite: 5B tokens × $0.02/1M = $100
- voyage-3: 5B tokens × $0.06/1M = $300
- text-embedding-3-large: 5B tokens × $0.13/1M = $650
Voyage-3 costs 54% less than text-embedding-3-large while outperforming it on most benchmarks.
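The figures above round 10M × 512 tokens down to 5B; the exact total is 5.12B tokens, which nudges each cost up by about 2% (voyage-3 comes to roughly $307 rather than $300). A quick calculator for checking your own corpus size:

```python
def embedding_cost(num_docs: int, avg_tokens: int, price_per_million: float) -> float:
    """Estimated one-time indexing cost in dollars."""
    return num_docs * avg_tokens / 1_000_000 * price_per_million

# Prices from the table above
for model, price in [("voyage-3-lite", 0.02), ("voyage-3", 0.06),
                     ("text-embedding-3-large", 0.13)]:
    print(f"{model}: ${embedding_cost(10_000_000, 512, price):,.2f}")
```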
Dimension Reduction
Voyage supports Matryoshka-style dimension reduction (voyage-3-large supports it natively):

```python
# Full 1024 dimensions
full_emb = vo.embed([text], model="voyage-3-large").embeddings[0]  # 1024 dims

# Truncated to 256 dimensions: still useful for coarse first-pass ANN
# retrieval, followed by reranking with the full vectors.
# Re-normalize after truncating so cosine similarities stay comparable.
truncated = full_emb[:256]
```
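The truncate-then-rerank pattern looks like this end to end. The vector math is plain NumPy and stands in for whatever ANN index you actually use; the 256-dimension cutoff is the same illustrative choice as above:

```python
import numpy as np

def truncate_and_normalize(emb, dims: int = 256):
    """Matryoshka-style truncation: keep the leading dims,
    then re-normalize so cosine similarity stays meaningful."""
    v = np.asarray(emb, dtype=float)[:dims]
    return v / np.linalg.norm(v)

def two_stage_search(query_emb, doc_embs, coarse_k=100, final_k=5, dims=256):
    """Cheap first pass on truncated vectors, exact rescoring on full ones."""
    q_full = np.asarray(query_emb, dtype=float)
    D_full = np.asarray(doc_embs, dtype=float)
    # Stage 1: coarse retrieval in the truncated space
    q_c = truncate_and_normalize(q_full, dims)
    D_c = np.stack([truncate_and_normalize(d, dims) for d in D_full])
    coarse = np.argsort(D_c @ q_c)[::-1][:coarse_k]
    # Stage 2: rescore the surviving candidates with the full vectors
    qn = q_full / np.linalg.norm(q_full)
    Dn = D_full[coarse] / np.linalg.norm(D_full[coarse], axis=1, keepdims=True)
    order = np.argsort(Dn @ qn)[::-1][:final_k]
    return [int(coarse[i]) for i in order]
```

The payoff is a 4x smaller first-pass index at the cost of one extra dot-product pass over the shortlist.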
When Voyage Beats OpenAI
Voyage wins:
- Code retrieval: voyage-code-3 has a large, consistent advantage
- Financial documents: voyage-finance-2 wins on FinanceBench
- Legal documents: voyage-law-2 wins on LegalBench
- Cost-efficiency for general RAG: voyage-3 beats text-embedding-3-large on both quality and price
- Multilingual: voyage-multilingual-2 is competitive with OpenAI's multilingual embeddings
OpenAI wins:
- Ecosystem integration: text-embedding-3 is native to the OpenAI Assistants API and Batch API
- Zero additional vendor: if you're already fully on OpenAI, adding Voyage adds a dependency
- Very short texts: at sub-100 token texts, quality differences narrow
Limitations
- Vendor lock-in: Voyage is a single-product company. Switching costs are real if they raise prices or get acquired.
- Rate limits: Default limits are lower than OpenAI at equivalent pricing tiers. Plan for rate limit handling.
- Batch API: No async batch API equivalent to OpenAI's Batch API for 50% discounts on large offline jobs.
- SDKs: Python and JavaScript only. Go, Rust, Java users must use the HTTP API directly.
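Given the tighter rate limits, wrap embedding calls in exponential backoff with jitter. A sketch follows; the except clause is deliberately broad because the SDK's exact rate-limit exception class isn't shown here, so narrow it to the real one in production:

```python
import random
import time

def embed_with_backoff(vo, texts, model="voyage-3",
                       input_type="document", max_retries=5):
    """Call vo.embed, retrying with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return vo.embed(texts, model=model, input_type=input_type)
        except Exception:  # TODO: narrow to the SDK's rate-limit exception
            if attempt == max_retries - 1:
                raise
            # 1s, 2s, 4s, ... capped at 30s; jitter avoids synchronized retries
            time.sleep(min(2 ** attempt + random.random(), 30))
```

Jitter matters when many workers index in parallel: without it, every worker retries at the same instant and hits the limit again.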
Summary
Voyage AI offers the best domain-specific embedding models available in 2026. For code search, voyage-code-3 is the clear choice. For general RAG, voyage-3 beats text-embedding-3-large on both quality and cost. The domain-specific models (finance, law) are genuinely useful for those verticals.
If you're building a code search tool, an enterprise knowledge base in a regulated industry, or a cost-conscious high-volume RAG system, Voyage deserves a serious evaluation.
Methodology
All benchmarks, pricing, and performance figures cited in this article are sourced from publicly available data: provider pricing pages (verified 2026-04-16), the MTEB/BEIR retrieval benchmarks, and independent API tests. Costs are listed per million input tokens unless noted (embedding APIs bill input only). Rankings reflect the publication date and change as models update.