Embedding Model Selection for RAG (2026)
For most RAG pipelines in 2026, text-embedding-3-large (OpenAI) or voyage-3 (Voyage AI) are the highest-quality options. For cost-sensitive applications, text-embedding-3-small or nomic-embed-text (open-source) are strong alternatives. Always evaluate on your own data — MTEB scores don't always translate to domain-specific performance.
When to Use
- ✓ Starting a new RAG system and choosing the embedding model before building the index
- ✓ An existing RAG system has poor retrieval precision — reindexing with a better embedding model often fixes it
- ✓ Processing domain-specific text (medical, legal, code) where general-purpose embeddings underperform
- ✓ Optimizing embedding costs at scale — embedding costs add up when indexing millions of documents
- ✓ Building multilingual RAG where you need embeddings that handle non-English content
How It Works
1. Embedding models map text to dense vectors (typically 256–3072 dimensions). Semantically similar texts produce vectors with high cosine similarity. The model's training data and loss function determine which similarities it captures.
2. Key dimensions to evaluate: (1) retrieval quality on MTEB or your own eval set, (2) max input tokens (important for long-chunk strategies), (3) latency (local models vs. API), (4) cost per token, (5) multilingual support.
3. For code embeddings, use code-specific models (voyage-code-2, CodeBERT) — general-purpose embeddings have poor recall for code because they don't understand syntax or identifier semantics.
4. Matryoshka embeddings (text-embedding-3 models) support dimension reduction — you can use 256 dimensions instead of 1536 with only 5–10% quality loss and 6x storage reduction.
5. Evaluate on a domain-specific query set: generate 50–100 test queries, retrieve top-5, manually rate precision@5. This takes 2–4 hours but saves weeks of troubleshooting bad retrieval.
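The Matryoshka truncation in step 4 can be sketched in a few lines of NumPy. `truncate_matryoshka` is an illustrative helper, not a library function, and the random vector stands in for a real 3072-dim text-embedding-3-large output: keep the first `dims` components, then re-normalize so cosine similarity stays meaningful.

```python
import numpy as np

def truncate_matryoshka(embedding: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` components of a Matryoshka embedding,
    then L2-normalize so cosine similarity remains meaningful."""
    truncated = embedding[:dims]
    return truncated / np.linalg.norm(truncated)

# Stand-in for a 3072-dim text-embedding-3-large vector
full = np.random.default_rng(0).standard_normal(3072)
small = truncate_matryoshka(full, 256)  # 256-dim, unit length
```

Note that truncation without re-normalization breaks cosine-via-dot-product search, which is why the helper normalizes before returning.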
Examples

```python
import openai
import voyageai

# OpenAI text-embedding-3-large
client = openai.OpenAI()
response = client.embeddings.create(
    model='text-embedding-3-large',
    input=texts,
    dimensions=1024,  # Matryoshka: reduce from 3072
)

# Voyage AI voyage-3
vc = voyageai.Client()
result = vc.embed(texts, model='voyage-3', input_type='document')
```

```python
# Evaluate embedding model on domain-specific queries
import numpy as np
from sklearn.metrics import average_precision_score
from sklearn.metrics.pairwise import cosine_similarity

def evaluate_embedding_model(model_fn, queries, corpus, relevant_docs):
    corpus_embeddings = model_fn(corpus)
    query_embeddings = model_fn(queries)
    scores = []
    for q_emb, relevant in zip(query_embeddings, relevant_docs):
        sims = cosine_similarity([q_emb], corpus_embeddings)[0]
        labels = [1 if i in relevant else 0 for i in range(len(corpus))]
        scores.append(average_precision_score(labels, sims))
    return np.mean(scores)  # Mean Average Precision
```

Common Mistakes
- ✗ Choosing embedding model based solely on MTEB leaderboard rankings — MTEB is a general benchmark. Legal, medical, or code-heavy corpora often reverse the ranking. Evaluate on your data.
- ✗ Using the same model for query embedding and document embedding when they have different distributions — some models (Voyage, Cohere) have separate input_type parameters for 'query' vs 'document' that significantly affect quality.
- ✗ Ignoring max token limits when embedding long chunks — text-embedding-3 models max at 8191 tokens. Chunks exceeding this are silently truncated. Check that your chunks fit within the model's limit.
- ✗ Not normalizing embeddings — most vector databases expect L2-normalized embeddings for cosine similarity. Some embedding APIs return unnormalized vectors. Always normalize before indexing.
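The normalization pitfall above is easy to demonstrate. `l2_normalize` is an illustrative helper: without it, a raw dot product scales with vector magnitude; after it, the dot product equals cosine similarity.

```python
import numpy as np

def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    """Normalize each row to unit length before indexing."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / norms

# Two vectors pointing in the same direction, different magnitudes
a = np.array([[1.0, 2.0, 2.0]])
b = np.array([[2.0, 4.0, 4.0]])

raw = float(a @ b.T)  # dot product depends on magnitude: 18.0
cos = float(l2_normalize(a) @ l2_normalize(b).T)  # cosine similarity: 1.0
```

If your vector database does inner-product search, unnormalized vectors silently bias retrieval toward long vectors rather than semantically close ones.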
FAQ
Which embedding model should I use for code search?
voyage-code-2 (Voyage AI) is the leading model for code search as of 2026. It significantly outperforms text-embedding-3-large on code retrieval tasks. For open-source, nomic-embed-code is a strong free alternative. Avoid general-purpose embeddings for code — they confuse variable names and function signatures.
Is it worth using local embedding models?
Yes for high-volume use cases. nomic-embed-text (v1.5, 768-dim) and mxbai-embed-large run locally via Ollama, have no API cost, and perform comparably to ada-002. For embedding millions of documents, local models save thousands of dollars. Latency is higher for small batches but lower for large batches (no network overhead).
What dimension size should I use?
For most RAG applications, 768–1536 dimensions provides the best quality/cost/storage tradeoff. Matryoshka models (OpenAI text-embedding-3, Nomic v1.5) let you use 256 dimensions with only minor quality loss — useful for storing hundreds of millions of vectors. Below 256 dimensions, quality drops significantly.
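The storage side of that tradeoff is simple arithmetic: raw float32 vector storage is `n_vectors × dims × 4` bytes (excluding index overhead such as HNSW graphs). `index_size_gb` is an illustrative helper:

```python
def index_size_gb(n_vectors: int, dims: int, bytes_per_float: int = 4) -> float:
    """Raw float32 vector storage in GB, excluding index structure overhead."""
    return n_vectors * dims * bytes_per_float / 1e9

# 100M vectors: full 1536 dims vs Matryoshka-truncated 256 dims
full_gb = index_size_gb(100_000_000, 1536)   # ≈ 614.4 GB
small_gb = index_size_gb(100_000_000, 256)   # ≈ 102.4 GB, the 6x reduction
```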
How do multilingual embeddings compare?
For multilingual RAG, Cohere's embed-multilingual-v3 and mxbai-embed-multilingual are the top performers as of 2026. They support 100+ languages with good cross-lingual retrieval (query in English, documents in French). General models like text-embedding-3-large also handle multilingual reasonably well but lag dedicated multilingual models by 5-15% on cross-lingual tasks.
Does it matter which embedding model I used to build the index?
Critically — queries MUST be embedded with the same model used to embed the corpus. Mixing embedding models produces meaningless similarity scores. This means changing embedding models requires reindexing the entire corpus. Plan your embedding model choice carefully before building a large index.
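One defensive pattern against model mixing, sketched as a toy class rather than any particular vector database's API: record the embedding model name in the index metadata and reject vectors produced by a different model.

```python
class VectorIndex:
    """Toy index that refuses vectors from a different embedding model."""

    def __init__(self, model_name: str):
        self.model_name = model_name
        self.vectors = []

    def _check(self, model_name: str) -> None:
        if model_name != self.model_name:
            raise ValueError(
                f"index was built with {self.model_name!r}, got {model_name!r}; "
                "similarity scores across models are meaningless, reindex instead"
            )

    def add(self, vector, model_name: str) -> None:
        self._check(model_name)
        self.vectors.append(vector)

index = VectorIndex("text-embedding-3-large")
index.add([0.1, 0.2], "text-embedding-3-large")  # ok
# index.add([0.1, 0.2], "voyage-3")              # raises ValueError
```

Most production vector stores support per-collection metadata where the model name can live; the check itself is one string comparison at ingest and query time.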