
Hybrid Search: Combining Dense and Sparse Retrieval (2026)

Quick Answer

Hybrid search runs both BM25 (keyword matching) and dense vector search in parallel, then merges the results using reciprocal rank fusion (RRF) or a learned score combiner. It consistently outperforms pure vector search by 5–15% on BEIR benchmarks because BM25 catches exact keyword matches that dense models miss (product codes, names, technical terms), while dense search catches semantic paraphrases that BM25 misses.

When to Use

  • Queries frequently include specific product codes, model numbers, or technical jargon that vector search misses
  • Your document corpus has structured identifiers (invoice numbers, part numbers, API names) that need exact matching
  • Enterprise search where user queries mix natural language with specific terms
  • You've set up basic vector RAG and precision is still suboptimal (below 0.75)
  • Legal or compliance search where missing an exact-match clause is unacceptable

How It Works

  1. Dense search: embed the query and retrieve the top-K chunks by cosine similarity. Good at semantic matching, paraphrase, and conceptual similarity. Poor at exact keyword matching.
  2. Sparse search (BM25): tokenize the query and score documents by term frequency and inverse document frequency. Excellent at exact keyword matching. Poor at semantic similarity and paraphrase.
  3. Reciprocal Rank Fusion (RRF): combine the ranked lists from both searches using the RRF formula score(d) = Σ 1/(k + rank(d)). k=60 is the standard default. RRF doesn't require calibrated scores — just ranks — which makes it simple and robust.
  4. Learned fusion: train a small ranker that weights dense and sparse scores based on query type. More powerful than RRF but requires labeled training data. Use RRF by default and learned fusion for high-traffic production systems.
  5. Most vector databases now support hybrid search natively: Elasticsearch (dense + BM25), Qdrant (dense + sparse vectors), Weaviate (BM25 + dense), Pinecone (dense + sparse with SPLADE). There is no need to maintain separate sparse and dense indexes.
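The RRF step above is small enough to implement directly. Here is a minimal sketch; the `rrf_fuse` helper name and the document IDs are illustrative, not from any particular library:

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60, top_k=10):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1/(k + rank(d)).

    ranked_lists: one ranked list of document IDs per retriever, best first.
    Only ranks matter, so dense cosine scores and BM25 scores never
    need to be on the same scale.
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Made-up ranked lists from a dense and a sparse retriever:
dense = ["d3", "d1", "d7", "d2"]
sparse = ["d7", "d3", "d9", "d1"]
print(rrf_fuse([dense, sparse], top_k=3))  # ['d3', 'd7', 'd1']
```

Note that "d3" wins because both retrievers rank it near the top, even though neither ranks it first in both lists — exactly the consensus behavior you want from fusion.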

Examples

Hybrid search with Qdrant
from qdrant_client import QdrantClient, models
from fastembed import SparseTextEmbedding

client = QdrantClient(url=QDRANT_URL)  # QDRANT_URL defined elsewhere
sparse_model = SparseTextEmbedding(model_name='prithivida/Splade_PP_en_v1')

def hybrid_search(query: str, top_k: int = 10):
    dense_vector = embed_model.embed(query)  # your dense embedder; returns one query vector
    # fastembed yields SparseEmbedding objects; convert to Qdrant's SparseVector
    sparse_emb = next(iter(sparse_model.embed([query])))
    sparse_vector = models.SparseVector(
        indices=sparse_emb.indices.tolist(),
        values=sparse_emb.values.tolist(),
    )

    results = client.query_points(
        collection_name='documents',
        prefetch=[
            models.Prefetch(query=dense_vector, using='dense', limit=50),
            models.Prefetch(query=sparse_vector, using='sparse', limit=50),
        ],
        query=models.FusionQuery(fusion=models.Fusion.RRF),
        limit=top_k,
    )
    return results
Output: Qdrant native hybrid search using RRF fusion. Prefetch gets the top-50 from each index, then RRF merges them to the top-10. SPLADE sparse vectors outperform classic BM25 for hybrid search because they expand query terms.
Elasticsearch hybrid with RRF
# Elasticsearch 8.8+ native hybrid search with RRF
query = {
    'retriever': {
        'rrf': {
            'retrievers': [
                {
                    'standard': {
                        'query': {
                            'match': {'content': user_query}
                        }
                    }
                },
                {
                    'knn': {
                        'field': 'content_vector',
                        'query_vector': query_embedding,
                        'num_candidates': 50
                    }
                }
            ],
            'rank_window_size': 100,
            'rank_constant': 60
        }
    },
    'size': 10
}
Output: Elasticsearch 8.8+ RRF retriever natively combines BM25 and kNN. rank_constant=60 is the standard k in the RRF formula. rank_window_size=100 means the top-100 from each retriever are considered before fusion.

Common Mistakes

  • Using simple score averaging instead of RRF — averaging dense and sparse scores requires calibrated scores on the same scale. Dense cosine similarity and BM25 scores have completely different ranges. Always use RRF or a trained combiner.
  • Not tuning the k value in RRF — k=60 is a robust default but not universal. For corpora with many highly relevant documents, k=10–20 gives higher weight to top-ranked results. A/B test on your eval set.
  • Forgetting to build a sparse index — dense-only vector DBs (early Pinecone, Chroma) don't support sparse search. If your DB doesn't support sparse natively, run BM25 in a separate Elasticsearch or OpenSearch instance.
  • Applying hybrid search to all query types — simple factual lookups ('What is the capital of France?') don't benefit from hybrid search. Apply it selectively to queries with specific terms or where pure vector search has poor recall.
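The first mistake above is easy to see with a toy sketch. All scores and document IDs below are invented for illustration: raw averaging lets BM25's larger numbers dominate, while min-max normalizing each list first (one simple alternative when you can't use RRF) restores balance:

```python
def minmax(scores):
    """Rescale a {doc: score} dict to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {d: (s - lo) / span for d, s in scores.items()}

# Made-up scores on very different scales:
dense = {"d1": 0.95, "d2": 0.55, "d3": 0.50}  # cosine similarity, ~[0, 1]
bm25 = {"d1": 3.0, "d2": 9.0, "d3": 2.0}      # BM25, unbounded

# Naive averaging: BM25 dominates simply because its numbers are bigger.
naive = {d: (dense[d] + bm25[d]) / 2 for d in dense}

# Min-max normalize each list first, then take a weighted sum.
nd, nb = minmax(dense), minmax(bm25)
fused = {d: 0.5 * nd[d] + 0.5 * nb[d] for d in dense}

print(max(naive, key=naive.get))  # 'd2' — the doc BM25 alone likes best
print(max(fused, key=fused.get))  # 'd1' — the doc both retrievers like
```

Even normalized averaging is fragile (one outlier score reshapes the whole scale), which is why rank-based RRF remains the safer default.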

FAQ

How much does hybrid search actually improve results?

On BEIR (the standard retrieval benchmark), hybrid search improves nDCG@10 by 3–10 points over pure dense or sparse search alone. In production, the improvement is most pronounced for queries with specific technical terms (product names, error codes, proper nouns) where it can be 20-30% better.

What is SPLADE and should I use it instead of BM25?

SPLADE is a learned sparse retrieval model that uses BERT to expand query terms. It outperforms BM25 by 5-10% on most benchmarks while maintaining the same architecture benefits (no vector dimension mismatch). If your vector DB supports sparse vectors (Qdrant, Pinecone), use SPLADE instead of BM25. If you're using Elasticsearch, BM25 is the simpler choice.

Does hybrid search help with multilingual RAG?

Yes, especially for cross-lingual queries. Dense multilingual embeddings handle paraphrase across languages well, but BM25/SPLADE helps for queries that contain language-specific terms, names, or borrowed words that aren't well-represented in the embedding space.

Can I add hybrid search to an existing RAG system without reindexing?

If your vector DB supports hybrid search, you may need to add a sparse index alongside your existing dense index, which often requires re-ingesting documents to build sparse vectors. Elasticsearch allows adding hybrid search to existing dense-only indices by adding a separate BM25 field. Plan for a reindexing step when adopting hybrid search.

Is hybrid search worth the added complexity?

For most production RAG systems handling user queries with diverse vocabulary, yes. The 5-15% quality improvement is meaningful. The added complexity is modest — native hybrid search in Qdrant and Elasticsearch requires only a small code change. The main cost is sparse vector storage (adds 20-30% to index size).

Related