Metadata Filtering in RAG Systems (2026)
Metadata filtering means attaching structured fields (date, source, category, user_id) to each chunk at index time, then including filter conditions in your vector search queries. A query like 'documents from 2026 in the legal category' runs semantic search only over matching chunks, improving precision and cutting retrieval cost. It's essential for multi-tenant systems where users should only see their own data.
When to Use
- Multi-tenant RAG where users must only retrieve their own documents — metadata filter on user_id or org_id
- Time-sensitive queries where only recent documents are relevant (e.g., filter by date within the last 6 months)
- Domain-specific queries where cross-domain contamination reduces precision (filter by category or document_type)
- Compliance requirements where certain users can't access certain document classifications
- Large corpora (millions of chunks) where semantic search alone is too slow or costly — metadata pre-filters reduce the search space by 90%+
How It Works
1. At index time, extract metadata for each document (source, date, category, author, language, etc.) and store it alongside the embedding vector. All major vector DBs (Pinecone, Weaviate, Qdrant, pgvector) support metadata storage.
2. Design the metadata schema upfront — it's expensive to reindex. Decide which filters you'll need and store those fields on every chunk. Include: document_id, chunk_index, source_url, created_at, category, language, and any domain-specific fields.
3. At query time, construct filter conditions using the vector DB's filter syntax. Filters run before vector search, eliminating non-matching documents from the ANN search entirely (in most vector DBs). Pinecone, Weaviate, and Qdrant all support this.
4. For dynamic filtering based on query content, use an LLM to extract filter parameters from the query: 'Find policies updated after January 2026 about data retention' → {category: 'policy', date_gte: '2026-01-01', topic_contains: 'data retention'}.
5. Combine metadata filtering with hybrid search: the filter reduces the candidate set, then BM25 + dense search rank within the filtered set. This is the production-grade pattern for most enterprise RAG.
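Steps 3–5 can be sketched end to end without any vector database: a pure-Python toy in which a metadata pre-filter shrinks the candidate set and cosine similarity ranks the survivors. The chunk list, field names, and 3-dimensional vectors below are illustrative assumptions, not a real index.

```python
import math

# Toy corpus: each chunk carries an embedding plus a metadata payload.
CHUNKS = [
    {"text": "GDPR retention rules", "embedding": [0.9, 0.1, 0.0],
     "meta": {"user_id": "u1", "category": "legal", "year": 2026}},
    {"text": "Marketing plan Q1",    "embedding": [0.1, 0.9, 0.0],
     "meta": {"user_id": "u1", "category": "marketing", "year": 2026}},
    {"text": "Old legal memo",       "embedding": [0.8, 0.2, 0.1],
     "meta": {"user_id": "u2", "category": "legal", "year": 2023}},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def filtered_search(query_embedding, filters, top_k=5):
    # 1. Metadata pre-filter: drop chunks whose payload doesn't match.
    candidates = [c for c in CHUNKS
                  if all(c["meta"].get(k) == v for k, v in filters.items())]
    # 2. Rank survivors by embedding similarity (stands in for ANN search).
    candidates.sort(key=lambda c: cosine(query_embedding, c["embedding"]),
                    reverse=True)
    return [c["text"] for c in candidates[:top_k]]

print(filtered_search([1.0, 0.0, 0.0], {"user_id": "u1", "category": "legal"}))
# Only the u1 legal chunk survives the pre-filter.
```

In a real system step 2 is the vector DB's ANN search over the filtered set; the point of the sketch is that filtering is exact payload matching, independent of the similarity metric.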
Examples
```python
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue

client = QdrantClient(url=QDRANT_URL)  # QDRANT_URL set elsewhere in your config

def retrieve_for_user(query_embedding, user_id: str, top_k: int = 10):
    """Semantic search restricted to a single tenant's documents."""
    results = client.search(
        collection_name='documents',
        query_vector=query_embedding,
        query_filter=Filter(
            must=[
                FieldCondition(
                    key='user_id',
                    match=MatchValue(value=user_id),
                )
            ]
        ),
        limit=top_k,
    )
    return [r.payload['text'] for r in results]
```

```python
# Extract structured filters from a natural-language query
from anthropic import Anthropic
import json

client = Anthropic()

def extract_filters(query: str) -> dict:
    response = client.messages.create(
        model='claude-3-5-haiku-20241022',
        max_tokens=200,
        messages=[{
            'role': 'user',
            'content': f'''Extract search filters from this query as JSON.
Available filter fields: category (string), date_after (ISO date), language (string), author (string).
Query: {query}
Return only valid JSON, no explanation.'''
        }]
    )
    return json.loads(response.content[0].text)

# 'Find legal documents about GDPR from 2025 in English'
# Returns: {"category": "legal", "date_after": "2025-01-01", "language": "en"}
```

Common Mistakes
- Storing metadata as strings when comparisons need structured types — storing dates as '2026-04-16' strings prevents date range filtering. Store dates as timestamps, numbers as integers, categories as keyword fields.
- Over-filtering so that no documents match — when an LLM extracts filters and applies all of them strictly, edge-case queries return zero results. Implement a fallback: retry with progressively relaxed filters if the filtered search returns fewer than 3 results.
- Inconsistent metadata at index time — if some chunks have category='legal' and others have category='Legal' or category='law', filtering by 'legal' misses half the corpus. Normalize all metadata values at index time.
- Not indexing metadata fields in the vector DB — some vector DBs require you to declare filterable fields at collection creation time (Pinecone, Qdrant). If you don't declare them upfront, filters may be slow or unsupported.
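The over-filtering fallback can be sketched as a loop that drops the least important relaxable filter until enough results come back. The `search_fn` interface, the mock backend, and the `relaxable` ordering are assumptions for illustration; note that hard constraints like user_id are deliberately never listed as relaxable.

```python
def search_with_fallback(search_fn, filters, relaxable, min_results=3):
    """Retry with progressively relaxed filters.

    search_fn: callable(filters: dict) -> list of results (assumed interface).
    relaxable: keys that MAY be dropped, ordered most- to least-important.
               Hard constraints (e.g. user_id) are simply never listed here.
    """
    active = dict(filters)
    to_drop = list(relaxable)
    while True:
        results = search_fn(active)
        if len(results) >= min_results or not to_drop:
            return results, active
        active.pop(to_drop.pop(), None)  # drop least-important remaining filter

# Mock backend: pretends any query with author or date_after is over-constrained.
def mock_search(filters):
    if 'author' in filters or 'date_after' in filters:
        return []
    return ['doc1', 'doc2', 'doc3']

results, used = search_with_fallback(
    mock_search,
    {'user_id': 'u1', 'date_after': '2026-01-01', 'author': 'kim'},
    relaxable=['date_after', 'author'],
)
# 'author' is dropped first, then 'date_after'; 'user_id' is never relaxed.
```

Returning the filters actually used (`used`) lets the application tell the user which constraints were relaxed.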
FAQ
Which vector databases support metadata filtering best?
Qdrant has the most expressive filter syntax (nested conditions, geo-filters, full-text on payload). Weaviate has strong filtering with its GraphQL API. Pinecone supports metadata filtering but limits metadata size. pgvector with PostgreSQL has the most powerful filtering (full SQL WHERE clauses) but is slowest for pure ANN search. For complex filtering needs, Qdrant or Weaviate are the top choices.
Does metadata filtering slow down search?
When metadata fields are indexed (not just stored), filtering is fast — it typically adds under 5ms. Unindexed metadata filtering can be very slow (full scan). Always declare filter fields as indexed in your vector DB configuration.
How do I handle hierarchical metadata (document → section → chunk)?
Store the full hierarchy on each chunk: document_id, section_id, chunk_id. This lets you filter at any level: 'all chunks from document X', 'all chunks from section Y of document X', or 'this specific chunk'. The document_id and section_id on each chunk are the critical fields.
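A minimal sketch of that layout, with made-up IDs, showing that one equality condition per level is all the filtering needed:

```python
# Every chunk carries its full ancestry, so any level of the hierarchy
# is filterable with the same mechanism. IDs are made up for illustration.
chunks = [
    {"document_id": "doc-1", "section_id": "s1", "chunk_id": "c1", "text": "intro"},
    {"document_id": "doc-1", "section_id": "s1", "chunk_id": "c2", "text": "scope"},
    {"document_id": "doc-1", "section_id": "s2", "chunk_id": "c3", "text": "terms"},
    {"document_id": "doc-2", "section_id": "s1", "chunk_id": "c4", "text": "other"},
]

def select(chunks, **conditions):
    """Filter at any level of the hierarchy via keyword equality."""
    return [c for c in chunks
            if all(c.get(k) == v for k, v in conditions.items())]

whole_doc   = select(chunks, document_id="doc-1")                   # 3 chunks
one_section = select(chunks, document_id="doc-1", section_id="s1")  # 2 chunks
one_chunk   = select(chunks, document_id="doc-1", chunk_id="c3")    # 1 chunk
```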
Can metadata filtering replace access control?
No. Metadata filtering enforces access at the query level but doesn't prevent a developer from bypassing filters. For true access control, enforce user_id/org_id constraints at the application layer (not just the DB query), log all queries, and run regular audits. Metadata filtering is a necessary but not sufficient component of multi-tenant security.
How much metadata overhead does this add to storage?
Metadata is tiny compared to embeddings. A 1536-dimension float32 embedding takes 6KB. A metadata payload with 5 fields (user_id, date, category, source, language) takes under 200 bytes. Metadata overhead is negligible — index all fields you might ever filter on.