Tags: vector-database, rag, full-text-search, sqlite, pgvector, engineering

Do You Actually Need a Vector Database? (Vectorless RAG Guide)

Every week another startup announces they chose Pinecone, Weaviate, or Qdrant before they even had users. Meanwhile, their competitor is shipping faster with SQLite and a cosine similarity function.

The vector database gold rush has created an assumption: if you're building anything RAG-adjacent, you need a vector database. This assumption is wrong for a large percentage of use cases.

Let me show you when you don't need one — and what to use instead.

The Cost of Over-Engineering

Adding a vector database to your stack has real costs:

  • Operational complexity: another service to deploy, monitor, update, and debug
  • Data synchronization: keeping your primary database and vector database in sync is a source of bugs
  • Latency overhead: an additional network hop for every query
  • Financial cost: Pinecone's starter pod is $70/month; Weaviate Cloud starts at $25/month
  • Engineering time: learning a new API, data model, and query language

None of these costs are worth paying if simpler solutions work.

When Full-Text Search Beats Vector Search

Full-text search (BM25, TF-IDF) outperforms vector search for:

Exact keyword matches

If your users search for "error code E-4021" or "GDPR Article 17", BM25 will find it. Vector search might not — embeddings smooth over spelling and encode semantic meaning, which doesn't help when the user wants an exact string.

Short, specific queries

Product documentation, error messages, API references, changelog entries: these are short, precise, and index well under keyword search. A user searching "how to reset password" in your docs will usually get better results from keyword search than from semantic search.

High query volume + tight latency

A BM25 lookup against an inverted index over a 1M-document corpus typically returns in well under a millisecond; approximate nearest-neighbor vector search usually takes a few milliseconds. If you're doing real-time autocomplete or high-volume search, full-text search is faster and cheaper.

Postgres Full-Text Search

-- Create a tsvector column for fast full-text search
ALTER TABLE documents ADD COLUMN content_tsv tsvector
  GENERATED ALWAYS AS (to_tsvector('english', content)) STORED;

CREATE INDEX ON documents USING gin(content_tsv);

-- Query with ranking
SELECT id, content,
  ts_rank(content_tsv, query) AS rank
FROM documents,
  to_tsquery('english', 'vector & database') query
WHERE content_tsv @@ query
ORDER BY rank DESC
LIMIT 10;

This handles 10M+ documents on a $100/month database instance with sub-10ms latency.

When SQLite + Cosine Similarity Is Enough

For small datasets (under 100K documents), you don't need a vector database. You need a numpy array and a cosine similarity function.

import json
import sqlite3

import numpy as np

# Store embeddings in SQLite as JSON blobs
def store_embedding(conn, doc_id, text, embedding):
    conn.execute(
        "INSERT INTO docs (id, text, embedding) VALUES (?, ?, ?)",
        (doc_id, text, json.dumps(embedding.tolist())),
    )
    conn.commit()

def search(conn, query_embedding, top_k=10):
    # Brute-force scan: load every embedding and score all of them at once
    rows = conn.execute("SELECT id, text, embedding FROM docs").fetchall()

    ids = [r[0] for r in rows]
    texts = [r[1] for r in rows]
    embeddings = np.array([json.loads(r[2]) for r in rows])

    # Cosine similarity = dot product of L2-normalized vectors
    query_norm = query_embedding / np.linalg.norm(query_embedding)
    doc_norms = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = doc_norms @ query_norm

    top_indices = np.argsort(scores)[::-1][:top_k]
    return [(ids[i], texts[i], float(scores[i])) for i in top_indices]

This runs in <10ms for 10K documents, <100ms for 100K documents on a single CPU core. No additional infrastructure. Runs on a Raspberry Pi.

SQLite-Vec (the modern option)

The sqlite-vec extension (by Alex Garcia) adds proper vector search to SQLite:

import sqlite3

import sqlite_vec
from sqlite_vec import serialize_float32  # packs a list of floats into a BLOB

db = sqlite3.connect("docs.db")
db.enable_load_extension(True)
sqlite_vec.load(db)
db.enable_load_extension(False)

# Create virtual table
db.execute("""
  CREATE VIRTUAL TABLE embeddings USING vec0(
    id INTEGER PRIMARY KEY,
    embedding FLOAT[1536]
  )
""")

# Insert
db.execute("INSERT INTO embeddings VALUES (?, ?)", [doc_id, serialize_float32(embedding)])

# Query
rows = db.execute("""
  SELECT id, distance
  FROM embeddings
  WHERE embedding MATCH ?
  ORDER BY distance
  LIMIT 10
""", [serialize_float32(query_embedding)]).fetchall()

sqlite-vec runs as a single file, deploys anywhere, and handles up to ~500K vectors with acceptable performance.

When DuckDB Works

DuckDB has built-in array functions and handles columnar data efficiently. For analytics workloads or when you're processing large batches:

-- DuckDB: cosine similarity over fixed-size FLOAT arrays
SELECT id, content,
  array_cosine_similarity(embedding, $1::FLOAT[1536]) AS score
FROM documents
ORDER BY score DESC
LIMIT 10;

DuckDB scans are parallelized across cores. For 1M short documents, this can run in 200-500ms — not real-time, but fine for batch pipelines.

The Decision Framework

Answer these questions in order:

Step 1: How many documents do you have?

  • < 10K: use numpy + sqlite. Seriously. Done.
  • 10K - 100K: sqlite-vec or pgvector on existing Postgres
  • 100K - 5M: pgvector on a properly sized Postgres instance
  • > 5M: consider a dedicated vector database

Step 2: What do your users search for?

  • Exact terms, codes, names: start with BM25 (Postgres tsvector, Elasticsearch, Typesense)
  • Semantic meaning, paraphrases: vector search adds value
  • Both: hybrid search (BM25 + vector, ranked with RRF)
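
Reciprocal Rank Fusion (RRF), mentioned in the hybrid option above, is simple enough to implement directly. A minimal sketch; the doc ids below are made up for illustration:

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: merge several ranked lists of doc ids.

    Each list is ordered best-first; k=60 is the conventional constant
    that dampens the influence of top ranks.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists from BM25 and vector search
bm25_hits = ["d3", "d1", "d7", "d2"]
vector_hits = ["d1", "d5", "d3", "d9"]
fused = rrf([bm25_hits, vector_hits])  # "d1" and "d3" rise to the top
```

Documents that score well in both lists float upward, which is the whole point: neither retriever has to be perfect on its own.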

Step 3: What's your latency requirement?

  • < 50ms p99: need to be careful; large vector scans are slow
  • < 500ms: any approach works at reasonable scale
  • Batch/async: brute-force works at any scale
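
To put rough numbers on these tiers yourself, time a brute-force scan on your own hardware. A sketch with random 384-dimensional vectors (actual timings depend on CPU and dimensionality):

```python
import time

import numpy as np

rng = np.random.default_rng(0)
docs = rng.standard_normal((100_000, 384), dtype=np.float32)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)  # pre-normalize once
query = rng.standard_normal(384, dtype=np.float32)
query /= np.linalg.norm(query)

start = time.perf_counter()
scores = docs @ query                    # one matvec scores every document
top10 = np.argsort(scores)[::-1][:10]
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"brute-force scan of 100K x 384: {elapsed_ms:.1f} ms")
```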

Step 4: Do you already have Postgres?

  • Yes: add pgvector. Zero new infrastructure.
  • No: evaluate whether vector search is actually your bottleneck before adding a new database

Step 5: Do you need SQL joins with vector results?

  • Yes: pgvector (vector search + SQL in one query)
  • No: any dedicated vector DB will be simpler to operate at scale
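
The document-count thresholds from Step 1 can be folded into a tiny helper; the cutoffs below come straight from the list above and are rules of thumb, not hard limits:

```python
def recommend_store(n_docs):
    """Map corpus size to a storage choice, using Step 1's thresholds."""
    if n_docs < 10_000:
        return "numpy + sqlite"
    if n_docs < 100_000:
        return "sqlite-vec or pgvector"
    if n_docs < 5_000_000:
        return "pgvector"
    return "dedicated vector database"

print(recommend_store(2_000))       # numpy + sqlite
print(recommend_store(2_000_000))   # pgvector
```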

The Actual Decision Matrix

Scenario                                    Recommended solution
Personal project, < 10K docs                numpy + sqlite
Startup MVP, < 100K docs                    sqlite-vec or pgvector
Production app, < 5M docs, on Postgres      pgvector
Production app, 5M+ docs, need scale        Pinecone / Qdrant / Weaviate
Need keyword + semantic search              pgvector + pg_bm25, or Elasticsearch
Need exact match + filters                  BM25 first, add vectors if recall is bad
High-volume analytics, batch use            DuckDB

When You Definitely Need a Vector Database

To be fair, there are real scenarios where a purpose-built vector DB earns its keep:

  • Multi-tenant at scale: Pinecone namespaces and Weaviate tenants make tenant isolation straightforward. pgvector at multi-tenant scale requires careful schema design.
  • Serverless / zero-ops: if you want to push vectors and query them with no infra management, Pinecone Serverless or Turbopuffer are legitimately good.
  • Hybrid dense + sparse search: Pinecone and Qdrant support hybrid search natively. pgvector requires combining with pg_bm25.
  • > 10M vectors with consistent low latency: at this scale, managing pgvector instances gets expensive and operationally complex.

What to Do Before Adding a Vector Database

  1. Measure your actual recall: is your current search actually bad? Run evals (RAGAS or a simple golden dataset) before assuming vectors will help.
  2. Try BM25 first: Postgres tsvector is one command. Add it and test it. Many RAG failures are chunking problems, not retrieval algorithm problems.
  3. Profile your data: are your documents long or short? Many short docs favor keyword search. Long documents with semantically similar content favor vector search.
  4. Check your query patterns: do users use natural language, or keywords? Survey them.
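
For point 1, "measure recall" can be as simple as a golden set of queries plus a recall@k function. A sketch; the query and doc ids below are hypothetical:

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of the golden relevant docs that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

# Hypothetical golden dataset: query -> ids of docs a human marked relevant
golden = {"reset password": ["doc_12", "doc_40"]}
# Hypothetical results from your current search for the same queries
results = {"reset password": ["doc_40", "doc_03", "doc_12", "doc_09"]}

avg = sum(recall_at_k(results[q], rel, k=3) for q, rel in golden.items()) / len(golden)
print(f"mean recall@3: {avg:.2f}")  # 1.00 for this toy example
```

Run this against BM25 first, then against vector search; only switch if the number actually moves.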

The Rule of Thumb

Add a vector database when:

  • You've proven vector search improves recall over BM25 for your specific data
  • Your dataset is large enough that pgvector becomes operationally painful
  • The operational complexity is worth the improvement

Until then, you're over-engineering. Ship faster with less infra, prove the need, then upgrade.
