AI for Document Search
Enterprise document search using RAG over internal wikis, contracts, HR policies, and technical documentation. Surface accurate answers with source citations in seconds, not hours of manual digging.
Quick answer
The best enterprise document search stack uses a RAG pipeline: embed documents with text-embedding-3-large or Voyage AI, index them in a vector database (Pinecone, Weaviate, or pgvector), and generate cited answers with claude-sonnet-4 or GPT-4o. Deploy on-premise or in your VPC; expect 90%+ answer relevance on internal queries at $20–$40 per seat per month all-in.
The problem
Enterprise employees spend an average of 1.8 hours per day searching for internal information — that's 22.5% of a knowledge worker's day, costing a 500-person company over $6M annually in lost productivity. Keyword search tools like SharePoint and Confluence return lists of documents rather than answers, with a user satisfaction rate below 40%. New employees take 3–6 months to become proficient because institutional knowledge is buried across hundreds of disconnected repositories.
Core workflows
Document Ingestion and Chunking
Parse PDFs, Word docs, Notion pages, and Confluence articles. Apply semantic chunking (512–1024 tokens with 15% overlap). Index into a vector store with rich metadata (author, date, department, access level). Ingestion pipelines process 100,000-page corpora overnight.
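The overlap chunking above can be sketched in a few lines. This is a minimal illustration, not a library API: whitespace tokens stand in for real tokenizer tokens (a production pipeline would use the embedding model's tokenizer), and `chunk_tokens` is a hypothetical helper.

```python
def chunk_tokens(tokens, chunk_size=512, overlap_pct=0.15):
    """Split a token list into fixed-size windows with fractional overlap."""
    step = max(1, int(chunk_size * (1 - overlap_pct)))  # tokens to advance per window
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already covers the tail of the document
    return chunks

# Whitespace-style tokens stand in for a real tokenizer in this sketch.
tokens = [str(i) for i in range(1200)]
chunks = chunk_tokens(tokens, chunk_size=512, overlap_pct=0.15)
```

With these parameters consecutive chunks share 77 tokens (about 15% of 512); in a real ingestion pipeline each chunk would also carry the metadata listed above (author, date, department, access level) into the vector store.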
Semantic Question Answering with Citations
Accept natural language questions, retrieve top-K relevant chunks, generate grounded answers with inline source links. Reduces time-to-answer from 20 minutes of manual search to under 15 seconds.
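A stripped-down version of this retrieve-and-ground step is sketched below, using toy 3-dimensional vectors in place of real embeddings; in production the query would be embedded with the same model as the corpus and scored inside the vector store, and the prompt would go to the generation model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_top_k(query_vec, index, k=3):
    """Score every chunk against the query; return the k best as (score, entry)."""
    scored = [(cosine(query_vec, e["vector"]), e) for e in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

def build_prompt(question, hits):
    """Numbered context blocks plus an instruction to cite sources inline."""
    context = "\n".join(
        f"[{i + 1}] ({e['source']}) {e['text']}" for i, (_, e) in enumerate(hits)
    )
    return (
        "Answer using ONLY the context below and cite sources as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

index = [
    {"vector": [1.0, 0.0, 0.0], "text": "PTO accrues at 1.5 days per month.",
     "source": "hr/pto.md"},
    {"vector": [0.0, 1.0, 0.0], "text": "The indemnification cap is 12 months of fees.",
     "source": "legal/msa.pdf"},
]
hits = retrieve_top_k([0.9, 0.1, 0.0], index, k=1)
prompt = build_prompt("What is the PTO accrual rate?", hits)
```

The numbered `[n]` markers in the context are what lets the model emit clickable inline citations back to the source documents.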
Contract and Policy Clause Lookup
Answer specific legal and policy questions ('What is our standard indemnification cap?', 'Which contracts expire in Q3?') with exact clause references. Reduces legal team research time by 60–80%.
Access-Controlled Search
Enforce row-level security so employees only retrieve documents they have permission to view. Integrates with Okta, Azure AD, and Google Workspace for group-based filtering at query time.
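The query-time check can be sketched as a metadata pre-filter, so unauthorized chunks never enter similarity scoring at all. The group names and `query_filter` helper below are illustrative; Pinecone and Weaviate expose equivalent metadata filters natively.

```python
def query_filter(index, user_groups):
    """Keep only chunks tagged with at least one of the requesting user's groups."""
    groups = set(user_groups)
    return [entry for entry in index if groups & set(entry["groups"])]

index = [
    {"text": "Executive compensation bands", "groups": ["hr-admins"]},
    {"text": "Contractor PTO policy", "groups": ["all-employees"]},
]
visible = query_filter(index, ["all-employees", "engineering"])
# Vector search then runs over `visible` only, so restricted chunks are
# never scored, returned, or even revealed to exist.
```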
Slack and Teams Search Integration
Surface document answers directly in Slack or Teams channels via a bot. Answer '@bot What is our PTO policy for contractors?' without leaving the conversation. Reduces context-switching overhead for common questions.
Document Freshness Monitoring
Track document versions and re-index when source content changes. Detect when answers may be stale (document modified after last index) and surface a freshness warning. Critical for HR policies and compliance documents.
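A minimal staleness check, assuming each indexed chunk records when it was indexed and when its source document was last modified (field names are illustrative):

```python
from datetime import datetime, timedelta

def is_stale(indexed_at, modified_at, now, max_age_days=7):
    """Warn when the source changed after indexing, or the index entry is simply old."""
    if modified_at > indexed_at:
        return True  # document edited since the last index run
    return now - indexed_at > timedelta(days=max_age_days)

now = datetime(2025, 6, 15)
fresh = is_stale(datetime(2025, 6, 14), datetime(2025, 6, 10), now)  # False
edited = is_stale(datetime(2025, 6, 1), datetime(2025, 6, 12), now)  # True
```

When `is_stale` fires on a retrieved chunk, the answer UI would attach the freshness warning rather than suppress the result.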
Top tools
- Glean
- Guru
- Pinecone
- Weaviate
- LlamaIndex
- Notion AI
Top models
- claude-sonnet-4
- gpt-4o
- text-embedding-3-large
- claude-haiku-3-5
FAQs
What chunk size should I use for enterprise document RAG?
For most enterprise documents, 512–768 tokens per chunk with 15–20% overlap performs best. Shorter chunks (256 tokens) improve precision for dense technical documentation and legal agreements where individual clauses are the unit of meaning. Longer chunks (1024–2048 tokens) work better for narrative documents like policy guides where context across paragraphs is critical. Always benchmark chunk sizes against your specific document corpus — a 10% change in chunk size can move retrieval accuracy by 5–15 percentage points.
How do I prevent the AI from hallucinating information that isn't in the documents?
The key controls are: (1) strict grounding prompts that instruct the model to answer only from retrieved context and say 'I don't have information on this' when context is insufficient, (2) confidence scoring on retrieved chunks — refuse to answer when top-1 similarity score is below 0.75, (3) inline citations that users can click to verify, and (4) human-readable source attribution in every response. Models like claude-sonnet-4 are particularly good at refusing to speculate when properly instructed. Hallucination rates below 2% are achievable with these controls in place.
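Control (2) can be a simple gate in front of generation. The sketch below is illustrative: `generate` stands in for whatever calls the LLM with the grounded prompt, and the 0.75 threshold should be tuned against your own retrieval scores.

```python
REFUSAL = "I don't have information on this in the indexed documents."

def gated_answer(question, hits, generate, threshold=0.75):
    """Refuse rather than answer when top-1 retrieval confidence is too low.

    `hits` is a list of (similarity_score, chunk) pairs sorted descending.
    """
    if not hits or hits[0][0] < threshold:
        return REFUSAL
    return generate(question, hits)

stub = lambda q, h: f"Answer grounded in {len(h)} chunk(s)."
low = gated_answer("What is our indemnification cap?", [(0.62, "...")], stub)
high = gated_answer("What is our indemnification cap?", [(0.91, "...")], stub)
```

The same gate is a natural place to log refused queries, which are a direct signal of coverage gaps in the indexed corpus.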
How do I handle document access controls in a RAG system?
Implement access control at two layers: (1) ingestion filtering — tag each document chunk with permitted user groups from your identity provider (Okta, Azure AD), (2) query-time filtering — filter the vector search to only retrieve chunks the requesting user is authorized to see. Never retrieve first and then filter — this can expose document existence to unauthorized users. Tools like Weaviate and Pinecone support metadata-based access filters natively. Test your access control implementation with adversarial prompts (prompt injection attempts to retrieve forbidden documents).
What's the difference between Glean, Guru, and building your own RAG?
Glean is a turnkey enterprise search SaaS (150+ connectors, SSO, access control built-in) that works in 2–4 weeks but costs $25–$50 per seat per month and has limited customization. Guru focuses on curated knowledge bases with human-verified cards, better for compliance-sensitive content but requires ongoing curation effort. Building your own RAG (LlamaIndex + Pinecone + Claude) takes 4–8 weeks of engineering but gives full control over chunking strategy, models, UI, and cost. Choose Glean/Guru for rapid deployment across a large organization; build your own when you have specific document types, access control complexity, or cost requirements that off-the-shelf solutions can't meet.
How do I keep the search index fresh when documents change frequently?
Implement a change detection layer: poll source systems (SharePoint, Confluence, Google Drive) for document modification events via webhooks or scheduled diffs. On change, re-chunk and re-embed only the modified document, delete old vectors by document ID, and insert new vectors. For Notion and Confluence, their native webhooks fire within seconds of edits. For SharePoint, Microsoft Graph webhooks are reliable. Set a staleness warning if the last index timestamp for a retrieved document is older than 7 days — this covers 95% of policy and process documents that change on a weekly cadence.
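The delete-then-reinsert step can be sketched against an in-memory store; a real deployment would call the vector database's delete-by-metadata and upsert APIs instead, and `embed` is a stand-in for the embedding model.

```python
def reindex_document(store, doc_id, new_chunks, embed):
    """Drop every vector belonging to doc_id, then insert freshly embedded chunks."""
    store[:] = [v for v in store if v["doc_id"] != doc_id]
    for i, chunk in enumerate(new_chunks):
        store.append({
            "doc_id": doc_id,
            "chunk_id": f"{doc_id}#{i}",
            "vector": embed(chunk),
            "text": chunk,
        })

embed = lambda text: [float(len(text))]  # toy embedding for the sketch
store = [
    {"doc_id": "pto-policy", "chunk_id": "pto-policy#0", "vector": [8.0],
     "text": "old text"},
    {"doc_id": "msa", "chunk_id": "msa#0", "vector": [9.0], "text": "cap terms"},
]
reindex_document(store, "pto-policy", ["new text A", "new text B"], embed)
```

Keying deletion on `doc_id` rather than chunk ID matters because a re-chunked document may produce a different number of chunks than the version it replaces.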
What's a realistic timeline to deploy enterprise document search?
Using a managed solution like Glean: 2–4 weeks (connector setup, SSO integration, user onboarding). Building on LlamaIndex/Pinecone: 6–10 weeks for a production-ready system (document ingestion pipeline, embedding, vector store, API, Slack bot, access control). Add 2–4 weeks for each major document source with non-standard formatting (legacy SharePoint, custom ERP exports). Budget $30,000–$80,000 in engineering time for a custom build covering 5–10 document sources with full access control.