Reference Library

AI Reference Architectures

Production-ready stacks for the systems you are actually building. Each page includes the full architecture diagram, a recommended stack, costs at three scales, a latency budget, and real failure modes.

RAG / Search

Advanced RAG with Reranking: Two-Stage Retrieval for Production

Production RAG pipeline with two-stage retrieval: broad recall via hybrid dense+sparse search followed by precision reranking, plus contextual compression before the LLM call. Covers how reranking improves answer accuracy by 15-30% and reduces LLM context costs.

Updated Apr 16, 2026
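The two-stage shape above can be sketched in a few lines. This is an illustrative toy, not the page's implementation: reciprocal rank fusion merges a dense and a sparse ranking for broad recall, and a word-overlap scorer stands in for a real cross-encoder reranker.

```python
# Stage one: reciprocal rank fusion over dense and sparse rankings (broad recall).
# Stage two: rerank the fused candidates and keep only a few for the LLM.
# The scoring functions are toy stand-ins for an embedding model and cross-encoder.

def rrf_fuse(dense_ranked, sparse_ranked, k=60):
    """Reciprocal Rank Fusion: merge two ranked lists of doc ids."""
    scores = {}
    for ranking in (dense_ranked, sparse_ranked):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

def rerank(query, docs, top_n=3):
    """Precision stage: score candidates against the query (stand-in for
    a cross-encoder) so only the most relevant few reach the LLM."""
    def overlap(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(docs, key=overlap, reverse=True)[:top_n]

corpus = {
    "d1": "how to rotate api keys",
    "d2": "rotating api keys for service accounts",
    "d3": "billing faq",
}
candidates = rrf_fuse(["d2", "d1", "d3"], ["d1", "d2", "d3"])
top = rerank("rotate api keys", [corpus[d] for d in candidates])
```

The second stage is also where contextual compression happens in production: instead of passing whole documents, only the reranked, trimmed passages are sent to the model.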

Customer Knowledge Base Chatbot

Reference architecture for a high-volume help-center chatbot over 10k support articles. Zendesk-style, cheap per query, fast answers with deflection tracking, escalation to a human, and continuous learning from resolved tickets.

Updated Apr 16, 2026

Legal Document Search

Reference architecture for natural-language search across contract repositories and case law with strict citation, confidence scoring, and precision-over-recall ranking. 1M+ pages, pinpoint paragraph-level citations, and zero tolerance for hallucination.

Updated Apr 16, 2026

Log Analysis RAG

Reference architecture for natural-language queries over 1TB/day of observability logs. Combines log ingestion, pattern-aware chunking, vector search, structured metadata filters, anomaly detection, and LLM-driven root-cause analysis.

Updated Apr 16, 2026

Multimodal RAG (Text + Images + PDFs)

Reference architecture for layout-aware RAG over documents that are 30-70% images, diagrams, tables, and charts. Product catalogs, research papers, technical manuals. Combines VLM-based PDF parsing, image embeddings, and OCR into a unified retrieval pipeline.

Updated Apr 16, 2026

Real-time News RAG

Reference architecture for RAG over minute-fresh news, RSS, and social feeds. Streaming ingestion, freshness-weighted retrieval, deduplication, source credibility scoring, and answers that cite publish time down to the minute.

Updated Apr 16, 2026

Slack + Notion Internal Search

Reference architecture for unified, permissions-aware search across Slack, Notion, Linear, Google Drive, and GitHub. Incremental indexing via webhooks, per-user ACL enforcement, and natural-language answers that cite the exact thread, page, or commit.

Updated Apr 16, 2026

Enterprise Document Search

Reference architecture for semantic search across 1M+ enterprise documents (PDFs, Confluence, Notion, Google Docs). Hybrid retrieval, reranking, access control, and cost at scale.

Updated Apr 15, 2026

RAG for Codebase Search

Reference architecture for natural-language Q&A over a 1M+ line codebase. Code-aware embeddings, tree-sitter AST chunking, and cited answers in under 2 seconds. Used by tools like Cursor and Sourcegraph.

Updated Apr 15, 2026

Agents

Calendar Scheduling Agent

Reference architecture for an agent that parses availability, books meetings across timezones, and handles rescheduling via email or chat. Models, costs, and the timezone edge cases that break most implementations.

Updated Apr 16, 2026

Code Review Agent

Reference architecture for an LLM-powered pull request review agent that catches bugs, security issues, and style violations directly on GitHub. Models, stack, costs, failure modes.

Updated Apr 16, 2026

Data Analyst Agent

Reference architecture for a natural-language-to-SQL agent that queries tabular data, generates charts, and produces insights. Models, execution sandbox, costs, and why self-critique matters.

Updated Apr 16, 2026

Email Triage Agent

Reference architecture for an LLM agent that sorts your inbox, drafts replies, and flags priority mail. Models, costs, latency budget, failure modes.

Updated Apr 16, 2026

LLM Function Calling & Tool Use: Production Architecture

Production patterns for LLM tool use: schema design, parallel tool calls, error handling when tools fail, result injection, and preventing infinite tool-call loops. Covers when to use native function calling vs manual JSON parsing.

Updated Apr 16, 2026
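The loop-prevention pattern named above can be sketched as a tool-use loop with a hard iteration cap and error injection. `call_model` here is a hypothetical stub standing in for a real LLM API; real responses carry structured tool-call objects rather than tuples.

```python
# Tool-use loop with a hard cap to prevent infinite tool-call cycles.
# Tool failures are fed back to the model as messages instead of crashing.

MAX_TOOL_CALLS = 5

def run_agent(call_model, tools, user_msg):
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(MAX_TOOL_CALLS):
        reply = call_model(messages)
        if reply.get("tool_call") is None:
            return reply["content"]          # model produced a final answer
        name, args = reply["tool_call"]
        try:
            result = tools[name](**args)     # execute the requested tool
        except Exception as exc:
            result = f"tool error: {exc}"    # inject the failure, let the model retry
        messages.append({"role": "tool", "name": name, "content": str(result)})
    return "stopped: tool-call budget exhausted"

# Usage with a fake model that calls one tool, then answers with its result.
calls = {"n": 0}
def fake_model(messages):
    if calls["n"] == 0:
        calls["n"] += 1
        return {"tool_call": ("add", {"a": 2, "b": 3})}
    return {"tool_call": None, "content": messages[-1]["content"]}

answer = run_agent(fake_model, {"add": lambda a, b: a + b}, "what is 2+3?")
```

The cap plus error injection covers the two most common production failures: models that keep calling the same tool, and tools that throw.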

Meeting Notetaker Agent

Reference architecture for an agent that transcribes meetings, extracts action items, and produces structured summaries. Models, costs, diarization strategy, and production failure modes.

Updated Apr 16, 2026

Multi-Agent Orchestration: Supervisor-Worker Architecture

A production pattern for coordinating multiple specialized LLM agents under a supervisor that delegates tasks, manages context, and aggregates results. Covers loop prevention, parallel workers, and inter-agent communication.

Updated Apr 16, 2026
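The supervisor-worker shape can be reduced to a small sketch: a supervisor plans subtasks, fans them out to specialized workers, aggregates results, and enforces a step budget against delegation loops. The workers here are plain functions standing in for LLM agents, and `plan` is a hypothetical planning hook.

```python
# Supervisor-worker delegation with a step cap to prevent loops.
# Workers are plain functions standing in for specialized LLM agents.

MAX_STEPS = 10

def supervise(task, workers, plan):
    """`plan` maps the task to (worker_name, subtask) pairs;
    the supervisor executes them and aggregates the results."""
    results, steps = [], 0
    for worker_name, subtask in plan(task):
        steps += 1
        if steps > MAX_STEPS:
            raise RuntimeError("delegation budget exhausted")
        results.append(workers[worker_name](subtask))
    return " | ".join(results)

workers = {
    "research": lambda q: f"facts about {q}",
    "writer": lambda q: f"summary of {q}",
}
out = supervise(
    "solar power",
    workers,
    plan=lambda t: [("research", t), ("writer", t)],
)
```

A production supervisor would also manage per-worker context windows and run independent subtasks in parallel, which this sequential sketch omits.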

QA Testing Agent

Reference architecture for an agent that generates test cases from code and requirements, runs them, and diagnoses failures. Models, execution sandbox, costs, and why flaky-test detection is non-negotiable.

Updated Apr 16, 2026

Research Agent

Reference architecture for a multi-step research agent that searches the web, synthesizes sources, and produces citation-backed answers. Models, search tools, costs, and why most research agents hallucinate.

Updated Apr 16, 2026

Sales Outreach Agent

Reference architecture for an agent that researches leads, writes personalized first messages, and handles replies. Models, tools, costs, and why most LangGraph templates fail in production.

Updated Apr 16, 2026

Customer Support Agent

Reference architecture for an LLM-powered customer support agent handling 10k+ conversations/day. Models, stack, costs, failure modes, and production guardrails.

Updated Apr 15, 2026

Generation

Function-Level Code Generation

Reference architecture for generating production-quality functions from a signature and spec: test-first prompting, sandboxed execution, iterative repair from test output, and static analysis gates before merge.

Updated Apr 16, 2026

Email & Message Composition

Reference architecture for drafting emails, Slack messages, and outbound notifications from context: CRM data, thread history, recipient signals. Tone calibration, multi-variant A/B generation, personalization, and deliverability guardrails.

Updated Apr 16, 2026

End-to-End Fine-Tuning Pipeline: From Data to Deployment

A complete fine-tuning pipeline covering data collection, cleaning, formatting, LoRA training, evaluation, and deployment. Includes decision framework for when fine-tuning beats prompting, with realistic cost and performance benchmarks.

Updated Apr 16, 2026

Invoice Structured Extraction

Reference architecture for turning PDF and image invoices into validated JSON: vendor, line items, tax, totals, and payment terms. OCR fallback, schema-constrained LLM extraction, and deterministic math validation.

Updated Apr 16, 2026
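The deterministic math validation mentioned above is the cheapest hallucination check in extraction pipelines: recompute the totals from the extracted fields and reject anything that doesn't close. A minimal sketch, assuming a simple JSON shape with `line_items`, `tax`, and `total`:

```python
# After the LLM emits structured invoice fields, recompute the arithmetic
# deterministically and reject extractions whose numbers don't add up.

from decimal import Decimal

def validate_invoice(inv, tolerance=Decimal("0.01")):
    subtotal = sum(Decimal(str(li["qty"])) * Decimal(str(li["unit_price"]))
                   for li in inv["line_items"])
    expected = subtotal + Decimal(str(inv["tax"]))
    return abs(expected - Decimal(str(inv["total"]))) <= tolerance

ok = validate_invoice({
    "line_items": [{"qty": 2, "unit_price": "9.99"},
                   {"qty": 1, "unit_price": "5.00"}],
    "tax": "2.00",
    "total": "26.98",
})
```

`Decimal` rather than `float` matters here: binary floating point drifts on currency sums, and a validator that produces false rejections gets turned off.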

Long-Form Content Generation

Reference architecture for generating 3,000 to 10,000 word blog posts, research reports, and documentation. Outline-first planning, section-by-section drafting, style polish, and aggressive prompt caching to keep costs predictable.

Updated Apr 16, 2026

Prompt Caching & Cost Optimization: 90% Savings on Repetitive Prompts

Architecture for Anthropic and OpenAI prompt caching: cache design patterns, minimum token thresholds, hit rate measurement, and realistic cost savings on repetitive system prompts. Covers when caching pays off vs when it doesn't.

Updated Apr 16, 2026
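The "when caching pays off" question comes down to back-of-envelope math over hit rate and prompt size. The sketch below uses illustrative placeholder prices, not current vendor rates: cache reads assumed at 10% of the base input rate, cache writes at a 25% premium.

```python
# Back-of-envelope model of prompt-cache economics for a large, repeated
# system prompt. Prices are illustrative placeholders, not vendor rates.

def monthly_input_cost(system_tokens, requests, hit_rate,
                       base_per_mtok=3.00):
    cached = requests * hit_rate            # requests served from cache
    uncached = requests - cached            # requests that write the cache
    read_cost = cached * system_tokens * 0.10 * base_per_mtok / 1e6
    write_cost = uncached * system_tokens * 1.25 * base_per_mtok / 1e6
    return read_cost + write_cost

# 20k-token system prompt, 10k requests/month:
no_cache = 10_000 * 20_000 * 3.00 / 1e6
with_cache = monthly_input_cost(20_000, 10_000, hit_rate=0.95)
```

Under these assumptions the cached bill is roughly a sixth of the uncached one, and the savings collapse quickly as the hit rate drops, which is why hit-rate measurement is part of the architecture.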

Token Streaming Pipeline: LLM to UI at Scale

Production architecture for streaming LLM tokens to web and mobile clients using SSE and WebSocket. Covers back-pressure, partial JSON parsing, error recovery mid-stream, and infrastructure at scale with Vercel Edge and Cloudflare.

Updated Apr 16, 2026
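The partial-JSON-parsing problem named above has a simple core: structured output arrives split across arbitrary token boundaries, so the client buffers chunks and re-attempts a parse on each one. A minimal sketch (a real pipeline would do this per SSE event, and top-level JSON objects avoid the premature-parse issue that bare numbers would have):

```python
# Buffer streamed chunks and attempt a parse after each one, returning
# the object as soon as the accumulated text forms valid JSON.

import json

def parse_streaming_json(chunks):
    buf = ""
    for chunk in chunks:
        buf += chunk
        try:
            return json.loads(buf)   # a complete object has arrived
        except json.JSONDecodeError:
            continue                 # still partial, keep buffering
    raise ValueError("stream ended before JSON completed")

# Token boundaries fall mid-key and mid-value, as they do in practice:
obj = parse_streaming_json(['{"ans', 'wer": ', '42}'])
```

Production variants parse incrementally instead of re-parsing the whole buffer, and surface partial fields to the UI before the object closes.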

Test Case Generation

Reference architecture for generating unit and integration tests from existing code plus requirements. Coverage-guided selection, mutation testing feedback, and deduplication to prevent test-suite bloat.

Updated Apr 16, 2026

Translation Pipeline at Scale

Reference architecture for translating product strings, help docs, and marketing copy into 20+ languages with glossary preservation, tone control, and both batch and realtime variants. Memory-backed consistency, QA loop, and per-locale human review.

Updated Apr 16, 2026

Text-to-SQL Agent

Reference architecture for translating natural-language questions into safe, correct SQL. Schema-aware prompting, dry-run validation, read-only execution, and guardrails against destructive queries.

Updated Apr 15, 2026

Classification

Realtime Content Moderation Pipeline

Reference architecture for moderating user-generated text and images in realtime. Tiered policy classifier, human review queue, appeals workflow, and audit logging at billions of items per day.

Updated Apr 16, 2026

Contract Clause Extraction Pipeline

Reference architecture for turning legal contracts (MSAs, NDAs, SOWs, leases) into a structured clause database with citations. High precision, layout-aware parsing, clause-by-clause evidence, and human-in-the-loop review.

Updated Apr 16, 2026

Intent Classification for Message Routing

Reference architecture for multi-label intent classification routing inbound customer messages to the right team - support, sales, billing, abuse. Confidence thresholds, fallback rules, and drift monitoring.

Updated Apr 16, 2026

Automated LLM Evaluation Harness: CI/CD for AI Quality

A production evaluation system for LLMs covering test dataset management, LLM-as-judge scoring, regression testing, A/B model comparison, and CI integration. Prevents prompt drift and model regressions from silently breaking production.

Updated Apr 16, 2026

Resume Screening Pipeline

Reference architecture for LLM-assisted resume screening. Parses PDFs, matches against a job description, extracts skills, ranks candidates - with bias mitigation, full audit log, and EEOC-compliant deterministic scoring.

Updated Apr 16, 2026

Sentiment Analysis at Scale

Reference architecture for classifying sentiment across billions of reviews, social posts, and support messages per day. Small fast classifier for the head, LLM deep-dive on the ambiguous tail.

Updated Apr 16, 2026
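The head/tail split above can be sketched as a confidence-gated cascade: a cheap scorer handles clear-cut messages, and only ambiguous ones are routed to the expensive model. The keyword scorer and `llm_classify` stub are toy stand-ins for a small fine-tuned classifier and an LLM call.

```python
# Head/tail cascade: a cheap scorer classifies the easy head of the
# distribution; low-confidence messages fall through to an LLM (stubbed).

POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"terrible", "hate", "awful"}

def cheap_score(text):
    words = set(text.lower().split())
    pos, neg = len(words & POSITIVE), len(words & NEGATIVE)
    if pos > neg:
        return "positive", pos - neg
    if neg > pos:
        return "negative", neg - pos
    return "neutral", 0                      # no signal: ambiguous

def classify(text, llm_classify):
    label, confidence = cheap_score(text)
    if confidence >= 1:
        return label, "cheap"                # clear-cut head, no LLM cost
    return llm_classify(text), "llm"         # ambiguous tail only

clear = classify("I love this product", lambda t: "positive")
vague = classify("the package arrived", lambda t: "neutral")
```

The economics follow from the split: if 90%+ of traffic resolves in the cheap path, per-item cost is dominated by the small model even at billions of items per day.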

Multimodal

Voice