LLM Skills Library

50 AI engineering techniques — real prompts, real code, real tradeoffs.

Prompting

Core techniques for getting better outputs from any LLM.

Chain-of-Thought

intermediate

Chain-of-thought prompting guides LLMs to reason step-by-step before answering. It dramatically improves accuracy on math, logic, and multi-step reasoning tasks. Learn zero-shot CoT, few-shot CoT, and when to use each.

Context Stuffing

intermediate

Context stuffing fills the LLM context window with relevant documents, code, or data so the model can reason over it directly — without retrieval. Learn when to stuff context vs. build RAG, and how to structure large contexts for maximum accuracy.

Few-Shot Prompting

beginner

Few-shot prompting dramatically improves LLM consistency by showing 2–8 examples of the desired input-output pattern before the actual query. Learn example selection, ordering, and formatting strategies.
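
As a sketch of the pattern (the sentiment task, labels, and layout are illustrative, not tied to any particular API):

```python
# Assemble a few-shot prompt from labeled examples so the model
# imitates the demonstrated input -> output pattern and label set.
EXAMPLES = [
    ("The movie was a delight.", "positive"),
    ("Worst purchase I've made.", "negative"),
    ("It arrived on time, nothing special.", "neutral"),
]

def build_few_shot_prompt(query: str) -> str:
    lines = ["Classify the sentiment as positive, negative, or neutral.", ""]
    for text, label in EXAMPLES:
        lines.append(f"Text: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Text: {query}")
    lines.append("Sentiment:")   # the model completes this line
    return "\n".join(lines)

prompt = build_few_shot_prompt("The battery died after two days.")
print(prompt)
```

Example order and formatting consistency matter: the model copies whatever pattern the examples establish, including whitespace and label casing.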

Meta-Prompting

advanced

Meta-prompting uses an LLM to generate, critique, and refine prompts for a target LLM. It automates prompt engineering by having the model act as a prompt optimizer — dramatically reducing manual iteration time.

Prompt Chaining

intermediate

Prompt chaining breaks complex tasks into sequential LLM calls where each output feeds the next. Learn when to chain, how to design handoffs, and how to handle errors mid-chain.
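
A minimal two-step chain, with `call_llm` as a hypothetical stand-in for a real chat-completion client:

```python
def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would call an LLM API here.
    return f"<output for: {prompt[:30]}...>"

def run_chain(document: str) -> str:
    # Step 1: extract key points; its output becomes step 2's input.
    summary = call_llm(f"List the key points of:\n{document}")
    if not summary.strip():          # handle errors mid-chain, before they cascade
        raise ValueError("step 1 returned empty output")
    # Step 2: write the final artifact from the distilled handoff.
    return call_llm(f"Write a one-paragraph brief from these points:\n{summary}")

print(run_chain("Q3 revenue grew 12%; churn fell to 2.1%; APAC expansion slipped."))
```

The validation between steps is the key design point: each handoff is a checkpoint where a bad intermediate output can be caught or retried instead of silently corrupting the final answer.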

Prompt Compression

advanced

Prompt compression reduces input token count by 50–90% through selective information removal, LLMLingua-style token pruning, and semantic summarization. Learn when compression helps, when it hurts, and how to measure the tradeoff.

ReAct Pattern

advanced

ReAct (Reasoning + Acting) interleaves LLM reasoning traces with tool actions, enabling agents to decompose tasks, call external APIs, and update their plan based on observations. It's the foundation of most production LLM agents.
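
A minimal sketch of the loop. The model's replies are scripted here (`SCRIPT` stands in for live LLM calls) and `search` is a hypothetical stub tool; the control flow — parse an action, run the tool, append the observation, repeat — is the ReAct pattern itself:

```python
SCRIPT = [
    "Thought: I need the population. Action: search[Oslo population]",
    "Thought: I have the fact. Final Answer: about 700,000",
]

def search(query: str) -> str:
    return "Oslo has roughly 700,000 inhabitants."   # stub tool

def react_loop(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    for step in range(max_steps):
        reply = SCRIPT[step]                  # stand-in for llm(transcript)
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        # Parse the requested action, execute it, feed the observation back.
        action_arg = reply.split("Action: search[", 1)[1].rstrip("]")
        transcript += f"\n{reply}\nObservation: {search(action_arg)}"
    raise RuntimeError("no answer within step budget")

print(react_loop("What is the population of Oslo?"))
```

The `max_steps` budget is essential in production: without it, an agent that never emits a final answer loops forever.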

Role Prompting

beginner

Role prompting assigns a persona or expert identity to an LLM to improve output quality and domain alignment. Learn which roles work, why they help, and the limits of persona assignment.

Self-Consistency

advanced

Self-consistency runs the same chain-of-thought prompt multiple times with temperature > 0 and takes a majority vote on the final answers. It reliably improves accuracy on reasoning and math tasks at the cost of multiple inference calls.
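
The voting step is simple; in this sketch `sample_answer` is a hypothetical stand-in for one temperature > 0 chain-of-thought completion from which only the final answer has been parsed:

```python
from collections import Counter

def sample_answer(prompt: str, seed: int) -> str:
    # Stub: returns a pre-baked spread of final answers across samples.
    return ["42", "42", "41", "42", "39"][seed % 5]

def self_consistency(prompt: str, n: int = 5) -> str:
    """Sample n independent reasoning paths, then majority-vote the answers."""
    answers = [sample_answer(prompt, seed=i) for i in range(n)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("What is 6 * 7?"))   # → 42 (3 of 5 samples agree)
```

Cost scales linearly with `n`, so the technique pays off mainly on hard reasoning tasks where a single sample is unreliable.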

Structured Output

intermediate

Getting reliable JSON, CSV, or schema-compliant output from LLMs. Learn constrained decoding, schema prompting, validation loops, and which APIs guarantee valid JSON.
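
A sketch of the validation-loop approach, with a hypothetical `call_llm` stub that returns malformed JSON on the first attempt and a valid reply on retry:

```python
import json

def call_llm(prompt: str, attempt: int) -> str:
    # Stub: first reply is broken, the retry (with error feedback) is valid.
    return '{"name": "Ada", "age": }' if attempt == 0 else '{"name": "Ada", "age": 36}'

def get_json(prompt: str, max_retries: int = 2) -> dict:
    """Parse the reply; on failure, re-prompt with the parse error appended."""
    for attempt in range(max_retries + 1):
        raw = call_llm(prompt, attempt)
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError as e:
            prompt += f"\nYour last reply was invalid JSON ({e}). Return only valid JSON."
            continue
        if {"name", "age"} <= obj.keys():    # minimal schema check
            return obj
    raise ValueError("no valid JSON after retries")

print(get_json("Return {name, age} for Ada Lovelace as JSON."))
```

Constrained decoding (where the API supports it) makes this loop unnecessary; the retry loop is the fallback when the provider offers no JSON guarantee.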

System Prompt Design

intermediate

System prompts are the foundation of every production LLM application. Learn how to write system prompts that consistently control persona, format, safety constraints, and output quality.

Temperature & Sampling

intermediate

Temperature, top-p, top-k, and frequency penalties control how an LLM samples its output. Learn exactly what each parameter does, when to turn temperature to zero, and how to tune sampling for creative vs. deterministic tasks.
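
The mechanics are easy to see on a toy distribution — temperature divides the logits before softmax (sharpening below 1, flattening above), and top-p keeps the smallest set of tokens whose cumulative probability reaches the threshold:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def filter_top_p(probs, tokens, top_p=0.9):
    """Nucleus sampling: keep tokens until cumulative probability >= top_p."""
    ranked = sorted(zip(probs, tokens), reverse=True)
    kept, total = [], 0.0
    for p, t in ranked:
        kept.append((t, p))
        total += p
        if total >= top_p:
            break
    z = sum(p for _, p in kept)
    return {t: p / z for t, p in kept}   # renormalize the survivors

tokens = ["the", "a", "zebra", "qux"]
logits = [4.0, 3.0, 1.0, 0.5]
cold = softmax([l / 0.2 for l in logits])   # low temperature sharpens
hot  = softmax([l / 2.0 for l in logits])   # high temperature flattens
assert cold[0] > hot[0]
print(filter_top_p(softmax(logits), tokens, top_p=0.9))
```

On these logits the 0.9 nucleus keeps only "the" and "a": the long tail of unlikely tokens is cut off before sampling, which is why top-p curbs incoherent outputs at high temperature.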

Tree of Thought

advanced

Tree of Thought (ToT) enables LLMs to explore multiple reasoning branches in parallel, evaluate intermediate steps, and backtrack — mimicking deliberate human problem-solving for hard tasks.

XML Tags for Claude

intermediate

Claude is trained to follow XML-tagged prompt structure exceptionally well. Learn how to use XML tags to separate instructions from content, pass multi-part inputs, and improve Claude's output consistency and accuracy.
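
An illustrative prompt skeleton; the tag names below are our own choice (nothing in the API requires specific tags), what matters is that each part of the input sits in its own clearly delimited block:

```python
def build_prompt(instructions: str, document: str, question: str) -> str:
    # XML tags separate instructions from untrusted content and multi-part inputs.
    return (
        "<instructions>\n" + instructions + "\n</instructions>\n\n"
        "<document>\n" + document + "\n</document>\n\n"
        "<question>\n" + question + "\n</question>"
    )

prompt = build_prompt(
    "Answer using only the document. Reply inside <answer> tags.",
    "The warranty period is 24 months.",
    "How long is the warranty?",
)
print(prompt)
```

Asking for the reply inside tags (here `<answer>`) also makes the output trivially extractable with a string split or regex.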

Zero-Shot Prompting

beginner

Zero-shot prompting lets you use LLMs without providing any examples. Learn when it works, when it fails, and how to write zero-shot prompts that get reliable results.

RAG & Retrieval

Build accurate retrieval systems that don't hallucinate.

Chunking Strategies

intermediate

Chunking splits documents into pieces for embedding and retrieval. The right chunking strategy — fixed-size, semantic, hierarchical, or late chunking — directly determines RAG accuracy. Learn the tradeoffs for each approach.
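
The fixed-size baseline with overlap is a few lines; the sizes here are characters for illustration (production systems usually count tokens):

```python
def chunk_fixed(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Fixed-size chunking with overlap: the simplest baseline. The overlap
    keeps sentences that straddle a boundary retrievable from at least one chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap   # advance by size minus overlap
    return chunks

doc = "".join(chr(97 + i % 26) for i in range(500))   # 500-char stand-in document
parts = chunk_fixed(doc, size=200, overlap=50)
print(len(parts), [len(p) for p in parts])   # 4 chunks; last one is the remainder
```

Semantic and hierarchical chunking replace the fixed `size` boundary with sentence, paragraph, or heading boundaries; the retrieval loop around them is identical.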

Contextual Retrieval

advanced

Contextual retrieval (Anthropic, 2024) prepends a short context summary to each chunk before embedding, giving the embedding model information about where the chunk sits in the document. It reduces retrieval failures by 49% on Anthropic's benchmarks.

Embedding Selection

intermediate

Choosing the right embedding model is one of the biggest levers for RAG retrieval quality. This guide covers the major embedding models in 2026, benchmarks, dimension/cost tradeoffs, and how to evaluate embeddings on your specific domain.

GraphRAG

advanced

GraphRAG builds a knowledge graph from your corpus and uses it to answer complex, multi-hop questions that naive vector RAG fails on. Microsoft's GraphRAG system (2024) showed 2–5x better performance on global/analytical queries.

Hybrid Search

advanced

Hybrid search combines dense vector search (semantic similarity) with sparse keyword search (BM25) to retrieve documents. It consistently outperforms either approach alone, especially for queries with specific terms, product names, or technical jargon.
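
Reciprocal Rank Fusion (RRF) is one common way to merge the two rankings without comparing their incompatible score scales — each document earns `1/(k + rank)` from every list it appears in:

```python
def rrf_fuse(dense_ranked: list[str], sparse_ranked: list[str], k: int = 60) -> list[str]:
    """Merge a vector-search ranking with a BM25 ranking by summed rank scores."""
    scores: dict[str, float] = {}
    for ranking in (dense_ranked, sparse_ranked):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["d2", "d1", "d3"]        # semantic-similarity order
sparse = ["d1", "d4", "d2"]        # BM25 keyword order
print(rrf_fuse(dense, sparse))     # → ['d1', 'd2', 'd4', 'd3']
```

`d1` wins because it ranks well in both lists, which is exactly the behavior hybrid search is after; `k=60` is the commonly used damping constant from the original RRF paper.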

Late Chunking

advanced

Late chunking (Jina AI, 2024) embeds the full document first to capture global context, then pools token embeddings into chunk representations. This preserves cross-sentence context in each chunk's embedding, improving retrieval for context-dependent text.

Metadata Filtering

intermediate

Metadata filtering narrows the vector search space by pre-filtering documents on structured attributes — date, category, author, language — before semantic search. It dramatically improves precision and enables multi-tenant RAG.
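
A toy in-memory version of the pattern; in a real system the filter runs inside the vector database (typically as a `filter` argument on the query), not in application code:

```python
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

DOCS = [
    {"id": "a", "tenant": "acme",   "year": 2025, "vec": [0.9, 0.1]},
    {"id": "b", "tenant": "acme",   "year": 2023, "vec": [0.8, 0.2]},
    {"id": "c", "tenant": "globex", "year": 2025, "vec": [0.95, 0.05]},
]

def filtered_search(query_vec, tenant, min_year, top_k=2):
    """Pre-filter on structured metadata, then rank the survivors semantically."""
    candidates = [d for d in DOCS if d["tenant"] == tenant and d["year"] >= min_year]
    candidates.sort(key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["id"] for d in candidates[:top_k]]

print(filtered_search([1.0, 0.0], tenant="acme", min_year=2024))   # → ['a']
```

Note that `c` is the closest vector overall but belongs to another tenant — the filter guarantees it can never leak into the results, which is the multi-tenant isolation property.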

Query Expansion

advanced

Query expansion uses an LLM to rewrite, decompose, or augment user queries before retrieval, improving recall by generating hypothetical documents, sub-queries, or synonym variations. It solves the vocabulary mismatch problem in RAG.

RAG Evaluation

intermediate

RAG evaluation measures both retrieval quality (did we fetch the right chunks?) and generation quality (did the LLM produce an accurate, grounded answer?). Learn the RAGAS framework, key metrics, and how to build a continuous eval pipeline.

Reranking

intermediate

Reranking is a second-stage retrieval step that scores each retrieved chunk for relevance to the query using a cross-encoder model. It consistently improves RAG answer quality by 15–30% over pure vector search with minimal added latency.
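
The two-stage shape, sketched with a toy scoring proxy — `cross_encoder_score` here uses token overlap purely for illustration, where a real cross-encoder model jointly encodes query and chunk and outputs a learned relevance score:

```python
def rerank(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Stage 2: re-score each (query, chunk) pair and keep the best."""
    def cross_encoder_score(q: str, c: str) -> float:
        # Toy proxy only: fraction of query tokens appearing in the chunk.
        q_tokens, c_tokens = set(q.lower().split()), set(c.lower().split())
        return len(q_tokens & c_tokens) / len(q_tokens)

    scored = sorted(chunks, key=lambda c: cross_encoder_score(query, c), reverse=True)
    return scored[:top_k]

retrieved = [                       # stage 1: top hits from vector search
    "Reset your password from the account settings page.",
    "Our password policy requires 12 characters.",
    "Shipping takes 3-5 business days.",
]
print(rerank("how do I reset my password", retrieved))
```

The pattern is to over-retrieve in stage 1 (say, top 50 by vector similarity) and let the slower, more accurate reranker pick the final handful for the context window.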

Agents & Tools

Tool use, memory, planning, and multi-agent coordination.

Agent Evaluation

advanced

Evaluating LLM agents is harder than evaluating single-turn LLMs because agents take sequences of actions, have long-horizon goals, and can fail in many ways. Learn task completion metrics, trajectory evaluation, and how to build regression tests for agents.

Agent Memory

intermediate

LLM agents need memory to maintain context across conversations and sessions. Learn the four memory types (in-context, external, procedural, episodic), when to use each, and how to build persistent memory systems that don't hallucinate past events.

Agent Planning

advanced

Agent planning is how LLM agents decompose complex tasks into executable steps, manage dependencies between steps, and adapt the plan when execution diverges from expectations. Good planning architecture is the difference between agents that complete 10-step tasks and ones that fail after 3.

Error Recovery

intermediate

Production LLM agents fail in predictable ways: tool errors, invalid JSON, hallucinated arguments, and infinite loops. Learn defensive error handling patterns that let agents recover gracefully rather than crashing or producing wrong outputs.

Human-in-the-Loop

intermediate

Human-in-the-loop (HITL) patterns define when LLM agents pause for human confirmation, verification, or input. Proper HITL design prevents costly agent mistakes while avoiding excessive interruptions that destroy productivity.

Multi-Agent Coordination

advanced

Multi-agent systems use an orchestrator LLM to decompose tasks and delegate to specialized subagents. This enables parallelism, specialization, and fault isolation that single-agent architectures can't achieve. Learn the orchestrator/subagent pattern, handoff protocols, and when to use agents vs. tools.

Parallel Tool Calls

advanced

Parallel tool calling lets LLMs request multiple tool executions simultaneously in a single response, rather than sequentially. This reduces multi-step agent latency by 50–80% when tasks can run concurrently.
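
On the application side, executing a batch of independent tool calls concurrently is a thread-pool fan-out; the tools below are hypothetical, with `time.sleep` standing in for slow external API calls:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def get_weather(city: str) -> str:
    time.sleep(0.1)               # stand-in for a slow external API
    return f"{city}: 21C"

def get_stock(symbol: str) -> str:
    time.sleep(0.1)
    return f"{symbol}: $187"

def run_tool_calls_parallel(calls):
    """Run independent tool calls concurrently, as when a model returns
    several tool-use requests in a single turn."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fn, *args) for fn, args in calls]
        return [f.result() for f in futures]   # results stay in request order

calls = [(get_weather, ("Paris",)), (get_stock, ("ACME",))]
start = time.perf_counter()
results = run_tool_calls_parallel(calls)
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")   # ~0.1s total, not 0.2s sequential
```

The ordering guarantee matters: most function-calling APIs expect tool results returned in correspondence with the requested call IDs, so collect results in order even though execution is concurrent.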

Prompt Caching

intermediate

Prompt caching (Anthropic, OpenAI) stores computed key-value pairs for long prompt prefixes and reuses them across requests. On cache hits it cuts input token costs by up to 90% and latency by up to 85% — essential for any agent with a large system prompt or repeated context.

Streaming

intermediate

Streaming sends LLM tokens to the client as they're generated instead of waiting for the complete response. It cuts time-to-first-token from 3–10s to under 500ms, dramatically improving perceived responsiveness for long-form outputs.

Tool Use

intermediate

Tool use (function calling) lets LLMs call external APIs, run code, and query databases by describing available functions and receiving structured JSON calls. It's the foundation of all modern LLM agents.
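
The two halves of the contract, sketched with a hypothetical tool: a schema advertised to the model (the shape below follows common function-calling APIs, though exact field names vary by provider), and a dispatcher that executes the model's structured call:

```python
import json

def get_weather(city: str) -> str:
    return f"Sunny in {city}"      # hypothetical tool backing the schema

TOOLS = {"get_weather": get_weather}

TOOL_SCHEMA = [{
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "parameters": {"type": "object",
                   "properties": {"city": {"type": "string"}},
                   "required": ["city"]},
}]

def dispatch(tool_call_json: str) -> str:
    """Parse the model's structured tool call and run the named function."""
    call = json.loads(tool_call_json)
    fn = TOOLS[call["name"]]       # KeyError here means a hallucinated tool name
    return fn(**call["arguments"])

# Simulated model output: a structured call, not free text.
print(dispatch('{"name": "get_weather", "arguments": {"city": "Oslo"}}'))
```

In a full loop, the dispatcher's return value is sent back to the model as a tool-result message, and the model either calls another tool or writes its final answer.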

Evaluation

Measure, test, and prevent regressions in LLM applications.

A/B Model Testing

advanced

A/B model testing runs two LLM configurations in parallel on real production traffic to measure which produces better outcomes. Unlike offline evals, A/B tests measure actual user behavior and business metrics — the ultimate signal for LLM quality.

Benchmark Selection

intermediate

Choosing the right benchmarks determines whether your model evaluation is predictive of real-world performance. Learn which benchmarks matter in 2026, how to avoid benchmark gaming, and when to build domain-specific benchmarks instead.

Evals Framework

intermediate

An evals framework systematically measures LLM application quality across multiple dimensions, catches regressions, and provides actionable feedback. Learn how to structure eval pipelines, write eval functions, and integrate evals into CI/CD.

Golden Dataset

intermediate

A golden dataset is a curated set of input/expected output pairs used as ground truth for evaluation. It's the foundation of every reliable LLM eval pipeline. Learn how to build, maintain, and expand golden datasets efficiently.

LLM-as-Judge

intermediate

LLM-as-judge uses a language model to score other LLM outputs on quality dimensions like correctness, faithfulness, helpfulness, and safety. It scales evaluation to millions of examples where human labeling is impractical.
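
The core of a judge is a rubric prompt plus defensive parsing of the score; the prompt wording and the 1–5 scale below are illustrative, and the lambda stands in for a real judge-model API call:

```python
JUDGE_PROMPT = """Rate the ANSWER for factual correctness against the REFERENCE.
Respond with only an integer from 1 to 5.

QUESTION: {question}
REFERENCE: {reference}
ANSWER: {answer}
SCORE:"""

def judge_score(question, reference, answer, call_llm) -> int:
    """Ask a judge model for a 1-5 score and parse the reply defensively."""
    raw = call_llm(JUDGE_PROMPT.format(
        question=question, reference=reference, answer=answer))
    digits = [int(ch) for ch in raw if ch.isdigit()]
    if not digits or not 1 <= digits[0] <= 5:
        raise ValueError(f"unparseable judge reply: {raw!r}")
    return digits[0]

# Stub judge for illustration; a real judge would be an LLM call.
print(judge_score("Capital of France?", "Paris", "Paris", lambda p: " 5 "))
```

Judges drift and have biases (position, verbosity, self-preference), so calibrate the judge against a sample of human labels before trusting it at scale.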

Regression Testing

intermediate

Regression testing for LLMs catches quality degradations when you change prompts, upgrade models, or modify retrieval systems. Learn how to structure regression tests, set meaningful pass/fail thresholds, and integrate them into your CI/CD pipeline.
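
The simplest regression gate is an accuracy threshold over a golden dataset; in this sketch `model_answer` is a hypothetical stand-in for the system under test, and exact-match is the (deliberately strict) comparison:

```python
GOLDEN = [
    {"input": "2+2",               "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
    {"input": "3*3",               "expected": "9"},
]

def model_answer(query: str) -> str:
    # Stub system under test, with one deliberately wrong answer.
    return {"2+2": "4", "capital of France": "Paris", "3*3": "6"}[query]

def run_regression(threshold: float = 0.8):
    """Fail the build if accuracy on the golden set drops below threshold."""
    passed = sum(model_answer(c["input"]) == c["expected"] for c in GOLDEN)
    accuracy = passed / len(GOLDEN)
    return accuracy, accuracy >= threshold

accuracy, ok = run_regression()
print(f"accuracy={accuracy:.2f} gate={'PASS' if ok else 'FAIL'}")
```

In CI, the boolean gate becomes the exit code; for free-form outputs, swap exact-match for a semantic comparison or an LLM-as-judge score, but keep the same threshold structure.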

Cost & Latency

Cut costs 50–90% and reduce latency without sacrificing quality.

Safety & Security

Prevent prompt injection, validate outputs, handle PII.