Agent Memory: Short-Term, Long-Term, and Episodic (2026)
Agent memory has four layers: (1) in-context (current conversation window), (2) external/semantic (vector DB for past facts), (3) episodic (structured log of past events and decisions), (4) procedural (learned preferences and rules). Most production agents need at least in-context + external memory. The key challenge is deciding what to save, how to retrieve it, and how to prevent stale memories from causing hallucinations.
When to Use
- ✓ Conversational agents that need to remember user preferences and past interactions across sessions
- ✓ Research agents that accumulate findings over multiple tool calls and must avoid repeating work
- ✓ Personal assistants that should remember names, preferences, and context about the user
- ✓ Multi-session workflows where the agent must pick up where it left off after an interruption
- ✓ Customer support agents that need access to full conversation history and past ticket resolutions
How It Works
1. In-context memory: the current conversation messages. Limited by the context window. For long conversations, implement a sliding window (keep the last N turns) or summary compression (an LLM summarizes old turns into a brief).
2. External semantic memory: store facts as text chunks in a vector database. Retrieve relevant memories at the start of each turn. Key operations: SAVE (after each turn, extract memorable facts), RETRIEVE (before each turn, search for relevant memories), and FORGET (TTL or explicit deletion).
3. Episodic memory: a structured log of events with timestamps. 'On 2026-04-10, the agent searched for X, found Y, and decided Z.' Used by the agent to avoid repeating work and to audit its own decisions.
4. Procedural memory: learned user preferences and agent rules stored as key-value pairs or structured documents. 'User prefers Python over JavaScript', 'Always verify price quotes before including them in output'. These are loaded into the system prompt.
5. Memory management: implement explicit save/load operations. After a conversation ends, run a memory extraction pass to identify facts worth preserving. Use an LLM to score memories by long-term value and discard low-value items.
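Step 5's extraction-and-scoring pass can be sketched as follows. This is a minimal illustration, not a fixed API: the prompt wording, the 1-5 scoring scale, and the `extract_memorable_facts` / `filter_by_score` helper names are assumptions, and `client` is expected to be an Anthropic messages client passed in by the caller.

```python
import json

def filter_by_score(candidates: list, min_score: int = 3) -> list:
    # Keep only facts the scoring model rated at or above the value threshold;
    # low-value items are discarded rather than saved
    return [c['fact'] for c in candidates if c['score'] >= min_score]

def extract_memorable_facts(client, user_msg: str, assistant_msg: str) -> list:
    # One cheap LLM call per turn: extract candidate facts and score long-term value
    prompt = (
        'From this exchange, list facts worth remembering long-term '
        '(user preferences, decisions made, key facts about the user or domain). '
        'Reply with JSON only: [{"fact": "...", "score": 1-5}]\n\n'
        f'User: {user_msg}\nAssistant: {assistant_msg}'
    )
    response = client.messages.create(
        model='claude-3-5-haiku-20241022',
        max_tokens=500,
        messages=[{'role': 'user', 'content': prompt}]
    )
    return filter_by_score(json.loads(response.content[0].text))
```

Using a small, cheap model for extraction keeps the per-turn overhead low; the threshold in `filter_by_score` is the knob that controls how aggressively you prune.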
Examples
```python
from anthropic import Anthropic
from your_vector_db import VectorDB  # placeholder: swap in your vector store client

client = Anthropic()
db = VectorDB()

def chat_with_memory(user_id: str, user_message: str, conversation_history: list):
    # Retrieve relevant memories before answering
    relevant_memories = db.search(
        query=user_message,
        filter={'user_id': user_id},
        top_k=5
    )
    memory_context = '\n'.join([m.text for m in relevant_memories])
    system = f'You are a helpful assistant.\n\nRelevant memories about this user:\n{memory_context}'

    response = client.messages.create(
        model='claude-3-5-sonnet-20241022',
        system=system,
        messages=conversation_history + [{'role': 'user', 'content': user_message}],
        max_tokens=1024
    )

    # Extract and save new memories.
    # extract_memorable_facts, today, and format_messages are assumed
    # application-level helpers (fact extraction, current date, transcript formatting).
    new_facts = extract_memorable_facts(user_message, response.content[0].text)
    for fact in new_facts:
        db.insert(text=fact, metadata={'user_id': user_id, 'date': today()})
    return response.content[0].text


def compress_conversation(messages: list, keep_last_n: int = 10) -> list:
    if len(messages) <= keep_last_n:
        return messages

    # Summarize older messages into a brief; keep the recent turns verbatim
    old_messages = messages[:-keep_last_n]
    recent_messages = messages[-keep_last_n:]
    summary_response = client.messages.create(
        model='claude-3-5-haiku-20241022',
        max_tokens=500,
        messages=[{
            'role': 'user',
            'content': f'Summarize this conversation history in 3-5 sentences, preserving key decisions, facts learned, and current task state:\n\n{format_messages(old_messages)}'
        }]
    )
    summary = summary_response.content[0].text
    return [
        {'role': 'user', 'content': f'[Conversation summary: {summary}]'},
        {'role': 'assistant', 'content': 'Understood. Continuing from where we left off.'},
        *recent_messages
    ]
```

Common Mistakes
- ✗ Saving too many memories — storing every sentence from every conversation creates a noisy memory store where relevant memories are buried. Save only: user preferences stated explicitly, important decisions made, and key facts about the user or domain.
- ✗ Retrieving memories without recency weighting — a memory from 6 months ago about the user's project may be outdated. Add recency weighting to memory retrieval: boost memories from the last 7 days, discount memories older than 90 days.
- ✗ Blindly trusting retrieved memories — memories can be stale or incorrect. When using a memory, the agent should verify it's still relevant: 'Based on our conversation from last week, you were working on X — is that still the case?'
- ✗ No memory deletion — without explicit memory management, the memory store grows forever. Implement TTL (delete after 90 days), explicit user commands ('forget X'), and periodic cleanup of contradicted memories.
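The recency-weighting fix above can be sketched as a post-retrieval rerank. The boost/discount multipliers (1.5x within 7 days, 0.5x beyond 90 days) and the `similarity` field name are illustrative assumptions; tune them against your own retrieval quality.

```python
from datetime import datetime, timedelta

def recency_weight(memory_date: datetime, now: datetime) -> float:
    # Boost memories from the last 7 days, discount memories older than 90 days
    age = now - memory_date
    if age <= timedelta(days=7):
        return 1.5
    if age >= timedelta(days=90):
        return 0.5
    return 1.0

def rerank_memories(memories: list, now: datetime) -> list:
    # Each memory: {'text': ..., 'date': datetime, 'similarity': float from vector search}
    for m in memories:
        m['score'] = m['similarity'] * recency_weight(m['date'], now)
    return sorted(memories, key=lambda m: m['score'], reverse=True)
```

A recent, moderately similar memory can now outrank a stale, highly similar one — which is usually the behavior you want for user-state facts.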
FAQ
What's the difference between memory and conversation history?
Conversation history is the raw message list from the current session — it lives in the context window and is lost when the session ends. Memory is what persists across sessions: extracted facts, preferences, and summaries stored in a database. Conversation history is always present for in-session context; memory is retrieved selectively based on relevance.
Which vector database is best for agent memory?
For personal agent memory (one user, moderate scale): pgvector with metadata filtering is sufficient and avoids a separate service. For multi-tenant agent memory at scale: Qdrant or Pinecone with user_id metadata filtering. For high-performance local deployments: Chroma. The choice matters less than the schema — design your metadata (user_id, memory_type, date, ttl) carefully.
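A memory record following the metadata fields suggested above might look like this. The field names and values are illustrative, not a required schema:

```python
# Illustrative memory record; design these fields before picking a database
memory_record = {
    'user_id': 'u_123',          # tenant scoping: every retrieval filters on this
    'memory_type': 'semantic',   # 'semantic' | 'episodic' | 'procedural'
    'text': 'User prefers Python over JavaScript',
    'date': '2026-04-10',        # enables recency weighting at retrieval time
    'ttl': 90,                   # days until eligible for periodic cleanup
}
```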
How do I handle contradicting memories?
When saving a new memory, check if it contradicts an existing one using embedding similarity. If a new memory (user: 'I now prefer TypeScript') is similar to an old one (user: 'I prefer JavaScript'), replace or mark the old one as superseded. An LLM can help with contradiction detection: 'Does this new fact contradict any of these existing memories?'
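The supersede check described above can be sketched as a pure decision function. The 0.8 similarity threshold, the `neighbors` shape, and the injected `is_contradiction` callable (e.g. a yes/no LLM judgment) are all assumptions:

```python
def find_superseded(new_fact: str, neighbors: list, is_contradiction,
                    similarity_threshold: float = 0.8) -> list:
    # neighbors: [{'id': ..., 'text': ..., 'similarity': ...}] from a vector
    # search on new_fact. Only close neighbors are worth an LLM check.
    superseded = []
    for old in neighbors:
        if old['similarity'] >= similarity_threshold and is_contradiction(new_fact, old['text']):
            superseded.append(old['id'])
    return superseded
```

The caller then marks those ids as superseded (rather than deleting them outright, which preserves the episodic record of the change) before inserting the new fact.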
What should I store in procedural vs. episodic memory?
Procedural memory stores persistent rules and preferences: 'User prefers bullet points over prose', 'Always cite sources'. These go in the system prompt every session. Episodic memory stores time-bound events: 'On April 10, user asked about X and was satisfied with approach Y'. These are retrieved only when relevant to current task.
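Loading procedural memory into the system prompt every session can be as simple as string assembly; this sketch and the `build_system_prompt` name are illustrative:

```python
def build_system_prompt(base: str, procedural_rules: list) -> str:
    # Procedural memories are small and always relevant, so they are prepended
    # every session; episodic memories are instead retrieved per-turn by relevance
    rules = '\n'.join(f'- {r}' for r in procedural_rules)
    return f'{base}\n\nStanding rules and preferences:\n{rules}'
```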
Is there a standard memory library for agents?
Mem0 (getmem0.ai) is the most popular open-source agent memory library as of 2026. It handles save/retrieve/delete with hybrid (vector + entity graph) memory. LangChain's ConversationBufferMemory and ConversationSummaryMemory handle simple in-session memory. For custom needs, building on pgvector or Qdrant directly gives more control.