How to Build a Chatbot with an LLM API: Full Guide for 2026

Quick answer: A production chatbot needs five things most tutorials skip: conversation history management (to avoid context bloat), a well-designed system prompt, streaming for responsiveness, error handling with graceful degradation, and a cost model. This guide covers all five.

Architecture overview

A production LLM chatbot has these layers:

Frontend: Chat UI (React, Vue, or plain HTML)
API layer: Next.js route handler or Express endpoint
Conversation manager: Stores and trims message history
LLM client: Calls the model, handles retries
Persistence: Database for conversation storage
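The LLM client layer's retry handling can be sketched as a small wrapper. This is an illustrative helper (`withRetry` is a hypothetical name), and it assumes every failure is worth retrying. A production version would retry only transient errors such as rate limits (HTTP 429) and 5xx responses. It also shows graceful degradation: after the last attempt it returns a fallback reply instead of throwing. Anthropic's TypeScript SDK already retries some failed requests on its own (configurable through its `maxRetries` client option), so layer this with care.

```typescript
// Hypothetical helper: retries an async LLM call with exponential backoff,
// then returns a fallback reply instead of throwing (graceful degradation).
// Assumption: every error is treated as retryable; filter for 429/5xx in practice.
async function withRetry<T>(
  call: () => Promise<T>,
  fallback: T,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (attempt === maxAttempts) {
        console.error(`LLM call failed after ${maxAttempts} attempts:`, err);
        break;
      }
      // Backoff doubles each attempt (base, 2x base, 4x base, ...),
      // scaled by random jitter between 50% and 100% to spread out retries
      const delayMs = baseDelayMs * 2 ** (attempt - 1) * (0.5 + Math.random() / 2);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return fallback;
}

// Example (callModel is a hypothetical function that calls your LLM client):
// const reply = await withRetry(
//   () => callModel(history),
//   "Sorry, I'm having trouble right now. Please try again in a moment."
// );
```

The fallback keeps the chat UI responsive during an outage; pair it with the escalation path from your system prompt so users still have somewhere to go.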

The system prompt is your product

The system prompt defines your chatbot's personality, knowledge, constraints, and behavior. Invest time here: it is the highest-leverage prompt in your application, since it is sent with every request and shapes every response.

You are Aria, a customer support specialist for Acme SaaS.

Your responsibilities:
- Answer questions about Acme's features, pricing, and integrations
- Help users troubleshoot common issues using the knowledge base below
- Escalate to a human agent when: the issue requires account access, the user is frustrated after 3 turns, or the issue is not covered in your knowledge base

Behavior rules:
- Be concise. Maximum 3 sentences per response unless the user asks for detail.
- Never guess. If you don't know, say so and offer to escalate.
- Never discuss competitors by name.

Knowledge base:
[INSERT PRODUCT DOCS HERE]
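The `[INSERT PRODUCT DOCS HERE]` slot is usually filled at request time from your documentation. A minimal sketch, assuming docs are available as title/body pairs (the `Doc` shape and `buildSystemPrompt` name are hypothetical). It wraps each doc in XML-style tags so document boundaries stay clear to the model, a structuring style Anthropic's prompting guidance recommends.

```typescript
// Placeholder used in the example system prompt
const KB_PLACEHOLDER = "[INSERT PRODUCT DOCS HERE]";

// Hypothetical document shape; adapt to however your docs are stored
interface Doc {
  title: string;
  body: string;
}

// Fill the knowledge-base placeholder with docs wrapped in XML-style tags
function buildSystemPrompt(template: string, docs: Doc[]): string {
  if (!template.includes(KB_PLACEHOLDER)) {
    throw new Error("System prompt template is missing the knowledge base placeholder");
  }
  const kb = docs
    .map((d) => `<doc title="${d.title}">\n${d.body.trim()}\n</doc>`)
    .join("\n");
  // A replacer function inserts kb literally, so "$&"-style patterns
  // inside the docs are not treated as special replacement tokens
  return template.replace(KB_PLACEHOLDER, () => kb);
}

// Usage with made-up example content
const prompt = buildSystemPrompt(
  "You are Aria, a customer support specialist for Acme SaaS.\n\nKnowledge base:\n" +
    KB_PLACEHOLDER,
  [{ title: "Example doc", body: "Your product documentation goes here." }]
);
console.log(prompt);
```

Every token of injected docs is re-sent on every turn, so this approach suits a compact knowledge base; it also feeds directly into the cost model later in this guide.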

Conversation history management

A naive implementation appends every message to history indefinitely. This causes:

Context window overflow after extended conversations
Increasing cost per turn as history grows
Degrading quality as old, irrelevant context crowds the window

The solution is a conversation manager that trims or summarizes old messages:

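type Message = { role: 'user' | 'assistant'; content: string };

const MAX_HISTORY_TOKENS = 4000;
const KEEP_LAST = 6; // last 6 messages = 3 user/assistant turns

// Rough estimate: about 4 characters per token for English text
function estimateTokens(m: Message): number {
  return Math.ceil(m.content.length / 4);
}

function trimConversation(
  messages: Message[],
  maxTokens: number = MAX_HISTORY_TOKENS
): Message[] {
  // Keep only the most recent messages
  let trimmed = messages.slice(-KEEP_LAST);

  // Drop the oldest messages until the estimate fits the budget,
  // always keeping at least the latest message
  let tokenCount = trimmed.reduce((sum, m) => sum + estimateTokens(m), 0);
  while (trimmed.length > 1 && tokenCount > maxTokens) {
    tokenCount -= estimateTokens(trimmed[0]);
    trimmed = trimmed.slice(1);
  }

  // The Messages API expects the first message to have the user role
  while (trimmed.length > 1 && trimmed[0].role !== 'user') {
    trimmed = trimmed.slice(1);
  }
  return trimmed;
}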
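Trimming discards old messages outright. The summarizing alternative can be sketched as follows, assuming a hypothetical `summarize` callback (in practice, a call to a cheap model with an instruction such as "Summarize this support conversation"); `compactHistory` is an illustrative name, not a library function.

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Hypothetical summarizer: takes a plain-text transcript, returns a summary
type Summarize = (transcript: string) => Promise<string>;

// Summarize everything except the most recent messages
async function compactHistory(
  messages: Message[],
  summarize: Summarize,
  keepLast = 6
): Promise<{ summary: string | null; recent: Message[] }> {
  if (messages.length <= keepLast) return { summary: null, recent: messages };

  let cut = messages.length - keepLast;
  // Move the cut forward so the recent slice starts with a user turn
  while (cut < messages.length - 1 && messages[cut].role !== "user") cut++;

  const older = messages.slice(0, cut);
  const transcript = older.map((m) => `${m.role}: ${m.content}`).join("\n");
  return { summary: await summarize(transcript), recent: messages.slice(cut) };
}
```

Returning the summary separately lets you append it to the system prompt (for example, under a "Summary of earlier conversation" heading) instead of injecting a synthetic message, which keeps the user/assistant alternation intact. Summarizing costs an extra model call, so it is usually triggered only once history crosses the token budget.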

Streaming API with Next.js

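// app/api/chat/route.ts
import Anthropic from '@anthropic-ai/sdk';

// Reads ANTHROPIC_API_KEY from the environment
const client = new Anthropic();

// Keep the system prompt on the server so clients cannot override it
// (placeholder text; load your full prompt here)
const SYSTEM_PROMPT = 'You are Aria, a customer support specialist for Acme SaaS.';

export async function POST(req: Request) {
  const { messages } = await req.json();

  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      try {
        // messages.stream() returns a MessageStream you can iterate directly
        const response = client.messages.stream({
          model: 'claude-haiku-4-5', // assumed model ID; confirm against Anthropic's current model list
          max_tokens: 1024,
          system: SYSTEM_PROMPT,
          messages,
        });

        // Forward only text deltas to the client as they arrive
        for await (const event of response) {
          if (
            event.type === 'content_block_delta' &&
            event.delta.type === 'text_delta'
          ) {
            controller.enqueue(encoder.encode(event.delta.text));
          }
        }
        controller.close();
      } catch (err) {
        // Error the stream so the client sees the failure instead of hanging
        controller.error(err);
      }
    },
  });

  return new Response(stream, {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
  });
}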
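On the frontend, the plain-text stream from this route can be consumed with the standard Fetch and Streams APIs. A minimal sketch; `readTextStream` is a hypothetical helper, and the `/api/chat` path assumes the `app/api/chat/route.ts` file shown above.

```typescript
// Read a plain-text streaming response, passing each decoded chunk to onText.
// Returns the full text once the stream ends.
async function readTextStream(
  res: Response,
  onText: (chunk: string) => void
): Promise<string> {
  if (!res.ok || !res.body) throw new Error(`Chat request failed: ${res.status}`);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let full = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters split across chunks intact
    const text = decoder.decode(value, { stream: true });
    full += text;
    onText(text);
  }
  const tail = decoder.decode(); // flush any bytes still buffered
  if (tail) {
    full += tail;
    onText(tail);
  }
  return full;
}

// Usage in the browser (appendToChatBubble is hypothetical):
// const res = await fetch('/api/chat', {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/json' },
//   body: JSON.stringify(requestBody),
// });
// const reply = await readTextStream(res, (t) => appendToChatBubble(t));
```

If the server errors the stream partway through, `reader.read()` rejects, so wrap the call in `try/catch` in the UI and offer a retry.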

Model selection for chatbots

For most chatbots:

Customer support (high volume): Claude Haiku 4 or GPT-4.1 Mini. Fast, cheap, and good enough.
Complex product advisor: Claude Sonnet 4 or GPT-4o. Better reasoning for nuanced questions.
Internal tools: Gemini 2.0 Flash. Its generous free tier suits low-volume internal use.

See the best LLMs for chatbot development for a full ranked comparison.

Cost model for chatbots

Estimate your chatbot costs before launch:

Cost per turn = (system_prompt_tokens + avg_history_tokens + avg_user_tokens)
                × input_price_per_token
              + avg_response_tokens × output_price_per_token

Every request re-sends the system prompt and the conversation history, so a conversation's cost is the sum of its turns, and each turn costs more than the last until trimming caps the history.

For one turn of a customer support bot with a 1,000-token system prompt, 500-token history, 50-token user message, and 200-token response at Claude Haiku 4 pricing:

Input: 1,550 tokens × $0.80/1M = $0.00124
Output: 200 tokens × $4.00/1M = $0.0008
Per turn: $0.00204
At 10,000 turns/month: $20.40/month
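To see how re-sent history compounds across turns, here is a minimal sketch. It assumes the example prices above ($0.80 and $4.00 per million tokens), fixed user and response sizes per turn, and no trimming or prompt caching; `costPerTurn` and `costOfConversation` are hypothetical helper names.

```typescript
// Prices in USD per 1M tokens (example values; check your provider's pricing page)
interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

interface TurnTokens {
  systemPrompt: number;
  history: number;
  user: number;
  response: number;
}

// Cost of one request/response turn, following the cost formula
function costPerTurn(t: TurnTokens, p: Pricing): number {
  const inputTokens = t.systemPrompt + t.history + t.user;
  return (inputTokens * p.inputPerMillion + t.response * p.outputPerMillion) / 1_000_000;
}

// Cost of a multi-turn conversation where every earlier user message and
// reply is re-sent as history (no trimming, no caching)
function costOfConversation(
  turns: number,
  systemPrompt: number,
  userTokens: number,
  responseTokens: number,
  p: Pricing
): number {
  let total = 0;
  let history = 0;
  for (let i = 0; i < turns; i++) {
    total += costPerTurn({ systemPrompt, history, user: userTokens, response: responseTokens }, p);
    history += userTokens + responseTokens; // this turn becomes history for the next one
  }
  return total;
}

const examplePricing: Pricing = { inputPerMillion: 0.8, outputPerMillion: 4.0 };
console.log(costPerTurn({ systemPrompt: 1000, history: 500, user: 50, response: 200 }, examplePricing).toFixed(5)); // 0.00204
console.log(costOfConversation(3, 1000, 50, 200, examplePricing).toFixed(5)); // 0.00552
```

Multiply the per-conversation figure by your expected monthly conversations; applying `trimConversation`-style limits caps the `history` term and flattens the growth.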

Use the LLMversus cost calculator to model your specific chatbot costs and compare across providers.