
How to Build a Chatbot with an LLM API: Full Guide for 2026

Quick answer: A production chatbot needs five things the tutorials don't cover: conversation history management (to avoid context bloat), a well-designed system prompt, streaming for responsiveness, error handling with graceful degradation, and a cost model. This guide covers all five.


Architecture overview

A production LLM chatbot has these layers:

  1. Frontend: Chat UI (React, Vue, or plain HTML)
  2. API layer: Next.js route handler or Express endpoint
  3. Conversation manager: Stores and trims message history
  4. LLM client: Calls the model, handles retries
  5. Persistence: Database for conversation storage
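
The retry handling in the LLM client layer, together with the graceful degradation mentioned in the quick answer, could be sketched roughly as follows. This is a minimal illustration: `withRetry`, `backoffDelayMs`, `FALLBACK_REPLY`, and the delay values are hypothetical names and defaults, not part of any SDK.

```typescript
// Hypothetical fallback text returned when every attempt fails.
const FALLBACK_REPLY =
  "Sorry, I'm having trouble responding right now. Please try again shortly.";

// Delay before the retry that follows attempt `attempt` (0-based):
// exponential growth, capped at capMs.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `call` up to `maxAttempts` times. If all attempts fail, it degrades
// to the fallback reply instead of throwing.
async function withRetry(
  call: () => Promise<string>,
  maxAttempts = 3,
  baseMs = 500
): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await call();
    } catch {
      // No sleep after the final attempt; fall through to the fallback.
      if (attempt === maxAttempts - 1) break;
      await sleep(backoffDelayMs(attempt, baseMs));
    }
  }
  return FALLBACK_REPLY;
}
```

In a real deployment you would probably retry only transient failures (rate limits, timeouts, server errors) and fail fast on invalid requests; this sketch retries everything for brevity.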


The system prompt is your product

The system prompt defines your chatbot's personality, knowledge, constraints, and behavior. Invest time here: it is the highest-leverage prompt in your application.

You are Aria, a customer support specialist for Acme SaaS.

Your responsibilities:
- Answer questions about Acme's features, pricing, and integrations
- Help users troubleshoot common issues using the knowledge base below
- Escalate to a human agent when: the issue requires account access, the user is frustrated after 3 turns, or the issue is not covered in your knowledge base

Behavior rules:
- Be concise. Maximum 3 sentences per response unless the user asks for detail.
- Never guess. If you don't know, say so and offer to escalate.
- Never discuss competitors by name.

Knowledge base:
[INSERT PRODUCT DOCS HERE]
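
One way to fill the `[INSERT PRODUCT DOCS HERE]` placeholder at request time is a small template function. The sketch below abbreviates the prompt and assumes a hypothetical `KnowledgeDoc` shape, a `{{DOCS}}` marker, and an arbitrary character cap; none of these come from a library.

```typescript
// Hypothetical shape for one knowledge-base entry.
interface KnowledgeDoc {
  title: string;
  body: string;
}

// Abbreviated version of the prompt above, with a marker for the docs.
const PROMPT_TEMPLATE = `You are Aria, a customer support specialist for Acme SaaS.

Knowledge base:
{{DOCS}}`;

// Joins docs into one section, stopping at the first doc that would push
// the knowledge base past maxChars, so the prompt size stays predictable.
function buildSystemPrompt(docs: KnowledgeDoc[], maxChars = 8000): string {
  const sections: string[] = [];
  let used = 0;
  for (const doc of docs) {
    const section = `## ${doc.title}\n${doc.body.trim()}`;
    if (used + section.length > maxChars) break;
    sections.push(section);
    used += section.length;
  }
  const kb =
    sections.length > 0 ? sections.join("\n\n") : "(no documents loaded)";
  // A replacer function avoids special handling of "$" sequences in the docs.
  return PROMPT_TEMPLATE.replace("{{DOCS}}", () => kb);
}
```

Capping the knowledge base matters for the cost model later in this guide, since the system prompt is billed as input on every request.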


Conversation history management

A naive implementation appends every message to history indefinitely. This causes:

  • Context window overflow after extended conversations
  • Increasing cost per turn as history grows
  • Degrading quality as old irrelevant context crowds the window

The solution is a conversation manager that trims or summarizes old messages:

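type Message = { role: 'user' | 'assistant'; content: string };

const MAX_HISTORY_TOKENS = 4000;
const KEEP_LAST = 6; // cap on the number of recent messages retained

// Rough estimate: about 4 characters per token for English text
const estimateTokens = (m: Message): number => Math.ceil(m.content.length / 4);

function trimConversation(
  messages: Message[],
  maxTokens: number = MAX_HISTORY_TOKENS
): Message[] {
  // Start from the most recent KEEP_LAST messages
  const recent = messages.slice(-KEEP_LAST);
  let tokenCount = recent.reduce((sum, m) => sum + estimateTokens(m), 0);

  // Drop the oldest messages until the estimate fits the budget,
  // but never drop the latest message
  let start = 0;
  while (tokenCount > maxTokens && start < recent.length - 1) {
    tokenCount -= estimateTokens(recent[start]);
    start++;
  }

  // Make the trimmed history open on a user turn, which chat APIs
  // such as the Anthropic Messages API generally expect
  while (start < recent.length - 1 && recent[start].role !== 'user') start++;

  return recent.slice(start);
}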
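
The trimming function above discards old messages outright. The "summarizes" alternative could look something like this sketch, which condenses older turns and appends the summary to the system prompt. The `summarize` callback is a hypothetical function you would supply (for example, a call to a cheap model); `splitForSummary`, `withSummary`, and `compactHistory` are illustrative names.

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Splits history into older messages (to summarize) and recent messages
// (kept verbatim). The cut is nudged forward so `recent` opens on a user turn.
function splitForSummary(messages: Message[], keepLast = 6) {
  let cut = Math.max(0, messages.length - keepLast);
  while (cut < messages.length - 1 && messages[cut].role !== "user") cut++;
  return { older: messages.slice(0, cut), recent: messages.slice(cut) };
}

// Appends the summary to the system prompt rather than faking a chat turn.
function withSummary(systemPrompt: string, summary: string): string {
  return summary
    ? `${systemPrompt}\n\nSummary of the earlier conversation:\n${summary}`
    : systemPrompt;
}

async function compactHistory(
  systemPrompt: string,
  messages: Message[],
  summarize: (older: Message[]) => Promise<string>, // hypothetical, supplied by you
  keepLast = 6
): Promise<{ system: string; messages: Message[] }> {
  const { older, recent } = splitForSummary(messages, keepLast);
  // Nothing old enough to summarize: pass everything through unchanged.
  if (older.length === 0) return { system: systemPrompt, messages };
  const summary = await summarize(older);
  return { system: withSummary(systemPrompt, summary), messages: recent };
}
```

Summarizing costs an extra model call, so it tends to pay off only in long conversations where early context still matters.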


Streaming API with Next.js

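// app/api/chat/route.ts
import Anthropic from '@anthropic-ai/sdk';

// Reads ANTHROPIC_API_KEY from the environment by default
const client = new Anthropic();

export async function POST(req: Request) {
  // In production, set the system prompt on the server rather than
  // accepting it from the request body, so users cannot override it
  const { messages, systemPrompt } = await req.json();

  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      try {
        const response = await client.messages.stream({
          model: 'claude-haiku-4', // confirm against the provider's current model IDs
          max_tokens: 1024,
          system: systemPrompt,
          messages,
        });

        // Forward only text deltas; other event types carry metadata
        for await (const event of response) {
          if (event.type === 'content_block_delta' &&
              event.delta.type === 'text_delta') {
            controller.enqueue(encoder.encode(event.delta.text));
          }
        }
        controller.close();
      } catch (err) {
        // Surface upstream failures to the reader instead of leaving it hanging
        controller.error(err);
      }
    },
  });

  return new Response(stream, {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
  });
}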
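
On the frontend, the plain-text stream returned by this route can be consumed with the standard Streams API. The following is a rough sketch rather than a complete chat client; `readTextStream` and `sendMessage` are hypothetical helper names, and the `/api/chat` path assumes the route file shown above.

```typescript
// Reads a streamed text body chunk by chunk, calling onText with each
// decoded piece, and resolves with the full reply.
async function readTextStream(
  body: ReadableStream<Uint8Array>,
  onText: (chunk: string) => void
): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let full = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters split across chunks intact
    const text = decoder.decode(value, { stream: true });
    if (text) {
      full += text;
      onText(text);
    }
  }
  // Flush any bytes still buffered in the decoder
  const tail = decoder.decode();
  if (tail) {
    full += tail;
    onText(tail);
  }
  return full;
}

// Hypothetical usage: POST the conversation and render text as it arrives.
async function sendMessage(
  messages: { role: string; content: string }[],
  systemPrompt: string,
  onText: (chunk: string) => void
): Promise<string> {
  const res = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages, systemPrompt }),
  });
  if (!res.ok || !res.body) {
    throw new Error(`Chat request failed with status ${res.status}`);
  }
  return readTextStream(res.body, onText);
}
```

In a React component, `onText` would typically append each chunk to the state that holds the assistant's in-progress message.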


Model selection for chatbots

For most chatbots:

  • Customer support (high volume): Claude Haiku 4 or GPT-4.1 Mini. Fast, cheap, and good enough.
  • Complex product advisor: Claude Sonnet 4 or GPT-4o. Better reasoning for nuanced questions.
  • Internal tools: Gemini 2.0 Flash. Generous free tier for low-volume internal use.

See the best LLMs for chatbot development for a full ranked comparison.


Cost model for chatbots

Estimate your chatbot costs before launch:

Cost per conversation = (system_prompt_tokens + avg_history_tokens + avg_user_tokens)
                        × input_price_per_token
                      + avg_response_tokens × output_price_per_token

For a customer support bot with a 1,000-token system prompt, 500-token history, 50-token user message, and 200-token response at Claude Haiku 4 pricing, counting one model call per conversation:

  • Input: 1,550 tokens × $0.80/1M = $0.00124
  • Output: 200 tokens × $4.00/1M = $0.0008
  • Per conversation: $0.00204
  • At 10,000 conversations/month: $20.40/month
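
The same arithmetic can be wrapped in a small helper so you can vary the inputs. This is a sketch of the formula above; the `CostInputs` field names are illustrative, and the prices are the example figures from this section, so check your provider's current rates before relying on them.

```typescript
interface CostInputs {
  systemPromptTokens: number;
  avgHistoryTokens: number;
  avgUserTokens: number;
  avgResponseTokens: number;
  inputPricePerMillion: number; // USD per 1M input tokens
  outputPricePerMillion: number; // USD per 1M output tokens
}

// Cost of one conversation, following the formula above.
function costPerConversation(c: CostInputs): number {
  const inputTokens =
    c.systemPromptTokens + c.avgHistoryTokens + c.avgUserTokens;
  return (
    (inputTokens * c.inputPricePerMillion +
      c.avgResponseTokens * c.outputPricePerMillion) /
    1e6
  );
}

function monthlyCost(c: CostInputs, conversationsPerMonth: number): number {
  return costPerConversation(c) * conversationsPerMonth;
}

// The worked example from this section.
const supportBot: CostInputs = {
  systemPromptTokens: 1000,
  avgHistoryTokens: 500,
  avgUserTokens: 50,
  avgResponseTokens: 200,
  inputPricePerMillion: 0.8,
  outputPricePerMillion: 4.0,
};

console.log(costPerConversation(supportBot).toFixed(5)); // 0.00204
console.log(monthlyCost(supportBot, 10_000).toFixed(2)); // 20.40
```

If a typical conversation involves several turns, multiply by the number of model calls and raise `avgHistoryTokens` accordingly, since history grows with each turn.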

Use the LLMversus cost calculator to model your specific chatbot costs and compare across providers.
