Use Case

AI for Customer Service

Chatbots, ticket triage, agent assist, voice bots — the AI stack that deflects 60-80% of support volume while keeping CSAT high.

Updated Apr 16, 2026 · 5 workflows · ~$30–$800 per 1,000 requests

Quick answer

Production pattern: Claude Sonnet 4 as the main agent, Haiku 4 for intent routing, RAG over your help center, tool-use for CRM/order lookups, and a confidence gate for human handoff. Expect $0.03-0.08 per conversation at scale. Intercom Fin, Ada, and Zendesk AI are the leading commercial options at $0.80-1.50 per resolution.
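The $0.03-0.08 figure falls out of straightforward token math. A minimal sketch, assuming roughly 6 turns per conversation, ~2,000 input tokens per turn (system prompt + RAG context + history), ~300 output tokens per reply, and Sonnet-class list prices of $3/M input and $15/M output — all of these numbers are assumptions to check against your own traffic and current pricing:

```python
def conversation_cost(
    turns: int = 6,
    input_tokens: int = 2_000,      # prompt + RAG context + history, per turn
    output_tokens: int = 300,       # typical reply length
    usd_per_m_input: float = 3.0,   # assumed Sonnet-class input price
    usd_per_m_output: float = 15.0, # assumed Sonnet-class output price
) -> float:
    """Rough per-conversation cost in USD."""
    per_turn = input_tokens * usd_per_m_input + output_tokens * usd_per_m_output
    return turns * per_turn / 1_000_000

print(round(conversation_cost(), 3))  # → 0.063 under these assumptions
```

Routing the easy 60-80% of turns to a Haiku-class model at a fraction of these prices is what pulls the blended cost toward the bottom of the range.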

The problem

Customer service is expensive, repetitive, and suffers from seasonal volume spikes. A well-built AI stack deflects the 60-80% of queries that are FAQs or lookups, assists human agents on the remaining 20-40%, and escalates angry customers before they churn. The wrong stack gaslights customers and tanks NPS.

Core workflows

Deflection chatbot (Tier 1)

Answer FAQs, order status, password resets without human involvement. Target 60-80% deflection rate.

Stack: claude-sonnet-4 + intercom-fin

Ticket routing + triage

Classify incoming tickets by intent, urgency, and required expertise. Route to the right queue/agent automatically.

Stack: claude-haiku-4 + zendesk-ai
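The router itself is one cheap-model classification call; the part worth engineering is validating its output before it touches your queues. A sketch of that validation step, assuming the router model is prompted to return JSON with `intent`, `urgency`, and `queue` fields — the schema, queue names, and fallback are illustrative assumptions, not a vendor API:

```python
import json

ALLOWED_QUEUES = {"billing", "technical", "orders", "general"}  # assumed queue names
ALLOWED_URGENCY = {"low", "medium", "high"}

def parse_routing(model_output: str) -> dict:
    """Validate the router model's JSON; fall back to a safe default on any error."""
    fallback = {"intent": "unknown", "urgency": "medium", "queue": "general"}
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError:
        return fallback  # model emitted non-JSON → never crash the pipeline
    if data.get("queue") not in ALLOWED_QUEUES or data.get("urgency") not in ALLOWED_URGENCY:
        return fallback  # model invented a queue or urgency level
    return {"intent": str(data.get("intent", "unknown")),
            "urgency": data["urgency"], "queue": data["queue"]}

print(parse_routing('{"intent": "refund_request", "urgency": "high", "queue": "billing"}'))
print(parse_routing("not json at all"))  # falls back to the general queue
```

The fallback-to-general behavior matters more than the happy path: a mis-parsed ticket in the wrong specialist queue is worse than one a human triages manually.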

Agent assist (co-pilot)

Real-time suggested replies + policy lookup in the agent's sidebar. Cuts handle time 20-35%.

Stack: claude-sonnet-4 + gorgias

Voice bot (phone IVR replacement)

Handle phone support with a voice LLM for password resets, order status, simple returns.

Stack: gpt-4o + parloa

QA + coaching

Score 100% of conversations on a rubric, flag coaching opportunities, identify training gaps.

Stack: claude-sonnet-4 + observe-ai

Top tools

  • intercom-fin
  • zendesk-ai
  • ada
  • gorgias
  • parloa
  • observe-ai

Top models

  • claude-sonnet-4
  • claude-haiku-4
  • gpt-4o
  • gemini-2-5-pro

FAQs

Can AI fully replace customer service agents?

Not in 2026. The production pattern is AI handling 60-80% of volume (FAQs, lookups, policy) and escalating the rest. Full replacement leads to CSAT drops and regulatory risk in the EU. The best deployments treat AI as tier 1 plus an agent co-pilot, not a total replacement.

Build vs buy — what's the call?

Buy Intercom Fin, Zendesk AI, or Ada if you're under 100k conversations/month — the 'done' infrastructure (logs, evals, integrations) is worth $1-2/resolution. Build on the Claude API directly above 1M conversations/month, where the ~$0.05/conversation cost delta pays for an engineering team.
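The break-even arithmetic is worth writing down. A sketch using the figures above — ~$1.00/resolution for a vendor and ~$0.05/conversation in raw API cost (prices in cents to keep the math exact; your real infra and engineering costs sit on top of the API number):

```python
def monthly_build_savings(conversations: int,
                          vendor_cents: int = 100,  # assumed vendor price per resolution
                          api_cents: int = 5) -> float:
    """Raw monthly spend delta (USD) between buying and building; excludes eng/infra cost."""
    return conversations * (vendor_cents - api_cents) / 100

print(monthly_build_savings(100_000))    # → 95000.0  — rarely covers a platform team
print(monthly_build_savings(1_000_000))  # → 950000.0 — funds the build
```

At 100k conversations the delta barely pays for the evals, logging, and integrations a vendor ships on day one; at 1M it funds a team to build them.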

What deflection rate should I expect?

60-70% is realistic for well-tuned deployments on typical SaaS support. E-commerce gets 70-85% because order status + returns are templated. Complex B2B or regulated industries top out at 40-55%.

How do I prevent the bot from making things up?

Enforce tool-use for anything factual (order lookups, policy). Reject freeform answers about refunds/pricing. Run a daily LLM-as-judge eval on a 100-conversation sample. Log retrieval hit rate and flag questions where retrieval failed.
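The "reject freeform answers" rule can be enforced mechanically before a reply ships. A minimal sketch — the gated-topic list and function shape are illustrative assumptions, not part of any vendor SDK:

```python
GATED_TOPICS = {"refund", "pricing", "order_status", "policy"}  # facts must come from tools

def answer_allowed(topic: str, tool_results: list) -> bool:
    """Freeform generation is fine for chit-chat; gated topics require tool evidence."""
    if topic not in GATED_TOPICS:
        return True
    return len(tool_results) > 0  # no lookup result → no answer, escalate instead

assert answer_allowed("greeting", [])
assert not answer_allowed("refund", [])  # would be a hallucination risk
assert answer_allowed("refund", [{"order_id": "A123", "refund_eligible": True}])
```

Logging every turn where this gate fires doubles as your retrieval-failure signal for the daily eval sample.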

When should the bot escalate to a human?

Four triggers: (1) sentiment-classifier detects anger, (2) confidence < threshold on the main answer, (3) explicit 'talk to a human' intent, (4) the topic hits a policy-gated area (legal, payments, account deletion).
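The four triggers compose into a single gate evaluated on every turn. A sketch, assuming a sentiment score in [0, 1] and a model-reported confidence — the thresholds, phrase list, and topic names are illustrative assumptions to tune against your own traffic:

```python
ANGER_THRESHOLD = 0.7        # sentiment score above which we hand off
CONFIDENCE_THRESHOLD = 0.6   # below this, the main answer is not trusted
POLICY_GATED = {"legal", "payments", "account_deletion"}
HUMAN_PHRASES = ("talk to a human", "speak to an agent", "real person")

def should_escalate(sentiment: float, confidence: float,
                    user_text: str, topic: str) -> bool:
    text = user_text.lower()
    return (
        sentiment > ANGER_THRESHOLD                   # (1) anger detected
        or confidence < CONFIDENCE_THRESHOLD          # (2) low confidence
        or any(p in text for p in HUMAN_PHRASES)      # (3) explicit request
        or topic in POLICY_GATED                      # (4) policy-gated area
    )
```

In production, (1) and (2) come from your sentiment classifier and from the model's self-reported or logprob-derived confidence; the point is that any single trigger is sufficient to hand off.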

What about multilingual support?

Claude Sonnet 4 and GPT-4o are strong across 30+ languages natively. For long-tail languages (Swahili, Bengali, etc.) quality is inconsistent — for truly low-resource languages, route through a translation pipeline (DeepL or Gemini) and handle the conversation in English.

How fast does it need to respond?

Target P95 under 3 seconds end-to-end. Users perceive sub-3s as 'fast enough'. Streaming the LLM output (first tokens in 400-800ms) helps perceived speed even if the full response takes 2-3s.
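The gap between time-to-first-token and total latency is the whole argument for streaming: the user starts reading at the first chunk, not at completion. A simulated sketch — the chunk contents and delays are made up, not real model timings:

```python
import time

def fake_stream(chunks, first_delay=0.05, inter_delay=0.01):
    """Simulate a streamed LLM response: a TTFT pause, then steady chunks."""
    time.sleep(first_delay)  # stands in for time-to-first-token
    for chunk in chunks:
        yield chunk
        time.sleep(inter_delay)

start = time.monotonic()
ttft = None
for chunk in fake_stream(["Your order ", "shipped ", "yesterday."]):
    if ttft is None:
        ttft = time.monotonic() - start  # user starts reading here
total = time.monotonic() - start
print(f"TTFT {ttft:.2f}s vs total {total:.2f}s")
```

With real 400-800ms TTFT against a 2-3s total, the perceived wait is cut roughly in half or better, which is why streaming is worth the extra client-side complexity.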

Related architectures