LLM Speed Comparison 2026

Compare time to first token (TTFT) and throughput across 25 large language models from ten providers to find the fastest API for your use case.

Data verified Apr 6, 2026
Providers covered: OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, xAI, Cohere, Microsoft, and Alibaba.

Note: for TTFT, lower is better; for throughput, higher is better.

All Models, Ranked by TTFT (fastest first)

| Model | Provider | TTFT (ms) | Tokens/sec | Input $/M tokens | Arena ELO |
|---|---|---|---|---|---|
| Gemini 2.0 Flash Lite | Google | 100 | 180 | $0.075 | 1200 |
| Phi-4 | Microsoft | 100 | 160 | $0.065 | 1150 |
| Gemini 2.0 Flash | Google | 120 | 160 | $0.100 | 1260 |
| GPT-4.1 Nano | OpenAI | 130 | 150 | $0.100 | 1180 |
| Claude Haiku 4 | Anthropic | 150 | 130 | $1.00 | 1220 |
| Mistral Small | Mistral | 160 | 120 | $0.150 | 1185 |
| GPT-4o Mini | OpenAI | 180 | 120 | $0.150 | 1220 |
| Grok 3 Mini | xAI | 180 | 110 | $0.300 | 1220 |
| GPT-4.1 Mini | OpenAI | 190 | 115 | $0.400 | 1240 |
| Llama 4 Scout | Meta | 200 | 110 | $0.080 | 1250 |
| DeepSeek V3 | DeepSeek | 220 | 85 | $0.200 | 1280 |
| GPT-4o | OpenAI | 230 | 95 | $2.50 | 1260 |
| Qwen 2.5 Max | Alibaba | 240 | 80 | $0.160 | 1260 |
| Llama 4 Maverick | Meta | 250 | 90 | $0.150 | 1290 |
| Command R | Cohere | 250 | 85 | $0.150 | 1140 |
| GPT-4.1 | OpenAI | 260 | 88 | $2.00 | 1290 |
| Mistral Large | Mistral | 280 | 75 | $0.500 | 1245 |
| Grok 3 | xAI | 300 | 80 | $3.00 | 1300 |
| Claude Sonnet 4 | Anthropic | 320 | 78 | $3.00 | 1280 |
| Command R+ | Cohere | 350 | 65 | $2.50 | 1200 |
| Gemini 2.5 Pro | Google | 400 | 70 | $1.25 | 1430 |
| Claude Opus 4 | Anthropic | 500 | 50 | $5.00 | 1504 |
| o4-mini | OpenAI | 1,200 | 60 | $1.10 | 1350 |
| o3-mini | OpenAI | 1,500 | 55 | $1.10 | 1310 |
| DeepSeek R1 | DeepSeek | 1,800 | 45 | $0.700 | 1310 |
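A practical way to combine the two speed metrics is to estimate total response time as TTFT plus generation time: total ≈ TTFT + output tokens ÷ throughput. The sketch below applies that formula to a few figures from the table above (the 500-token reply length is an illustrative assumption):

```python
# Estimate end-to-end response time from the two speed metrics:
# total seconds = TTFT + output_tokens / throughput.
# (TTFT ms, tokens/sec) pairs taken from the table above.
models = {
    "Gemini 2.0 Flash Lite": (100, 180),
    "GPT-4o Mini": (180, 120),
    "Claude Opus 4": (500, 50),
    "DeepSeek R1": (1800, 45),
}

def total_seconds(ttft_ms: float, tok_per_sec: float, output_tokens: int) -> float:
    """TTFT converted to seconds, plus time to generate the reply."""
    return ttft_ms / 1000 + output_tokens / tok_per_sec

# Rank models by estimated time for a 500-token reply.
for name, (ttft, tps) in sorted(models.items(),
                                key=lambda kv: total_seconds(*kv[1], 500)):
    print(f"{name}: {total_seconds(ttft, tps, 500):.1f}s for a 500-token reply")
```

Note how the ranking shifts with reply length: for long outputs, throughput dominates and TTFT matters little, while for short replies TTFT is the bigger factor.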

Frequently Asked Questions

Which LLM is fastest?
Speed depends on which metric matters to you. For time to first token (TTFT), small models like Gemini 2.0 Flash Lite and Phi-4 are fastest at around 100ms, with GPT-4.1 Nano close behind at 130ms. For throughput (tokens per second), Gemini 2.0 Flash Lite leads at 180 tokens/sec, followed by Gemini 2.0 Flash and Phi-4 at 160. Reasoning models like o3-mini and DeepSeek R1 trade speed for quality, with TTFT over a second.
What is TTFT?
TTFT stands for Time to First Token. It measures how many milliseconds pass between sending your API request and receiving the first token of the response. Lower TTFT means the user sees output faster, which is critical for real-time chat applications, autocomplete, and streaming UIs. TTFT is separate from throughput, which measures how fast tokens are generated after the first one.
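You can measure both metrics yourself from any streaming response. The sketch below times a generic token iterator; the `fake_stream` generator is a stand-in for a real streaming API response (which provider SDK you use, and its exact streaming interface, is up to you):

```python
import time
from typing import Iterable, Iterator

def measure_ttft(stream: Iterable[str]) -> tuple[float, float]:
    """Return (ttft_seconds, tokens_per_sec) for a token stream.

    TTFT is the wait for the first token; throughput is measured
    over the tokens that arrive after it.
    """
    start = time.perf_counter()
    it = iter(stream)
    next(it)                              # block until the first token arrives
    ttft = time.perf_counter() - start
    count = sum(1 for _ in it)            # drain the remaining tokens
    elapsed = time.perf_counter() - start - ttft
    tps = count / elapsed if elapsed > 0 else float("inf")
    return ttft, tps

# Simulated stream standing in for a real streaming API response.
def fake_stream(n: int = 20, delay: float = 0.01) -> Iterator[str]:
    for i in range(n):
        time.sleep(delay)
        yield f"tok{i}"

ttft, tps = measure_ttft(fake_stream())
print(f"TTFT: {ttft * 1000:.0f} ms, throughput: {tps:.0f} tok/s")
```

In production you would wrap the SDK's streaming iterator the same way; published TTFT figures also include network latency, so measure from your own region.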
Does speed affect quality?
Generally, faster models sacrifice some benchmark performance. Smaller, distilled models (like GPT-4.1 Nano or Gemini Flash Lite) are much faster but score lower on complex reasoning tasks. However, this isn't always the case — GPT-4o offers competitive quality with good speed. Choose based on your task: simple classification and extraction can use fast models, while complex analysis benefits from slower, more capable ones.
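One way to navigate the speed-quality trade-off in the table is to keep only Pareto-optimal models: those where no other model is at least as fast and at least as high-rated, with one of the two strictly better. A minimal sketch over a subset of the table's (tokens/sec, Arena ELO) figures:

```python
# Keep models that are Pareto-optimal on (throughput, Arena ELO).
# A model is dominated if some other model is >= on both axes
# and strictly better on at least one. Figures from the table above.
models = [
    ("Gemini 2.0 Flash Lite", 180, 1200),
    ("Gemini 2.0 Flash", 160, 1260),
    ("Llama 4 Maverick", 90, 1290),
    ("Gemini 2.5 Pro", 70, 1430),
    ("Claude Opus 4", 50, 1504),
    ("Command R", 85, 1140),
]

def pareto(rows: list[tuple[str, int, int]]) -> list[str]:
    keep = []
    for name, tps, elo in rows:
        dominated = any(
            t >= tps and e >= elo and (t > tps or e > elo)
            for _, t, e in rows
        )
        if not dominated:
            keep.append(name)
    return keep

print(pareto(models))
```

Here Command R drops out because Gemini 2.0 Flash Lite is both faster and higher-rated; every model on the frontier represents a genuine trade-off between the two axes.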