Fastest LLM APIs (2026)

Large language model APIs ranked by tokens per second and time-to-first-token, the two metrics that matter most for real-time applications, streaming UIs, and latency-sensitive pipelines.

By LLMversus. Updated April 22, 2026.
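As a rough illustration of the two ranking metrics, the sketch below times a generic token stream. It is not the site's methodology. The function name `measure_stream`, the injectable `clock` parameter, and the convention of computing throughput over the decode phase (tokens after the first, divided by the time since the first token) are all assumptions made for this example.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float]:
    """Return (time_to_first_token_s, tokens_per_second) for a token stream.

    Hypothetical helper: `tokens` is any iterable that yields tokens as they
    arrive, for example a wrapper around a provider's streaming response.
    """
    start = clock()
    first_token_at = None
    last_token_at = start
    count = 0
    for _ in tokens:
        now = clock()
        if first_token_at is None:
            first_token_at = now  # arrival time of the first token
        last_token_at = now
        count += 1
    if first_token_at is None:
        raise ValueError("stream produced no tokens")

    ttft = first_token_at - start
    # Assumed convention: throughput excludes the wait for the first token.
    decode_time = last_token_at - first_token_at
    tokens_per_second = (count - 1) / decode_time if decode_time > 0 else float("nan")
    return ttft, tokens_per_second


def simulated_stream(n_tokens: int, delay_s: float):
    """Stand-in for a real API stream; sleeps before yielding each token."""
    for i in range(n_tokens):
        time.sleep(delay_s)
        yield f"tok{i}"


ttft, tps = measure_stream(simulated_stream(5, 0.01))
print(f"TTFT: {ttft:.3f} s, throughput: {tps:.0f} tok/s")
```

Passing a custom `clock` makes the measurement deterministic in tests. With a real API, the network round trip and prompt length will dominate time-to-first-token, so figures measured this way are only comparable under the same conditions.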

Why Gemini 2.0 Flash Lite Ranks First Among the Fastest LLM APIs

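Gemini 2.0 Flash Lite ranks highest for this use case based on Arena ELO score, benchmark performance, and capability coverage. Of the top five models compared below, it has the highest listed throughput (180 tok/s) and the lowest prices ($0.075/M input, $0.300/M output), although its Arena ELO of 1200 trails Gemini 2.0 Flash (1260) and Claude Haiku 4 (1220).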

Cost Estimate

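For a typical workload (~50M tokens/month, 60% input / 40% output), the cheapest qualifying model (Gemini 2.0 Flash Lite) costs approximately $8.25/month: 30M input tokens at $0.075/M ($2.25) plus 20M output tokens at $0.300/M ($6.00). The highest-rated model listed by Arena ELO, Gemini 2.0 Flash (1260), costs about $11.00/month for the same workload.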
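The estimate can be reproduced with a few lines of arithmetic. The sketch below is a minimal illustration, and the function name `monthly_cost` and its parameters are assumptions for this example. The workload and prices are the ones quoted on this page.

```python
def monthly_cost(
    total_tokens: float,
    input_share: float,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Return the monthly cost in USD for a token volume split between input and output."""
    input_tokens = total_tokens * input_share
    output_tokens = total_tokens * (1 - input_share)
    # Prices are quoted per million tokens, so divide by 1,000,000 at the end.
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


# 50M tokens/month, 60% input, Gemini 2.0 Flash Lite prices from this page.
cost = monthly_cost(50_000_000, 0.60, 0.075, 0.300)
print(f"${cost:.2f}/month")  # prints $8.25/month
```

Changing the input share shifts the result noticeably, because output tokens cost four times as much as input tokens for this model.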

Price vs Quality for Fastest LLM APIs

Top 5 Models Compared

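| Rank | Model | Provider | Input $/M | Output $/M | Arena ELO | Speed (tok/s) |
|---|---|---|---|---|---|---|
| #1 | Gemini 2.0 Flash Lite | Google | $0.075 | $0.300 | 1200 | 180 |
| #2 | Gemini 2.0 Flash | Google | $0.100 | $0.400 | 1260 | 160 |
| #3 | GPT-4 1.5-mini | OpenAI | $0.400 | $1.60 | 1180 | 120 |
| #4 | GPT-4 1.5-nano | OpenAI | $0.100 | $0.400 | 1150 | 150 |
| #5 | Claude Haiku 4 | Anthropic | $1.00 | $5.00 | 1220 | 130 |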
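One way to read price against quality for these five models is to look for models that no other model beats on both axes. The sketch below is an illustration under stated assumptions, not the site's ranking method. It blends input and output prices using the 60% / 40% mix from the cost estimate and treats Arena ELO as the quality measure. The names `Model`, `blended_price`, and `pareto_frontier` are hypothetical.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    name: str
    input_price: float   # USD per million input tokens
    output_price: float  # USD per million output tokens
    elo: int             # Arena ELO as listed in the comparison


# Top five models as listed in the comparison.
MODELS = [
    Model("Gemini 2.0 Flash Lite", 0.075, 0.300, 1200),
    Model("Gemini 2.0 Flash", 0.100, 0.400, 1260),
    Model("GPT-4 1.5-mini", 0.400, 1.60, 1180),
    Model("GPT-4 1.5-nano", 0.100, 0.400, 1150),
    Model("Claude Haiku 4", 1.00, 5.00, 1220),
]


def blended_price(m: Model, input_share: float = 0.60) -> float:
    """Price per million tokens for an assumed input/output mix."""
    return input_share * m.input_price + (1 - input_share) * m.output_price


def pareto_frontier(models: List[Model], input_share: float = 0.60) -> List[Model]:
    """Keep models that no other model matches or beats on both price and ELO."""

    def dominates(a: Model, b: Model) -> bool:
        pa, pb = blended_price(a, input_share), blended_price(b, input_share)
        # a is at least as good on both axes and strictly better on one.
        return pa <= pb and a.elo >= b.elo and (pa < pb or a.elo > b.elo)

    return [m for m in models if not any(dominates(o, m) for o in models)]


for m in pareto_frontier(MODELS):
    print(f"{m.name}: ${blended_price(m):.3f}/M blended, ELO {m.elo}")
```

Under these assumptions only Gemini 2.0 Flash Lite and Gemini 2.0 Flash remain on the frontier. A different token mix, or a quality measure other than Arena ELO, could change that result.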
#1 Gemini 2.0 Flash Lite
Google
ELO 1200
Input: $0.075/M
Output: $0.300/M
Verified 2026-04-20
Capabilities: Vision, JSON Mode, Functions, Multimodal

#2 Gemini 2.0 Flash
Google
ELO 1260
Input: $0.100/M
Output: $0.400/M
Verified 2026-04-20
Capabilities: Vision, JSON Mode, Functions, Multimodal, Code Exec

#3 GPT-4 1.5-mini
OpenAI
ELO 1180
Input: $0.400/M
Output: $1.60/M
Verified 2026-04-20
Capabilities: JSON Mode, Functions

#4 GPT-4 1.5-nano
OpenAI
ELO 1150
Input: $0.100/M
Output: $0.400/M
Verified 2026-04-20
Capabilities: JSON Mode, Functions

#5 Claude Haiku 4
Anthropic
ELO 1220
Input: $1.00/M
Output: $5.00/M
Verified 2026-04-20
Capabilities: Vision, JSON Mode, Functions, Multimodal

#6 Llama 4 Scout
Meta
ELO 1250
Input: $0.080/M
Output: $0.300/M
Verified 2026-04-20
Capabilities: Vision, JSON Mode, Functions, Multimodal

#7 Grok 3-mini
xAI
ELO 1175
Input: $0.300/M
Output: $0.500/M
Verified 2026-04-20
Capabilities: JSON Mode, Functions
