Fastest LLM API — Speed Comparison

LLM API speed is measured by two key metrics: time to first token (TTFT), the delay before the first token of the response arrives, and throughput, the number of tokens generated per second. Here's how the fastest models compare:


Fastest by TTFT (time to first token):


  • Gemini 2.0 Flash Lite (Google): 100ms TTFT. $0.075/M input.
  • Phi-4 (Microsoft): 100ms TTFT. $0.065/M input.
  • Gemini 2.0 Flash (Google): 120ms TTFT. $0.100/M input.
  • GPT-4.1 Nano (OpenAI): 120ms TTFT. $0.100/M input.
  • Grok 3 Mini (xAI): 130ms TTFT. $0.300/M input.

Fastest by throughput (tokens/second):


  • Gemini 2.0 Flash Lite (Google): 180 tok/s. $0.075/M input.
  • Gemini 2.0 Flash (Google): 160 tok/s. $0.100/M input.
  • Phi-4 (Microsoft): 160 tok/s. $0.065/M input.
  • GPT-4.1 Nano (OpenAI): 150 tok/s. $0.100/M input.
  • Grok 3 Mini (xAI): 140 tok/s. $0.300/M input.
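
Figures like the ones listed above can be checked against your own workload. The following is a minimal sketch, not any provider's official method: `measure_stream` and `StreamStats` are hypothetical names, and the function assumes you pass it an iterable that yields one item per streamed token (many streaming APIs deliver chunks that may hold more than one token, so treat the result as approximate). The clock is injectable so the logic can be exercised without a network call.

```python
import time
from typing import Callable, Iterable, NamedTuple


class StreamStats(NamedTuple):
    ttft_s: float      # seconds from start until the first token arrived
    tokens: int        # number of items received from the stream
    tok_per_s: float   # tokens after the first, divided by time after the first


def measure_stream(tokens: Iterable[str],
                   clock: Callable[[], float] = time.perf_counter) -> StreamStats:
    """Consume a token stream and report TTFT and decode throughput."""
    start = clock()
    first = None
    last = start
    count = 0
    for _ in tokens:
        now = clock()
        if first is None:
            first = now          # timestamp of the first token
        last = now               # timestamp of the most recent token
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    decode_time = last - first
    # Throughput excludes the wait for the first token, so the two
    # metrics stay independent of each other.
    rate = (count - 1) / decode_time if decode_time > 0 else float("nan")
    return StreamStats(first - start, count, rate)
```

For a meaningful TTFT, the request should be sent lazily by the iterable itself (for example a generator that opens the connection on first use), or immediately before `measure_stream` is called. Repeating the measurement several times and taking the median helps smooth out network jitter.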

When speed matters most: Speed matters most in real-time chat interfaces, autocomplete, streaming code generation, and any other application where users are waiting for a response. For background processing and batch jobs, throughput matters more than TTFT.
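
To see why the balance shifts with output length, total response time can be approximated as TTFT plus output tokens divided by throughput. The sketch below uses two of the TTFT and throughput figures listed above; the function name `estimated_latency_s` is hypothetical, and the model assumes a steady generation rate and ignores network overhead and queuing, so it is a rough estimate rather than a benchmark.

```python
def estimated_latency_s(ttft_ms: float, tok_per_s: float, output_tokens: int) -> float:
    """Approximate end-to-end time: wait for the first token, then steady decoding."""
    return ttft_ms / 1000 + output_tokens / tok_per_s


# (TTFT in ms, throughput in tok/s), taken from the lists in this article.
MODELS = {
    "Gemini 2.0 Flash Lite": (100, 180),
    "Gemini 2.0 Flash": (120, 160),
}

for n in (20, 1000):  # a short autocomplete-style reply vs. a long generation
    for name, (ttft_ms, rate) in MODELS.items():
        total = estimated_latency_s(ttft_ms, rate, n)
        share = (ttft_ms / 1000) / total  # fraction of the total spent waiting for token one
        print(f"{n:>5} tokens  {name:<22} {total:6.2f} s  (TTFT share {share:.0%})")
```

Under these assumptions, TTFT is roughly half of the total time for a 20-token reply but under 2% for a 1,000-token reply, which is why throughput dominates for long outputs.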


    Tip: Smaller models are generally faster. If your task doesn't require top-tier reasoning, a smaller model such as GPT-4.1 Mini or Gemini Flash will typically respond with lower latency than a flagship model.
