Inference
Time to First Token (TTFT)
Quick Answer
The latency from sending a request to receiving the first output token.
TTFT strongly shapes perceived responsiveness in interactive applications, because it is the delay before a user sees any output. It covers the time a request spends queued, the prefill phase (processing the input prompt and populating the KV cache), and generation of the first token. Because prefill cost grows with prompt length, longer prompts generally mean higher TTFT. Optimizations that shorten prefill, such as reusing cached KV state for a shared prompt prefix, reduce TTFT. Batching can increase TTFT when requests must wait for a batch to be assembled. Interactive applications should optimize for TTFT; batch applications care more about throughput. For user experience in streaming interfaces, TTFT is often more critical than end-to-end latency.
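To make the definition concrete, the sketch below times a streamed response from the client side. It is a minimal illustration, not a benchmark harness: `measure_ttft` and `fake_stream` are hypothetical names introduced here, and `fake_stream` only simulates a server with `time.sleep`. In practice you would pass in a function that opens a real streaming response from whatever inference client you use.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_ttft(start_stream: Callable[[], Iterable[str]]) -> Tuple[float, float, int]:
    """Return (ttft_seconds, total_seconds, token_count) for one request.

    `start_stream` is a hypothetical callable that sends the request and
    returns an iterable yielding output tokens as they arrive.
    """
    start = time.perf_counter()          # clock starts when the request is sent
    ttft = None
    count = 0
    for _ in start_stream():
        if ttft is None:
            # First token received: this elapsed time is the TTFT.
            ttft = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start  # end-to-end latency
    if ttft is None:
        raise ValueError("stream produced no tokens")
    return ttft, total, count


def fake_stream(prefill_s: float = 0.05,
                per_token_s: float = 0.01,
                n_tokens: int = 5) -> Iterator[str]:
    """Simulated server (assumption for illustration only)."""
    time.sleep(prefill_s)                # stands in for queueing + prefill
    for i in range(n_tokens):
        if i > 0:
            time.sleep(per_token_s)      # stands in for each later decode step
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, total, count = measure_ttft(lambda: fake_stream())
    print(f"TTFT: {ttft * 1000:.1f} ms, total: {total * 1000:.1f} ms, tokens: {count}")
```

Measuring both numbers in the same loop shows how they diverge: in the simulation, TTFT depends only on the simulated prefill delay, while end-to-end latency also grows with the number of tokens generated.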
Last verified: 2026-04-08