Streaming

Quick Answer

Receiving LLM outputs token-by-token as they're generated rather than waiting for completion.

Streaming allows applications to receive tokens as soon as they're generated, rather than waiting for the entire response. This dramatically improves perceived latency: users see text appearing in real time. Streaming is essential for chat applications and anything else requiring responsiveness. Technically, the model still generates every token; the API simply sends them incrementally instead of buffering the full response. Streaming suits most applications, except those that need the complete output before any processing can begin (e.g., parsing a full JSON response). It also enables real-time interruption if the user cancels mid-generation. All major LLM APIs support streaming, typically over server-sent events.
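The consumer side of streaming can be sketched as follows. This is a minimal illustration, not any particular vendor's SDK: `fake_token_stream` is a hypothetical stand-in for a real streaming API response, and the consumer simply accumulates each delta while rendering it immediately.

```python
import time

def fake_token_stream(text, delay=0.0):
    """Hypothetical stand-in for a streaming API: yields one token at a time."""
    for token in text.split(" "):
        time.sleep(delay)  # simulates per-token generation/network latency
        yield token + " "

def consume_stream(stream, on_token=print):
    """Render each delta as it arrives, then return the full accumulated text."""
    parts = []
    for delta in stream:
        parts.append(delta)   # keep the partial response
        on_token(delta)       # show the user text immediately
    return "".join(parts)
```

Because the consumer holds an iterator, cancelling is as simple as breaking out of the loop: the remaining tokens are never requested, which is how real-time interruption works in practice.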

Last verified: 2026-04-08
