Fundamentals
Max Tokens
Quick Answer
The maximum number of tokens the model will generate in a completion.
Max tokens sets an upper limit on the length of the model's output. If you set max_tokens=500, the model stops generating after 500 tokens whether or not the response is complete; most APIs flag this truncation with a finish reason such as "length". The parameter is crucial for controlling costs (longer outputs mean higher API bills), managing latency, and preventing runaway outputs. Set it too low, however, and answers arrive cut off mid-sentence, so you must balance thoroughness against cost and speed.
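The truncation behavior described above can be sketched locally, without any API calls. This is a minimal simulation, not a real client: the `generate` helper and the `"stop"`/`"length"` finish reasons are illustrative assumptions that mirror the conventions many completion APIs use.

```python
def generate(full_output, max_tokens):
    """Simulate a model capped by max_tokens.

    full_output: the tokens the model *would* produce if unconstrained.
    Returns (tokens, finish_reason), where finish_reason mimics the
    "stop" vs "length" convention used by many completion APIs.
    """
    if len(full_output) <= max_tokens:
        return full_output, "stop"          # model finished naturally
    return full_output[:max_tokens], "length"  # cut off by the cap

# A hypothetical 5-token response, capped at 3 tokens:
answer = ["The", "answer", "is", "42", "."]
out, reason = generate(answer, max_tokens=3)
# out is ["The", "answer", "is"] and reason is "length" -- the response
# was truncated mid-sentence, exactly the failure mode of a low cap.
```

In practice you would check the finish reason on every response: a `"length"` result is the signal that max_tokens was too low for the prompt and should be raised (or the task split up).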
Last verified: 2026-04-08