Context window size determines how much text a model can process in a single request. Here are all models ranked by context window:
Llama 4 Scout (Meta): 10.48576M context window, 32,768 max output. $0.080/M input.
Gemini Experimental 1206 (Google): 2M context window, 8,192 max output. $0.00/M input.
Gemini 1.5 Pro (Google): 2M context window, 8,192 max output. $1.25/M input.
Gemini 2.5 Pro (Google): 1.048576M context window, 65,536 max output. $1.25/M input.
Llama 4 Maverick (Meta): 1.048576M context window, 32,768 max output. $0.150/M input.
Gemini 2.0 Flash (Google): 1.048576M context window, 8,192 max output. $0.100/M input.
Gemini 2.0 Flash Lite (Google): 1.048576M context window, 8,192 max output. $0.075/M input.
Gemini 2.5 Flash (Google): 1M context window, 8,192 max output. $0.300/M input.
Gemini 1.5 Flash (Google): 1M context window, 8,192 max output. $0.075/M input.
Gemini 1.5 Flash 8B (Google): 1M context window, 8,192 max output. $0.037/M input.
Amazon Nova Pro (Amazon): 300K context window, 4,096 max output. $0.800/M input.
Amazon Nova Lite (Amazon): 300K context window, 4,096 max output. $0.060/M input.
Command A (Cohere): 256K context window, 4,096 max output. $2.50/M input.
Codestral 22B (Mistral AI): 256K context window, 4,096 max output. $0.300/M input.
Claude Opus 4 (Anthropic): 200K context window, 32,000 max output. $5.00/M input.
o3 (OpenAI): 200K context window, 100,000 max output. $2.00/M input.
o1 (OpenAI): 200K context window, 100,000 max output. $15.00/M input.
Grok 3 (xAI): 200K context window, 8,192 max output. $3.00/M input.
Claude Sonnet 4 (Anthropic): 200K context window, 64,000 max output. $3.00/M input.
Claude 3.5 Sonnet (Anthropic): 200K context window, 8,192 max output. $3.00/M input.
Claude 3.5 Haiku (Anthropic): 200K context window, 8,192 max output. $0.800/M input.
Claude Haiku 4 (Anthropic): 200K context window, 8,192 max output. $1.00/M input.
Sonar Pro (Perplexity): 200K context window, 8,192 max output. $3.00/M input.
Llama 3.1 405B (Fireworks) (Fireworks AI): 131.072K context window, 4,096 max output. $3.00/M input.
Grok 2 (xAI): 131.072K context window, 4,096 max output. $2.00/M input.
Llama 3.3 70B (Fireworks) (Fireworks AI): 131.072K context window, 4,096 max output. $0.900/M input.
DeepSeek R1 (DeepSeek): 128K context window, 8,192 max output. $0.500/M input.
Qwen 3 235B MoE (Alibaba): 128K context window, 4,096 max output. $0.455/M input.
GPT-4.5 (OpenAI): 128K context window, 8,192 max output. $75.00/M input.
DeepSeek R1 (Groq) (Groq): 128K context window, 8,192 max output. $0.750/M input.
DeepSeek V3 (DeepSeek): 128K context window, 8,192 max output. $0.259/M input.
o3-mini (OpenAI): 128K context window, 65,536 max output. $1.10/M input.
o1-mini (OpenAI): 128K context window, 65,536 max output. $1.10/M input.
ChatGPT-4o Latest (OpenAI): 128K context window, 16,384 max output. $5.00/M input.
GPT-4o (OpenAI): 128K context window, 16,384 max output. $2.50/M input.
o4-mini (OpenAI): 128K context window, 32,768 max output. $1.10/M input.
Qwen 2.5 Max (Alibaba): 128K context window, 8,192 max output. $0.160/M input.
GPT-4o (Aug 2024) (OpenAI): 128K context window, 16,384 max output. $2.50/M input.
DeepSeek R1 Distill Llama 70B (DeepSeek): 128K context window, 8,192 max output. $0.700/M input.
Mistral Large (Mistral): 128K context window, 8,192 max output. $0.500/M input.
GPT-4 Turbo (OpenAI): 128K context window, 4,096 max output. $10.00/M input.
Llama 3.1 405B (Meta): 128K context window, 4,096 max output. $3.00/M input.
Pixtral Large (Mistral AI): 128K context window, 4,096 max output. $2.00/M input.
Qwen 2.5 72B (Alibaba): 128K context window, 4,096 max output. $0.120/M input.
GPT-4o Mini (OpenAI): 128K context window, 16,384 max output. $0.150/M input.
Llama 3.3 70B (Groq) (Groq): 128K context window, 4,096 max output. $0.590/M input.
Llama 3.3 70B (Meta): 128K context window, 4,096 max output. $0.120/M input.
Mistral Medium 3 (Mistral AI): 128K context window, 4,096 max output. $0.400/M input.
Llama 3.3 70B (Together) (Together AI): 128K context window, 4,096 max output. $0.880/M input.
Llama 3.2 90B Vision (Meta): 128K context window, 4,096 max output. $0.900/M input.
Command R+ (Cohere): 128K context window, 4,096 max output. $2.50/M input.
DeepSeek V2.5 (DeepSeek): 128K context window, 4,096 max output. $0.140/M input.
Llama 3.1 70B (Meta): 128K context window, 4,096 max output. $0.400/M input.
Phi-3.5 MoE (Microsoft): 128K context window, 4,096 max output. $0.170/M input.
Mistral Small (Mistral): 128K context window, 8,192 max output. $0.150/M input.
GPT-4 1.5-mini (OpenAI): 128K context window, 4,096 max output. $0.400/M input.
Grok 3-mini (xAI): 128K context window, 4,096 max output. $0.300/M input.
Phi-3 Medium (Microsoft): 128K context window, 4,096 max output. $0.170/M input.
Llama 3.2 11B Vision (Meta): 128K context window, 4,096 max output. $0.245/M input.
Phi-3.5 Mini (Microsoft): 128K context window, 4,096 max output. $0.130/M input.
Qwen 2.5 7B (Alibaba): 128K context window, 4,096 max output. $0.040/M input.
GPT-4 1.5-nano (OpenAI): 128K context window, 4,096 max output. $0.100/M input.
Command R (Cohere): 128K context window, 4,096 max output. $0.150/M input.
Mistral Nemo 12B (Mistral AI): 128K context window, 4,096 max output. $0.020/M input.
Amazon Nova Micro (Amazon): 128K context window, 4,096 max output. $0.035/M input.
Command R7B (Cohere): 128K context window, 4,096 max output. $0.038/M input.
Llama 3.1 8B (Groq) (Groq): 128K context window, 4,096 max output. $0.050/M input.
Llama 3.1 8B (Meta): 128K context window, 4,096 max output. $0.020/M input.
Qwen 2.5 Coder 32B (Alibaba): 128K context window, 4,096 max output. $0.660/M input.
Sonar Reasoning (Perplexity): 127K context window, 8,192 max output. $2.00/M input.
Sonar (Perplexity): 127K context window, 4,096 max output. $1.00/M input.
DeepSeek R1 (Together) (Together AI): 64K context window, 8,192 max output. $3.00/M input.
DeepSeek R1 Distill Qwen 32B (DeepSeek): 64K context window, 8,192 max output. $0.290/M input.
Mixtral 8x22B (Fireworks) (Fireworks AI): 64K context window, 4,096 max output. $0.900/M input.
WizardLM-2 8x22B (Microsoft): 64K context window, 4,096 max output. $0.620/M input.
Gemini 2.0 Flash Thinking (Google): 32K context window, 16,384 max output. $0.00/M input.
QwQ 32B (Alibaba): 32K context window, 8,192 max output. $0.150/M input.
Qwen 2.5 72B (Together) (Together AI): 32K context window, 4,096 max output. $1.20/M input.
Yi-Large (01.AI): 32K context window, 4,096 max output. $3.00/M input.
Mixtral 8x7B (Groq) (Groq): 32K context window, 4,096 max output. $0.240/M input.
InternLM 2.5 20B (Shanghai AI Lab): 32K context window, 4,096 max output. $0.180/M input.
Mistral 7B (Mistral AI): 32K context window, 4,096 max output. $0.110/M input.
Mistral 7B (Together) (Together AI): 32K context window, 4,096 max output. $0.200/M input.
Phi-4 (Microsoft): 16.384K context window, 4,096 max output. $0.065/M input.
Yi-Lightning (01.AI): 16K context window, 4,096 max output. $0.140/M input.
GPT-3.5 Turbo (OpenAI): 16K context window, 4,096 max output. $0.500/M input.
Grok 2 Vision (xAI): 8.192K context window, 4,096 max output. $2.00/M input.
GPT-4 1 (OpenAI): 8.192K context window, 2,048 max output. $2.00/M input.
Gemma 2 27B (Google): 8K context window, 4,096 max output. $0.650/M input.
Gemma 2 9B (Groq) (Groq): 8K context window, 4,096 max output. $0.200/M input.
Gemma 2 9B (Google): 8K context window, 4,096 max output. $0.030/M input.
Llama 3.1 405B (Together) (Together AI): 4K context window, 4,096 max output. $3.50/M input.

Why context window size matters:
Document analysis: Larger windows let you process entire documents, contracts, or codebases in a single request.
Conversation memory: Longer context means the model can remember more of the conversation history.
Few-shot examples: More context lets you include more examples for better in-context learning.
RAG applications: Larger context windows allow retrieving and injecting more relevant documents.

Note: Using the full context window increases latency and cost. Only include as much context as needed for your task.
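
The trade-off in the note can be sketched as a small pre-flight check: does a prompt fit a model's window, and what would the input tokens cost? This is only an illustrative sketch. The names ModelSpec, fits, input_cost_usd, and cheapest_fit are hypothetical, the four entries are copied from the list above, "K" is read as 1,000 tokens, and the code assumes the reserved output counts against the context window, which may not hold for every provider.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelSpec:
    """One row of the listing (hypothetical helper type)."""
    name: str
    context_window: int       # tokens
    max_output: int           # tokens
    input_price_per_m: float  # USD per million input tokens


# A few rows copied from the list; "K" is assumed to mean 1,000 tokens.
MODELS = [
    ModelSpec("Gemini 2.5 Pro", 1_048_576, 65_536, 1.25),
    ModelSpec("Claude Sonnet 4", 200_000, 64_000, 3.00),
    ModelSpec("GPT-4o Mini", 128_000, 16_384, 0.150),
    ModelSpec("Gemma 2 9B (Google)", 8_000, 4_096, 0.030),
]


def fits(spec: ModelSpec, prompt_tokens: int, reserved_output: int) -> bool:
    """True if the prompt plus the reserved output fits in the window.

    Assumption: output tokens share the context window with the prompt.
    """
    if reserved_output > spec.max_output:
        return False
    return prompt_tokens + reserved_output <= spec.context_window


def input_cost_usd(spec: ModelSpec, prompt_tokens: int) -> float:
    """Input-side cost only; output pricing is not part of the list."""
    return prompt_tokens / 1_000_000 * spec.input_price_per_m


def cheapest_fit(
    models: Sequence[ModelSpec], prompt_tokens: int, reserved_output: int
) -> Optional[ModelSpec]:
    """Cheapest model (by input price) that can hold the request, or None."""
    candidates = [m for m in models if fits(m, prompt_tokens, reserved_output)]
    return min(candidates, key=lambda m: m.input_price_per_m, default=None)


if __name__ == "__main__":
    choice = cheapest_fit(MODELS, prompt_tokens=150_000, reserved_output=4_000)
    if choice is not None:
        print(choice.name, round(input_cost_usd(choice, 150_000), 4))
```

Trimming the prompt changes the answer: with these four rows, a 5,000-token prompt fits even the smallest window, so the cheapest model becomes a candidate. That is the practical reason to include only the context a task needs.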