Which LLM Has the Largest Context Window?

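A model's context window is the maximum number of tokens it can process in a single request. The models below are ranked by context window size, largest first, with each entry's maximum output tokens and input price per million tokens: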


  • Llama 4 Scout (Meta): 10.48576M context window, 32,768 max output. $0.080/M input.
  • Gemini Experimental 1206 (Google): 2M context window, 8,192 max output. $0.00/M input.
  • Gemini 1.5 Pro (Google): 2M context window, 8,192 max output. $1.25/M input.
  • Gemini 2.5 Pro (Google): 1.048576M context window, 65,536 max output. $1.25/M input.
  • Llama 4 Maverick (Meta): 1.048576M context window, 32,768 max output. $0.150/M input.
  • Gemini 2.0 Flash (Google): 1.048576M context window, 8,192 max output. $0.100/M input.
  • Gemini 2.0 Flash Lite (Google): 1.048576M context window, 8,192 max output. $0.075/M input.
  • Gemini 2.5 Flash (Google): 1M context window, 8,192 max output. $0.300/M input.
  • Gemini 1.5 Flash (Google): 1M context window, 8,192 max output. $0.075/M input.
  • Gemini 1.5 Flash 8B (Google): 1M context window, 8,192 max output. $0.037/M input.
  • Amazon Nova Pro (Amazon): 300K context window, 4,096 max output. $0.800/M input.
  • Amazon Nova Lite (Amazon): 300K context window, 4,096 max output. $0.060/M input.
  • Command A (Cohere): 256K context window, 4,096 max output. $2.50/M input.
  • Codestral 22B (Mistral AI): 256K context window, 4,096 max output. $0.300/M input.
  • Claude Opus 4 (Anthropic): 200K context window, 32,000 max output. $5.00/M input.
  • o3 (OpenAI): 200K context window, 100,000 max output. $2.00/M input.
  • o1 (OpenAI): 200K context window, 100,000 max output. $15.00/M input.
  • Grok 3 (xAI): 200K context window, 8,192 max output. $3.00/M input.
  • Claude Sonnet 4 (Anthropic): 200K context window, 64,000 max output. $3.00/M input.
  • Claude 3.5 Sonnet (Anthropic): 200K context window, 8,192 max output. $3.00/M input.
  • Claude 3.5 Haiku (Anthropic): 200K context window, 8,192 max output. $0.800/M input.
  • Claude Haiku 4 (Anthropic): 200K context window, 8,192 max output. $1.00/M input.
  • Sonar Pro (Perplexity): 200K context window, 8,192 max output. $3.00/M input.
  • Llama 3.1 405B (Fireworks) (Fireworks AI): 131.072K context window, 4,096 max output. $3.00/M input.
  • Grok 2 (xAI): 131.072K context window, 4,096 max output. $2.00/M input.
  • Llama 3.3 70B (Fireworks) (Fireworks AI): 131.072K context window, 4,096 max output. $0.900/M input.
  • DeepSeek R1 (DeepSeek): 128K context window, 8,192 max output. $0.500/M input.
  • Qwen 3 235B MoE (Alibaba): 128K context window, 4,096 max output. $0.455/M input.
  • GPT-4.5 (OpenAI): 128K context window, 8,192 max output. $75.00/M input.
  • DeepSeek R1 (Groq) (Groq): 128K context window, 8,192 max output. $0.750/M input.
  • DeepSeek V3 (DeepSeek): 128K context window, 8,192 max output. $0.259/M input.
  • o3-mini (OpenAI): 128K context window, 65,536 max output. $1.10/M input.
  • o1-mini (OpenAI): 128K context window, 65,536 max output. $1.10/M input.
  • ChatGPT-4o Latest (OpenAI): 128K context window, 16,384 max output. $5.00/M input.
  • GPT-4o (OpenAI): 128K context window, 16,384 max output. $2.50/M input.
  • o4-mini (OpenAI): 128K context window, 32,768 max output. $1.10/M input.
  • Qwen 2.5 Max (Alibaba): 128K context window, 8,192 max output. $0.160/M input.
  • GPT-4o (Aug 2024) (OpenAI): 128K context window, 16,384 max output. $2.50/M input.
  • DeepSeek R1 Distill Llama 70B (DeepSeek): 128K context window, 8,192 max output. $0.700/M input.
  • Mistral Large (Mistral): 128K context window, 8,192 max output. $0.500/M input.
  • GPT-4 Turbo (OpenAI): 128K context window, 4,096 max output. $10.00/M input.
  • Llama 3.1 405B (Meta): 128K context window, 4,096 max output. $3.00/M input.
  • Pixtral Large (Mistral AI): 128K context window, 4,096 max output. $2.00/M input.
  • Qwen 2.5 72B (Alibaba): 128K context window, 4,096 max output. $0.120/M input.
  • GPT-4o Mini (OpenAI): 128K context window, 16,384 max output. $0.150/M input.
  • Llama 3.3 70B (Groq) (Groq): 128K context window, 4,096 max output. $0.590/M input.
  • Llama 3.3 70B (Meta): 128K context window, 4,096 max output. $0.120/M input.
  • Mistral Medium 3 (Mistral AI): 128K context window, 4,096 max output. $0.400/M input.
  • Llama 3.3 70B (Together) (Together AI): 128K context window, 4,096 max output. $0.880/M input.
  • Llama 3.2 90B Vision (Meta): 128K context window, 4,096 max output. $0.900/M input.
  • Command R+ (Cohere): 128K context window, 4,096 max output. $2.50/M input.
  • DeepSeek V2.5 (DeepSeek): 128K context window, 4,096 max output. $0.140/M input.
  • Llama 3.1 70B (Meta): 128K context window, 4,096 max output. $0.400/M input.
  • Phi-3.5 MoE (Microsoft): 128K context window, 4,096 max output. $0.170/M input.
  • Mistral Small (Mistral): 128K context window, 8,192 max output. $0.150/M input.
  • GPT-4 1.5-mini (OpenAI): 128K context window, 4,096 max output. $0.400/M input.
  • Grok 3-mini (xAI): 128K context window, 4,096 max output. $0.300/M input.
  • Phi-3 Medium (Microsoft): 128K context window, 4,096 max output. $0.170/M input.
  • Llama 3.2 11B Vision (Meta): 128K context window, 4,096 max output. $0.245/M input.
  • Phi-3.5 Mini (Microsoft): 128K context window, 4,096 max output. $0.130/M input.
  • Qwen 2.5 7B (Alibaba): 128K context window, 4,096 max output. $0.040/M input.
  • GPT-4 1.5-nano (OpenAI): 128K context window, 4,096 max output. $0.100/M input.
  • Command R (Cohere): 128K context window, 4,096 max output. $0.150/M input.
  • Mistral Nemo 12B (Mistral AI): 128K context window, 4,096 max output. $0.020/M input.
  • Amazon Nova Micro (Amazon): 128K context window, 4,096 max output. $0.035/M input.
  • Command R7B (Cohere): 128K context window, 4,096 max output. $0.038/M input.
  • Llama 3.1 8B (Groq) (Groq): 128K context window, 4,096 max output. $0.050/M input.
  • Llama 3.1 8B (Meta): 128K context window, 4,096 max output. $0.020/M input.
  • Qwen 2.5 Coder 32B (Alibaba): 128K context window, 4,096 max output. $0.660/M input.
  • Sonar Reasoning (Perplexity): 127K context window, 8,192 max output. $2.00/M input.
  • Sonar (Perplexity): 127K context window, 4,096 max output. $1.00/M input.
  • DeepSeek R1 (Together) (Together AI): 64K context window, 8,192 max output. $3.00/M input.
  • DeepSeek R1 Distill Qwen 32B (DeepSeek): 64K context window, 8,192 max output. $0.290/M input.
  • Mixtral 8x22B (Fireworks) (Fireworks AI): 64K context window, 4,096 max output. $0.900/M input.
  • WizardLM-2 8x22B (Microsoft): 64K context window, 4,096 max output. $0.620/M input.
  • Gemini 2.0 Flash Thinking (Google): 32K context window, 16,384 max output. $0.00/M input.
  • QwQ 32B (Alibaba): 32K context window, 8,192 max output. $0.150/M input.
  • Qwen 2.5 72B (Together) (Together AI): 32K context window, 4,096 max output. $1.20/M input.
  • Yi-Large (01.AI): 32K context window, 4,096 max output. $3.00/M input.
  • Mixtral 8x7B (Groq) (Groq): 32K context window, 4,096 max output. $0.240/M input.
  • InternLM 2.5 20B (Shanghai AI Lab): 32K context window, 4,096 max output. $0.180/M input.
  • Mistral 7B (Mistral AI): 32K context window, 4,096 max output. $0.110/M input.
  • Mistral 7B (Together) (Together AI): 32K context window, 4,096 max output. $0.200/M input.
  • Phi-4 (Microsoft): 16.384K context window, 4,096 max output. $0.065/M input.
  • Yi-Lightning (01.AI): 16K context window, 4,096 max output. $0.140/M input.
  • GPT-3.5 Turbo (OpenAI): 16K context window, 4,096 max output. $0.500/M input.
  • Grok 2 Vision (xAI): 8.192K context window, 4,096 max output. $2.00/M input.
  • GPT-4 1 (OpenAI): 8.192K context window, 2,048 max output. $2.00/M input.
  • Gemma 2 27B (Google): 8K context window, 4,096 max output. $0.650/M input.
  • Gemma 2 9B (Groq) (Groq): 8K context window, 4,096 max output. $0.200/M input.
  • Gemma 2 9B (Google): 8K context window, 4,096 max output. $0.030/M input.
  • Llama 3.1 405B (Together) (Together AI): 4K context window, 4,096 max output. $3.50/M input.
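
The ranking above can be turned into a simple selection step: keep the models whose context window can hold a given prompt, then sort them by input price. The following is a minimal Python sketch using a handful of entries copied from the list. The `Model` class and `cheapest_fitting` function are hypothetical names chosen for illustration. It assumes the prompt's token count is already known and reads "K" and "M" as decimal thousands and millions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    context_tokens: int       # context window, in tokens
    max_output_tokens: int
    input_price_per_m: float  # USD per million input tokens


# A few entries copied from the list above (not the full ranking).
MODELS = [
    Model("Llama 4 Scout", 10_485_760, 32_768, 0.08),
    Model("Gemini 1.5 Pro", 2_000_000, 8_192, 1.25),
    Model("Gemini 2.0 Flash", 1_048_576, 8_192, 0.10),
    Model("Claude Sonnet 4", 200_000, 64_000, 3.00),
    Model("GPT-4o Mini", 128_000, 16_384, 0.15),
    Model("Mistral Nemo 12B", 128_000, 4_096, 0.02),
]


def cheapest_fitting(models, prompt_tokens):
    """Return the models whose context window holds the prompt, cheapest first."""
    fitting = [m for m in models if m.context_tokens >= prompt_tokens]
    return sorted(fitting, key=lambda m: m.input_price_per_m)


if __name__ == "__main__":
    for m in cheapest_fitting(MODELS, prompt_tokens=500_000):
        print(f"{m.name}: {m.context_tokens:,} tokens, ${m.input_price_per_m:.2f}/M input")
```

With a 500,000-token prompt, only the three models with windows of 1M tokens or more remain, and the cheapest of them comes first. A real selection would also weigh output limits and quality, which this sketch ignores.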

Why context window size matters:


  • Document analysis: Larger windows let you process entire documents, contracts, or codebases in a single request.
  • Conversation memory: Longer context means the model can remember more of the conversation history.
  • Few-shot examples: More context lets you include more examples for better in-context learning.
  • RAG applications: Larger context windows let you inject more retrieved documents into the prompt.
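
For the RAG case, the context window acts as a token budget that retrieved documents must fit into. Below is a rough Python sketch of greedy packing under that budget. The names `estimate_tokens` and `pack_documents` are hypothetical, and the estimate of about 4 characters per token is an assumption standing in for a real tokenizer. It also assumes that tokens reserved for the instructions and the model's output count against the same window, which may vary by provider.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def pack_documents(docs, context_tokens, reserved_tokens):
    """Greedily keep documents, in the given (relevance) order, that fit the budget.

    reserved_tokens is the space held back for the instructions and the output.
    Returns the kept documents and the estimated tokens they use.
    """
    budget = context_tokens - reserved_tokens
    kept, used = [], 0
    for doc in docs:
        cost = estimate_tokens(doc)
        if used + cost > budget:
            continue  # would overflow the budget; a later, smaller document may still fit
        kept.append(doc)
        used += cost
    return kept, used


if __name__ == "__main__":
    # Three retrieved documents, most relevant first (placeholder text).
    docs = ["a" * 4_000, "b" * 16_000, "c" * 400]
    kept, used = pack_documents(docs, context_tokens=8_192, reserved_tokens=4_096)
    print(len(kept), used)  # 2 1100
```

In this example the second document is skipped because it alone would exceed the remaining budget, while the smaller third one still fits. With a larger window, all three would be kept.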

Note: Using the full context window increases latency and cost. Only include as much context as needed for your task.
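
The cost side of this note is simple arithmetic: input cost is the number of prompt tokens divided by one million, times the listed input price. Here is a small Python sketch using three context window and price pairs from the list. The function name `input_cost_usd` is hypothetical, and the calculation covers input tokens only, since output prices are not listed here.

```python
def input_cost_usd(prompt_tokens: int, price_per_m: float) -> float:
    """Input cost in USD for a prompt of prompt_tokens tokens."""
    return prompt_tokens / 1_000_000 * price_per_m


if __name__ == "__main__":
    # (model, context window in tokens, USD per million input tokens), from the list
    examples = [
        ("Gemini 1.5 Pro", 2_000_000, 1.25),
        ("Claude Opus 4", 200_000, 5.00),
        ("GPT-4.5", 128_000, 75.00),
    ]
    for name, window, price in examples:
        full = input_cost_usd(window, price)
        tenth = input_cost_usd(window // 10, price)
        print(f"{name}: full window ${full:.2f}, one tenth ${tenth:.2f}")
    # Gemini 1.5 Pro: full window $2.50, one tenth $0.25
    # Claude Opus 4: full window $1.00, one tenth $0.10
    # GPT-4.5: full window $9.60, one tenth $0.96
```

Because the cost scales linearly with prompt size, trimming a prompt to a tenth of the window cuts the per-request input cost by the same factor.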
