Which LLM Has the Largest Context Window?

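Context window size determines how much text a model can process in a single request. The models below are ranked from largest to smallest context window. Each entry also lists the maximum output length in tokens and the input price in USD per million tokens: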

Llama 4 Scout (Meta): 10,485,760-token context window, 32,768 max output. $0.080/M input.

Gemini Experimental 1206 (Google): 2M context window, 8,192 max output. $0.00/M input.

Gemini 1.5 Pro (Google): 2M context window, 8,192 max output. $1.25/M input.

Gemini 2.5 Pro (Google): 1,048,576-token context window, 65,536 max output. $1.25/M input.

Llama 4 Maverick (Meta): 1,048,576-token context window, 32,768 max output. $0.150/M input.

Gemini 2.0 Flash (Google): 1,048,576-token context window, 8,192 max output. $0.100/M input.

Gemini 2.0 Flash Lite (Google): 1,048,576-token context window, 8,192 max output. $0.075/M input.

Gemini 2.5 Flash (Google): 1M context window, 8,192 max output. $0.300/M input.

Gemini 1.5 Flash (Google): 1M context window, 8,192 max output. $0.075/M input.

Gemini 1.5 Flash 8B (Google): 1M context window, 8,192 max output. $0.037/M input.

Amazon Nova Pro (Amazon): 300K context window, 4,096 max output. $0.800/M input.

Amazon Nova Lite (Amazon): 300K context window, 4,096 max output. $0.060/M input.

Command A (Cohere): 256K context window, 4,096 max output. $2.50/M input.

Codestral 22B (Mistral AI): 256K context window, 4,096 max output. $0.300/M input.

Claude Opus 4 (Anthropic): 200K context window, 32,000 max output. $5.00/M input.

o3 (OpenAI): 200K context window, 100,000 max output. $2.00/M input.

o1 (OpenAI): 200K context window, 100,000 max output. $15.00/M input.

Grok 3 (xAI): 200K context window, 8,192 max output. $3.00/M input.

Claude Sonnet 4 (Anthropic): 200K context window, 64,000 max output. $3.00/M input.

Claude 3.5 Sonnet (Anthropic): 200K context window, 8,192 max output. $3.00/M input.

Claude 3.5 Haiku (Anthropic): 200K context window, 8,192 max output. $0.800/M input.

Claude Haiku 4 (Anthropic): 200K context window, 8,192 max output. $1.00/M input.

Sonar Pro (Perplexity): 200K context window, 8,192 max output. $3.00/M input.

Llama 3.1 405B (Fireworks) (Fireworks AI): 131,072-token context window, 4,096 max output. $3.00/M input.

Grok 2 (xAI): 131,072-token context window, 4,096 max output. $2.00/M input.

Llama 3.3 70B (Fireworks) (Fireworks AI): 131,072-token context window, 4,096 max output. $0.900/M input.

DeepSeek R1 (DeepSeek): 128K context window, 8,192 max output. $0.500/M input.

Qwen 3 235B MoE (Alibaba): 128K context window, 4,096 max output. $0.455/M input.

GPT-4.5 (OpenAI): 128K context window, 8,192 max output. $75.00/M input.

DeepSeek R1 (Groq) (Groq): 128K context window, 8,192 max output. $0.750/M input.

DeepSeek V3 (DeepSeek): 128K context window, 8,192 max output. $0.259/M input.

o3-mini (OpenAI): 128K context window, 65,536 max output. $1.10/M input.

o1-mini (OpenAI): 128K context window, 65,536 max output. $1.10/M input.

ChatGPT-4o Latest (OpenAI): 128K context window, 16,384 max output. $5.00/M input.

GPT-4o (OpenAI): 128K context window, 16,384 max output. $2.50/M input.

o4-mini (OpenAI): 128K context window, 32,768 max output. $1.10/M input.

Qwen 2.5 Max (Alibaba): 128K context window, 8,192 max output. $0.160/M input.

GPT-4o (Aug 2024) (OpenAI): 128K context window, 16,384 max output. $2.50/M input.

DeepSeek R1 Distill Llama 70B (DeepSeek): 128K context window, 8,192 max output. $0.700/M input.

Mistral Large (Mistral AI): 128K context window, 8,192 max output. $0.500/M input.

GPT-4 Turbo (OpenAI): 128K context window, 4,096 max output. $10.00/M input.

Llama 3.1 405B (Meta): 128K context window, 4,096 max output. $3.00/M input.

Pixtral Large (Mistral AI): 128K context window, 4,096 max output. $2.00/M input.

Qwen 2.5 72B (Alibaba): 128K context window, 4,096 max output. $0.120/M input.

GPT-4o Mini (OpenAI): 128K context window, 16,384 max output. $0.150/M input.

Llama 3.3 70B (Groq) (Groq): 128K context window, 4,096 max output. $0.590/M input.

Llama 3.3 70B (Meta): 128K context window, 4,096 max output. $0.120/M input.

Mistral Medium 3 (Mistral AI): 128K context window, 4,096 max output. $0.400/M input.

Llama 3.3 70B (Together) (Together AI): 128K context window, 4,096 max output. $0.880/M input.

Llama 3.2 90B Vision (Meta): 128K context window, 4,096 max output. $0.900/M input.

Command R+ (Cohere): 128K context window, 4,096 max output. $2.50/M input.

DeepSeek V2.5 (DeepSeek): 128K context window, 4,096 max output. $0.140/M input.

Llama 3.1 70B (Meta): 128K context window, 4,096 max output. $0.400/M input.

Phi-3.5 MoE (Microsoft): 128K context window, 4,096 max output. $0.170/M input.

Mistral Small (Mistral AI): 128K context window, 8,192 max output. $0.150/M input.

GPT-4 1.5-mini (OpenAI): 128K context window, 4,096 max output. $0.400/M input.

Grok 3-mini (xAI): 128K context window, 4,096 max output. $0.300/M input.

Phi-3 Medium (Microsoft): 128K context window, 4,096 max output. $0.170/M input.

Llama 3.2 11B Vision (Meta): 128K context window, 4,096 max output. $0.245/M input.

Phi-3.5 Mini (Microsoft): 128K context window, 4,096 max output. $0.130/M input.

Qwen 2.5 7B (Alibaba): 128K context window, 4,096 max output. $0.040/M input.

GPT-4 1.5-nano (OpenAI): 128K context window, 4,096 max output. $0.100/M input.

Command R (Cohere): 128K context window, 4,096 max output. $0.150/M input.

Mistral Nemo 12B (Mistral AI): 128K context window, 4,096 max output. $0.020/M input.

Amazon Nova Micro (Amazon): 128K context window, 4,096 max output. $0.035/M input.

Command R7B (Cohere): 128K context window, 4,096 max output. $0.038/M input.

Llama 3.1 8B (Groq) (Groq): 128K context window, 4,096 max output. $0.050/M input.

Llama 3.1 8B (Meta): 128K context window, 4,096 max output. $0.020/M input.

Qwen 2.5 Coder 32B (Alibaba): 128K context window, 4,096 max output. $0.660/M input.

Sonar Reasoning (Perplexity): 127K context window, 8,192 max output. $2.00/M input.

Sonar (Perplexity): 127K context window, 4,096 max output. $1.00/M input.

DeepSeek R1 (Together) (Together AI): 64K context window, 8,192 max output. $3.00/M input.

DeepSeek R1 Distill Qwen 32B (DeepSeek): 64K context window, 8,192 max output. $0.290/M input.

Mixtral 8x22B (Fireworks) (Fireworks AI): 64K context window, 4,096 max output. $0.900/M input.

WizardLM-2 8x22B (Microsoft): 64K context window, 4,096 max output. $0.620/M input.

Gemini 2.0 Flash Thinking (Google): 32K context window, 16,384 max output. $0.00/M input.

QwQ 32B (Alibaba): 32K context window, 8,192 max output. $0.150/M input.

Qwen 2.5 72B (Together) (Together AI): 32K context window, 4,096 max output. $1.20/M input.

Yi-Large (01.AI): 32K context window, 4,096 max output. $3.00/M input.

Mixtral 8x7B (Groq) (Groq): 32K context window, 4,096 max output. $0.240/M input.

InternLM 2.5 20B (Shanghai AI Lab): 32K context window, 4,096 max output. $0.180/M input.

Mistral 7B (Mistral AI): 32K context window, 4,096 max output. $0.110/M input.

Mistral 7B (Together) (Together AI): 32K context window, 4,096 max output. $0.200/M input.

Phi-4 (Microsoft): 16,384-token context window, 4,096 max output. $0.065/M input.

Yi-Lightning (01.AI): 16K context window, 4,096 max output. $0.140/M input.

GPT-3.5 Turbo (OpenAI): 16K context window, 4,096 max output. $0.500/M input.

Grok 2 Vision (xAI): 8,192-token context window, 4,096 max output. $2.00/M input.

GPT-4 1 (OpenAI): 8,192-token context window, 2,048 max output. $2.00/M input.

Gemma 2 27B (Google): 8K context window, 4,096 max output. $0.650/M input.

Gemma 2 9B (Groq) (Groq): 8K context window, 4,096 max output. $0.200/M input.

Gemma 2 9B (Google): 8K context window, 4,096 max output. $0.030/M input.

Llama 3.1 405B (Together) (Together AI): 4K context window, 4,096 max output. $3.50/M input.
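
A ranking by size alone does not say which model to pick for a given prompt. As a minimal sketch, the snippet below filters a handful of entries copied from the list above down to those whose window fits a required token count, then picks the one with the lowest input price. The `Model` tuple and the `cheapest_fitting` helper are hypothetical names introduced for illustration, and "K" is read as 1,000 tokens, which is an assumption about how the list rounds its figures.

```python
from typing import NamedTuple, Optional, Sequence


class Model(NamedTuple):
    name: str
    context_tokens: int       # context window size in tokens
    input_price_per_m: float  # USD per million input tokens


# A few rows copied from the list above (assumption: "K" means 1,000 tokens).
MODELS = [
    Model("Llama 4 Scout", 10_485_760, 0.08),
    Model("Gemini 2.5 Pro", 1_048_576, 1.25),
    Model("Claude Sonnet 4", 200_000, 3.00),
    Model("GPT-4o", 128_000, 2.50),
    Model("GPT-4o Mini", 128_000, 0.15),
    Model("Mistral Nemo 12B", 128_000, 0.02),
]


def cheapest_fitting(models: Sequence[Model], required_tokens: int) -> Optional[Model]:
    """Return the lowest-priced model whose window holds required_tokens, or None."""
    candidates = [m for m in models if m.context_tokens >= required_tokens]
    return min(candidates, key=lambda m: m.input_price_per_m, default=None)


if __name__ == "__main__":
    for need in (100_000, 150_000, 20_000_000):
        choice = cheapest_fitting(MODELS, need)
        print(need, "->", choice.name if choice else "no listed model fits")
```

Price is only one possible tie-breaker. Output limits, latency, and quality on long inputs may matter more for a particular workload.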

Why context window size matters:

Document analysis: Larger windows let you process entire documents, contracts, or codebases in a single request.

Conversation memory: Longer context means the model can remember more of the conversation history.

Few-shot examples: More context lets you include more examples for better in-context learning.

RAG applications: Larger context windows allow retrieving and injecting more relevant documents.
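
For the RAG and document-analysis cases above, the practical question is how many documents fit once room is set aside for the prompt and the response. The sketch below packs documents greedily, in the order given, until a token budget is exhausted. Everything in it is an illustrative assumption: `estimate_tokens` uses a rough four-characters-per-token heuristic rather than a real tokenizer, and subtracting the reserved output from the window assumes the provider counts output tokens against the context limit, which varies by API.

```python
from typing import List, Sequence, Tuple


def estimate_tokens(text: str) -> int:
    """Rough token estimate (assumption: about 4 characters per token)."""
    return max(1, len(text) // 4)


def pack_documents(
    docs: Sequence[str],
    context_window: int,
    reserved_output: int,
    prompt_tokens: int = 0,
) -> Tuple[List[str], int]:
    """Greedily keep documents, in order, while they fit the remaining budget.

    Returns the selected documents and the estimated tokens they use.
    """
    budget = context_window - reserved_output - prompt_tokens
    selected: List[str] = []
    used = 0
    for doc in docs:
        cost = estimate_tokens(doc)
        if used + cost > budget:
            break  # stop at the first document that would overflow the budget
        selected.append(doc)
        used += cost
    return selected, used


if __name__ == "__main__":
    # Three hypothetical retrieved documents, most relevant first.
    retrieved = ["a" * 400, "b" * 400, "c" * 400]
    kept, used = pack_documents(retrieved, context_window=350, reserved_output=100)
    print(len(kept), "documents kept, about", used, "tokens")
```

Stopping at the first overflow keeps the highest-ranked documents together. A variant could skip an oversized document and keep trying smaller ones instead.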

Note: Using the full context window increases latency and cost. Only include as much context as needed for your task.
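
To make the cost side of that note concrete, input cost scales linearly with the number of tokens sent. The sketch below compares filling an entire 1,048,576-token window with sending a trimmed 20,000-token prompt, both at the $1.25/M input price listed for Gemini 2.5 Pro. The `input_cost_usd` helper is a hypothetical name, and the calculation covers input tokens only, ignoring output pricing and any provider-specific tiers or caching discounts.

```python
def input_cost_usd(input_tokens: int, price_per_million: float) -> float:
    """Input cost in USD for one request at a flat per-million-token price."""
    return input_tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    price = 1.25  # USD per million input tokens, as listed for Gemini 2.5 Pro
    full_window = input_cost_usd(1_048_576, price)
    trimmed = input_cost_usd(20_000, price)
    print(f"full window: ${full_window:.4f} per request")  # $1.3107
    print(f"trimmed:     ${trimmed:.4f} per request")      # $0.0250
```

At these figures a full-window request costs roughly fifty times as much as the trimmed one, before any difference in latency is considered.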
