How Much Context Window Do You Actually Need?

Most applications need 32K–128K tokens. Use RAG with a smaller context window rather than stuffing everything into a 1M token context — retrieval is cheaper and often more accurate. Only use 1M+ token contexts when you genuinely need the model to reason across an entire large document simultaneously.

Step 1: What content do you need in context at once?

FAQ

Does a larger context window always mean better performance?

No. The 'lost in the middle' problem is well-documented: LLMs perform significantly worse at recalling information from the middle of a very long context compared to the beginning and end. For most tasks, a well-designed 32K context with good retrieval outperforms naively stuffing 200K tokens. Use large contexts only when the task genuinely requires simultaneous awareness of all content.

How much does a 1M token context cost per query?

At current prices (April 2026): Gemini 1.5 Pro charges $1.25/M tokens for inputs over 128K. A single 1M token query costs $1.25 in input tokens alone, plus output tokens. For 1,000 queries/day at this size, you're spending $1,250/day ($37,500/month) on input tokens alone. This is why retrieval-first approaches are economically critical at scale.
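The arithmetic above can be sketched as a back-of-the-envelope calculator using the prices quoted in this FAQ (`monthly_input_cost` is an illustrative helper, not a library function; the $1.25/M figure is the rate cited above, not a guaranteed current price):

```python
def monthly_input_cost(tokens_per_query: int, queries_per_day: int,
                       price_per_million: float, days: int = 30) -> float:
    """Input-token spend for a workload at a flat per-million-token price."""
    daily = tokens_per_query / 1_000_000 * price_per_million * queries_per_day
    return daily * days

# Figures from the FAQ above: 1M-token queries at $1.25/M input tokens.
per_query = monthly_input_cost(1_000_000, 1, 1.25, days=1)     # $1.25
per_day = monthly_input_cost(1_000_000, 1_000, 1.25, days=1)   # $1,250
per_month = monthly_input_cost(1_000_000, 1_000, 1.25)         # $37,500
```

Running the same workload through a 32K-token retrieval pipeline instead (32,000 tokens per query) drops the monthly input bill to $1,200, which is the economic case for retrieval-first designs.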

What is the 'lost in the middle' problem?

Research from Stanford (Liu et al., 2023) and subsequent studies show that LLMs are much better at using information at the start and end of their context window than information in the middle. In long contexts, critical information in the middle can be effectively 'lost.' Mitigations include: placing the most important content at the beginning/end, using retrieval to surface relevant chunks, and using models specifically optimized for long-context retrieval (Claude Sonnet 4 has notably better long-context performance).
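The first mitigation above — placing the most important content at the beginning and end — can be applied mechanically to retrieved chunks. A minimal sketch (the `edge_order` helper is illustrative, not from any library; it assumes chunks arrive sorted most-relevant-first, as retrievers typically return them):

```python
def edge_order(chunks_by_relevance: list[str]) -> list[str]:
    """Reorder chunks so the top-ranked ones sit at the start and end of
    the prompt, pushing the weakest chunks into the middle, where the
    'lost in the middle' effect hurts recall the least."""
    front, back = [], []
    for i, chunk in enumerate(chunks_by_relevance):
        # Alternate: even ranks go to the front, odd ranks to the back.
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]

# Most relevant chunk first; least relevant ends up in the middle.
edge_order(["c1", "c2", "c3", "c4", "c5"])
# → ["c1", "c3", "c5", "c4", "c2"]
```

The same idea appears in off-the-shelf RAG frameworks as a "long-context reorder" post-processing step applied after retrieval and before prompt assembly.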

Should I use prompt caching for long contexts?

Absolutely. If you're using a large system prompt or repeatedly loading the same documents into context, prompt caching is transformative. Anthropic's caching cuts cached prefix reads by up to 90% (e.g., $0.30/M instead of $3/M input for Claude Sonnet 4); OpenAI's automatic caching saves 50%. For a 50K token system prompt sent 1,000 times/day, caching saves roughly $135/day with Claude Sonnet 4, before the one-time cache-write premium. See the prompt caching decision tool for details.
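The savings can be estimated directly. A sketch assuming Claude Sonnet 4's published $3/M input rate and a 90%-discounted cache-read rate of $0.30/M (the one-time cache-write premium is ignored here; `daily_caching_savings` is an illustrative helper, not a library function):

```python
def daily_caching_savings(prefix_tokens: int, calls_per_day: int,
                          input_price: float = 3.00,
                          cached_price: float = 0.30) -> float:
    """Dollars saved per day by reading a cached prefix instead of
    re-sending it at the full input rate. Prices are in $/M tokens."""
    saved_per_call = prefix_tokens / 1_000_000 * (input_price - cached_price)
    return saved_per_call * calls_per_day

# The FAQ's workload: a 50K-token system prompt sent 1,000 times/day.
daily_caching_savings(50_000, 1_000)  # → 135.0
```

The break-even point arrives quickly: the cache write costs a small premium once, and every subsequent hit within the cache's lifetime pays it back many times over.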
