How to Calculate LLM API Costs Before You Build: The Complete Formula
One of the most common mistakes in building LLM-powered products: discovering the real cost only after you've built it. A feature that costs $0.01 per use in development can cost $500/day in production at scale. Here's how to calculate costs accurately before you build.
The Core Cost Formula
LLM API cost has three components:
Total Cost = (Uncached Input Tokens × Input Price)
+ (Output Tokens × Output Price)
+ (Cached Input Tokens × Cached Input Price)
All prices are per 1 million tokens. Cached input tokens are billed at a steep discount (often ~90% off the base input rate), which is why caching gets its own term.
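A quick worked example of the formula. It also shows that the two common ways of writing the cache term (billing cached tokens at the discounted rate, or billing all input at full price and subtracting the savings) are equivalent. Prices are illustrative, taken from the Sonnet row of the pricing table later in this article:

```python
# Worked example of the cost formula (illustrative Sonnet-class prices:
# $3/1M input, $15/1M output, $0.30/1M cached input).
INPUT_PRICE = 3.00 / 1_000_000
OUTPUT_PRICE = 15.00 / 1_000_000
CACHED_PRICE = 0.30 / 1_000_000

input_tokens, output_tokens, cached_tokens = 5_000, 500, 1_000

# Decomposition 1: uncached input at full price, cached input at the cached rate
cost = ((input_tokens - cached_tokens) * INPUT_PRICE
        + output_tokens * OUTPUT_PRICE
        + cached_tokens * CACHED_PRICE)

# Decomposition 2: all input at full price, minus the per-token cache discount
cache_savings = INPUT_PRICE - CACHED_PRICE
cost_alt = (input_tokens * INPUT_PRICE
            + output_tokens * OUTPUT_PRICE
            - cached_tokens * cache_savings)

assert abs(cost - cost_alt) < 1e-12
print(f"${cost:.6f}")  # $0.019800
```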
Step 1: Count Your Tokens
Before you can calculate costs, you need to know how many tokens your prompts use.
What Is a Token?
A token is roughly 4 characters of English text, or 0.75 words. But it varies:
- Common English words: ~1 token each
- Technical terms, proper nouns: often 2-3 tokens
- Code: typically more tokens per character (special syntax)
- Non-English text: often 2-4x more tokens than equivalent English
Counting Tokens Programmatically
# Count tokens before making API calls
import anthropic
import tiktoken
# For Claude (Anthropic provides a count_tokens endpoint)
client = anthropic.Anthropic()
def count_claude_tokens(messages: list, system: str = "") -> dict:
"""Count tokens for a Claude request before making it."""
response = client.messages.count_tokens(
model="claude-sonnet-4-5",
system=system,
messages=messages
)
return {
"input_tokens": response.input_tokens,
        "estimated_cost_input": response.input_tokens * 3.00 / 1_000_000  # Sonnet input: $3.00 per 1M tokens
}
# Example
tokens = count_claude_tokens(
messages=[{"role": "user", "content": "Summarize this document: " + long_document}],
system="You are a helpful assistant."
)
print(f"Input tokens: {tokens['input_tokens']}")
print(f"Input cost: ${tokens['estimated_cost_input']:.4f}")
# For OpenAI models, use tiktoken
import tiktoken
def count_openai_tokens(text: str, model: str = "gpt-4o") -> int:
"""Count tokens for an OpenAI request."""
encoding = tiktoken.encoding_for_model(model)
return len(encoding.encode(text))
# Count tokens for a full message array (approximation based on
# OpenAI's cookbook numbers; exact overhead varies slightly by model)
def count_messages_tokens(messages: list, model: str = "gpt-4o") -> int:
    encoding = tiktoken.encoding_for_model(model)
    tokens = 0
    for message in messages:
        tokens += 3  # per-message formatting overhead
        for key, value in message.items():
            tokens += len(encoding.encode(str(value)))
    tokens += 3  # every reply is primed with assistant tokens
    return tokens
# Example
document_tokens = count_openai_tokens(my_document, "gpt-4o")
print(f"Document tokens: {document_tokens}")
print(f"Input cost: ${document_tokens * 2.50 / 1_000_000:.4f}")
# Rough estimate without API call (works for any model)
def estimate_tokens(text: str) -> int:
"""Rough estimation: 1 token ≈ 4 characters for English."""
return len(text) // 4
# More accurate: count words
def estimate_tokens_by_words(text: str) -> int:
"""1.3 tokens per word is a reasonable estimate."""
return int(len(text.split()) * 1.3)
Step 2: Estimate Output Tokens
Output tokens are harder to predict. Strategies:
1. Sample and measure: Run your prompt 50-100 times in development and measure the distribution of output tokens.
results = []
for _ in range(50):
response = client.messages.create(
model="claude-sonnet-4-5",
max_tokens=2048,
messages=[{"role": "user", "content": sample_prompt}]
)
results.append(response.usage.output_tokens)
import statistics
print(f"Mean output tokens: {statistics.mean(results):.0f}")
print(f"P95 output tokens: {sorted(results)[int(len(results)*0.95)]:.0f}")
print(f"Max output tokens: {max(results)}")
2. Output type heuristics:
| Output Type | Typical Token Count |
| --- | --- |
| Yes/No answer | 1-5 |
| Single-sentence answer | 15-30 |
| Short paragraph | 80-150 |
| Bullet list (5 items) | 100-200 |
| Code function (~20 lines) | 200-400 |
| Short essay (500 words) | 700-800 |
| Code file (~100 lines) | 1,000-2,000 |
| Long-form report | 2,000-5,000 |
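One way to use these heuristics in code is a simple lookup table. The bucket names below are hypothetical; the numbers mirror the table above, taking the upper bound of each range so cost projections err on the high side:

```python
# Conservative output-token estimates, from the heuristics table above
# (upper bounds, so projections err high rather than low).
OUTPUT_TOKEN_ESTIMATES = {
    "yes_no": 5,
    "sentence": 30,
    "short_paragraph": 150,
    "bullet_list_5": 200,
    "code_function": 400,
    "short_essay": 800,
    "code_file": 2_000,
    "long_report": 5_000,
}

def estimate_output_cost(output_type: str, output_price_per_1m: float) -> float:
    """Upper-bound output cost for one request of the given output type."""
    return OUTPUT_TOKEN_ESTIMATES[output_type] * output_price_per_1m / 1_000_000

# e.g. a short paragraph from a $15/1M-output model:
print(f"${estimate_output_cost('short_paragraph', 15.00):.6f}")  # $0.002250
```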
3. Set max_tokens strategically:
Don't set max_tokens to the model's maximum "just in case". Set it to roughly 1.5× your measured maximum output. You don't pay for tokens you don't generate, but a tight cap bounds the cost of runaway generations and prompt bugs.
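A minimal sketch of that rule, assuming you've already measured output lengths as in Step 2 (the `choose_max_tokens` helper and sample numbers are hypothetical):

```python
def choose_max_tokens(measured_output_tokens: list[int], headroom: float = 1.5) -> int:
    """Cap generation at ~1.5x the longest output seen in testing, so
    normal responses never hit the limit but runaways stay bounded."""
    return int(max(measured_output_tokens) * headroom)

samples = [180, 220, 310, 250, 290]  # output tokens measured in Step 2
print(choose_max_tokens(samples))  # 465
```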
Step 3: The Full Cost Calculation
from dataclasses import dataclass
from typing import Optional
@dataclass
class ModelPricing:
name: str
input_per_1m: float
output_per_1m: float
cached_input_per_1m: Optional[float] = None
PRICING = {
    "claude-sonnet-4-5": ModelPricing("Claude Sonnet 4.5", 3.00, 15.00, 0.30),
"claude-haiku-3-5": ModelPricing("Claude Haiku 3.5", 0.80, 4.00, 0.08),
"gpt-4o": ModelPricing("GPT-4o", 2.50, 10.00, 1.25),
"gpt-4o-mini": ModelPricing("GPT-4o Mini", 0.15, 0.60, 0.075),
"gemini-2.5-pro": ModelPricing("Gemini 2.5 Pro", 1.25, 10.00, None),
"gemini-2.0-flash": ModelPricing("Gemini Flash", 0.10, 0.40, None),
"deepseek-r1": ModelPricing("DeepSeek R1", 0.55, 2.19, None),
"llama-3.3-70b-groq": ModelPricing("Llama 3.3 70B (Groq)", 0.59, 0.79, None),
}
def calculate_cost(
model: str,
input_tokens: int,
output_tokens: int,
cached_tokens: int = 0
) -> dict:
pricing = PRICING[model]
uncached_input = input_tokens - cached_tokens
input_cost = uncached_input * pricing.input_per_1m / 1_000_000
output_cost = output_tokens * pricing.output_per_1m / 1_000_000
cache_cost = 0
if cached_tokens > 0 and pricing.cached_input_per_1m:
cache_cost = cached_tokens * pricing.cached_input_per_1m / 1_000_000
total = input_cost + output_cost + cache_cost
return {
"model": pricing.name,
"input_cost": input_cost,
"output_cost": output_cost,
"cache_cost": cache_cost,
"total": total,
"formatted": f"${total:.6f}"
}
# Example: Document summarization
cost = calculate_cost(
model="claude-sonnet-4-5",
input_tokens=5000, # 5K input (a medium document)
output_tokens=500, # 500 output (summary)
cached_tokens=1000 # 1K cached system prompt tokens
)
print(f"Per request: {cost['formatted']}")
# Per request: $0.019800
Step 4: Estimate Production Costs From Prototype Usage
The hardest part: your prototype with 10 test cases doesn't tell you what 100K real users will cost.
The Prototype-to-Production Formula
def project_production_cost(
prototype_cost_per_call: float,
daily_active_users: int,
calls_per_user_per_day: float,
overhead_multiplier: float = 1.4 # Account for retries, dev calls, etc.
) -> dict:
"""
Project production costs from prototype data.
overhead_multiplier: production typically runs 30-50% more calls than you expect
"""
daily_calls = daily_active_users * calls_per_user_per_day
daily_cost = daily_calls * prototype_cost_per_call * overhead_multiplier
    return {
        "daily_calls": int(daily_calls),
        "daily_cost": f"${daily_cost:,.2f}",
        "monthly_cost": f"${daily_cost * 30:,.2f}",
        "annual_cost": f"${daily_cost * 365:,.2f}",
        "cost_per_user_per_month": f"${daily_cost * 30 / daily_active_users:.2f}"
    }
# Example: AI writing assistant
prototype_avg_cost = 0.045  # $0.045 per request (measured in dev)
costs = project_production_cost(
    prototype_cost_per_call=prototype_avg_cost,
    daily_active_users=1000,
    calls_per_user_per_day=5.0,
    overhead_multiplier=1.3
)
print(f"Daily calls: {costs['daily_calls']:,}")
print(f"Daily cost: {costs['daily_cost']}")
print(f"Monthly cost: {costs['monthly_cost']}")
print(f"Cost per active user/month: {costs['cost_per_user_per_month']}")
# Daily calls: 5,000
# Daily cost: $292.50
# Monthly cost: $8,775.00
# Cost per active user/month: $8.78
At $8.78/user/month AI cost, you need to charge users at least $15-20/month to maintain reasonable margins.
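To turn that rule of thumb into a number, you can solve for the minimum price given a target ceiling on AI spend as a share of revenue. The 40% ceiling below is an assumption for illustration, not an industry standard:

```python
def min_monthly_price(ai_cost_per_user: float,
                      max_ai_share_of_revenue: float = 0.4) -> float:
    """Lowest price per user per month that keeps AI spend
    under the given share of revenue (0.4 = 40%, an assumption)."""
    return ai_cost_per_user / max_ai_share_of_revenue

print(f"${min_monthly_price(8.78):.2f}")  # $21.95
```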
Step 5: Build a Cost Estimator for Your Specific Workload
def estimate_workload_cost(
# Input description
avg_system_prompt_tokens: int,
avg_user_message_tokens: int,
avg_conversation_turns: int, # Messages before task completion
pct_system_prompt_cached: float, # 0.0-1.0
# Output description
avg_output_tokens_per_turn: int,
# Scale
requests_per_day: int,
# Model
model: str = "claude-sonnet-4-5"
) -> dict:
    # Per turn: the full system prompt plus the new user message are sent
    # as input; the cached share of the system prompt bills at the cached rate
    input_tokens = avg_system_prompt_tokens + avg_user_message_tokens
    cached_input_tokens = avg_system_prompt_tokens * pct_system_prompt_cached
    output_tokens = avg_output_tokens_per_turn
    # Multi-turn: the whole conversation history is resent every turn,
    # so input accumulates; turn k carries (k - 1) earlier exchanges
    if avg_conversation_turns > 1:
        input_tokens *= avg_conversation_turns
        cached_input_tokens *= avg_conversation_turns
        input_tokens += (avg_user_message_tokens + avg_output_tokens_per_turn) \
            * avg_conversation_turns * (avg_conversation_turns - 1) / 2
    cost_per_request = calculate_cost(
        model, int(input_tokens), int(output_tokens * avg_conversation_turns),
        int(cached_input_tokens)
    )["total"]
daily_cost = cost_per_request * requests_per_day
return {
"tokens_per_request": {
"input": int(input_tokens),
"cached": int(cached_input_tokens),
"output": int(output_tokens * avg_conversation_turns)
},
"cost_per_request": f"${cost_per_request:.4f}",
"daily_cost": f"${daily_cost:.2f}",
"monthly_cost": f"${daily_cost * 30:,.2f}"
}
# Example: Customer support chatbot
estimate = estimate_workload_cost(
avg_system_prompt_tokens=2000, # Company knowledge base in system prompt
avg_user_message_tokens=150, # Short customer questions
avg_conversation_turns=3, # 3 exchanges per ticket
pct_system_prompt_cached=0.95, # 95% cache hit rate on system prompt
avg_output_tokens_per_turn=250, # Moderate response length
requests_per_day=2000, # 2K tickets/day
model="claude-sonnet-4-5"
)
print("Customer Support Bot Estimate:")
print(f"Tokens per request: {estimate['tokens_per_request']}")
print(f"Cost per ticket: {estimate['cost_per_request']}")
print(f"Daily cost: {estimate['daily_cost']}")
print(f"Monthly cost: {estimate['monthly_cost']}")
The Prompt Caching Impact
This is the largest cost lever many teams miss:
# Same workload, with vs without prompt caching
without_cache = calculate_cost(
"claude-sonnet-4-5",
input_tokens=5000,
output_tokens=500,
cached_tokens=0
)
with_cache = calculate_cost(
"claude-sonnet-4-5",
input_tokens=5000,
output_tokens=500,
cached_tokens=3000 # 3K of 5K input is cached system prompt
)
print(f"Without caching: {without_cache['formatted']}")
print(f"With caching: {with_cache['formatted']}")
print(f"Savings: {(1 - with_cache['total']/without_cache['total'])*100:.0f}%")
# Without caching: $0.022500
# With caching: $0.014400
# Savings: 36%
# At 100K requests/month:
monthly_savings = (without_cache['total'] - with_cache['total']) * 100_000
print(f"Monthly savings at 100K req/month: ${monthly_savings:,.2f}")
# Monthly savings at 100K req/month: $810.00
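One caveat the calculation above omits: on Anthropic's API, writing a prefix into the cache is billed at a premium over the base input rate (1.25x for the default 5-minute cache), while reads are discounted, so caching only pays off when the prefix is actually reused. A rough break-even sketch, assuming those rates:

```python
# Break-even analysis for prompt caching, assuming Anthropic-style rates:
# cache writes billed at 1.25x base input, cache reads at 0.1x.
BASE = 3.00 / 1_000_000   # Sonnet base input price per token
WRITE = BASE * 1.25       # first request pays a write premium
READ = BASE * 0.10        # later requests read at a discount

def caching_cost(prefix_tokens: int, reuses: int) -> float:
    """Cost of the prefix: one cache write plus N discounted reads."""
    return prefix_tokens * (WRITE + READ * reuses)

def no_cache_cost(prefix_tokens: int, reuses: int) -> float:
    """Cost of resending the prefix at full price every time."""
    return prefix_tokens * BASE * (1 + reuses)

for reuses in (0, 1, 10):
    print(reuses, caching_cost(3000, reuses) < no_cache_cost(3000, reuses))
```

With these rates, a never-reused prefix costs more cached than uncached, but a single reuse already puts caching ahead.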
When to Switch Providers
def cost_comparison_table(
input_tokens: int,
output_tokens: int,
requests_per_month: int
):
"""Compare monthly costs across all major providers."""
results = []
for model_key, pricing in PRICING.items():
cost_per_req = calculate_cost(model_key, input_tokens, output_tokens)["total"]
monthly_cost = cost_per_req * requests_per_month
results.append({
"model": pricing.name,
"per_request": f"${cost_per_req:.4f}",
"monthly": f"${monthly_cost:,.2f}"
})
results.sort(key=lambda x: float(x["monthly"].replace("$", "").replace(",", "")))
print(f"\nCost comparison: {input_tokens} input + {output_tokens} output, {requests_per_month:,} req/month")
print(f"{'Model':<25} {'Per Request':<15} {'Monthly'}")
print("-" * 55)
for r in results:
print(f"{r['model']:<25} {r['per_request']:<15} {r['monthly']}")
cost_comparison_table(2000, 500, 100_000)
Output:
Cost comparison: 2000 input + 500 output, 100,000 req/month
Model                     Per Request     Monthly
-------------------------------------------------------
Gemini Flash              $0.0004         $40.00
GPT-4o Mini               $0.0006         $60.00
Llama 3.3 70B (Groq)      $0.0016         $157.50
DeepSeek R1               $0.0022         $219.50
Claude Haiku 3.5          $0.0036         $360.00
Gemini 2.5 Pro            $0.0075         $750.00
GPT-4o                    $0.0100         $1,000.00
Claude Sonnet 4.5         $0.0135         $1,350.00
This makes the decision obvious: for a task where Gemini Flash or GPT-4o Mini is good enough, using Claude Sonnet 4.5 costs roughly 34x more per month than Flash (and over 20x more than Mini).
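A common pattern that follows from this table is routing: send each request to the cheapest model that can handle it, escalating only when needed. A toy sketch (the `is_simple` heuristic below is a placeholder assumption; in practice you'd route on task type or a lightweight classifier):

```python
def route_model(prompt: str) -> str:
    """Route simple requests to a cheap model; reserve the expensive
    model for long or code-related ones (toy heuristic, not production)."""
    is_simple = len(prompt) < 500 and "code" not in prompt.lower()
    return "gemini-2.0-flash" if is_simple else "claude-sonnet-4-5"

print(route_model("What's your refund policy?"))        # gemini-2.0-flash
print(route_model("Refactor this code: " + "x" * 600))  # claude-sonnet-4-5
```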
The Bottom Line
Cost estimation before building requires:
- Count input tokens: Use the provider's token counter or tiktoken on sample prompts
- Measure output tokens: Run 50+ test cases, record P50 and P95 output lengths
- Apply the formula: cost = (uncached_input × input_price) + (output × output_price) + (cached × cached_price)
- Project to scale: monthly_cost = cost_per_request × requests_per_day × 30 × 1.4
- Check unit economics: cost_per_user = monthly_cost / MAU must sit well below what you charge each user
- Model the caching benefit: Large, reused system prompts can cut costs 30-60%
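The checklist above collapses into one napkin-math function (the 1.4 overhead multiplier is the same assumption used in Step 4):

```python
def napkin_estimate(
    input_tokens: int, output_tokens: int,
    input_price: float, output_price: float,   # $ per 1M tokens
    requests_per_day: int, overhead: float = 1.4,
) -> dict:
    """One-shot pre-build estimate: per-request cost, projected to a month
    with the overhead multiplier for retries and dev traffic."""
    per_request = (input_tokens * input_price
                   + output_tokens * output_price) / 1_000_000
    monthly = per_request * requests_per_day * 30 * overhead
    return {"per_request": round(per_request, 6), "monthly": round(monthly, 2)}

print(napkin_estimate(2000, 500, 3.00, 15.00, requests_per_day=5000))
# {'per_request': 0.0135, 'monthly': 2835.0}
```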
Verify current pricing at llmversus.com/calculator — prices change frequently enough that hardcoded numbers go stale fast.