How to Pick an LLM Provider in 2026: 12-Point Checklist

By Aniket Nigam. Published 2026-04-15.

Quick answer

Ignore the "best LLM provider" listicles. Score each candidate against 12 dimensions that match your workload. For most teams building a new product in 2026, the shortlist is OpenAI for breadth, Anthropic for coding and long-context tasks, Google for cheap Gemini Flash inference, and Groq for latency-critical paths. For dedicated capacity, Together and Fireworks remain the sane choices.

Why another checklist

I ran vendor selection for three companies last year. Every single one started with a feature matrix copied from a Gartner-style blog post. None of those matrices scored for things that actually matter in production: batch pricing deltas, cache read rates, function-calling reliability under load, or whether the provider SLA has enforceable credits.

This is the matrix I hand to clients now. Twelve dimensions. Each scored 0-5. Weighted sum at the bottom. Decision in one afternoon instead of six weeks of spreadsheet theater.

Table of contents

  1. The 12 dimensions
  2. Scoring each provider in April 2026
  3. Weighting the dimensions for your workload
  4. Pricing: input, output, cache reads
  5. Rate limits and tier ladders
  6. Data residency and compliance
  7. SLA and credit enforcement
  8. Finetuning availability
  9. Modality and tool support
  10. Final weighted scoreboard

1. The 12 dimensions

Here is the full list, with what I look for in each.

  1. Input price per million tokens: What you pay for prompts before cache hits
  2. Output price per million tokens: Generally 3-5x input; biggest cost driver
  3. Cache read price and hit rules: Can drop effective input cost 80-95%
  4. Rate limits at your tier: RPM, TPM, and RPD in the tier you can hit
  5. Data residency: EU, US, India, Canada regions available?
  6. SLA and credit policy: Uptime promise with enforceable remedy
  7. Finetuning: Supported? LoRA or full? Pricing?
  8. Vision support: Image input quality and cost
  9. Function calling reliability: JSON mode, tool use, structured output under load
  10. Batch API discount: Usually 50% off for async jobs
  11. Streaming and logprobs: Token-level output for eval and UI
  12. Compliance paperwork: SOC2 Type 2, HIPAA BAA, ISO 27001, DPA terms

2. How each provider scores

Scores are 0-5 based on my experience across those three selections plus one more this year. Dates verified against provider documentation on April 14, 2026.

Dimension          | OpenAI | Anthropic | Google | Groq | Together | Fireworks
-------------------|--------|-----------|--------|------|----------|----------
Input price        | 4      | 3         | 5      | 5    | 4        | 4
Output price       | 4      | 3         | 5      | 5    | 4        | 4
Cache read pricing | 4      | 5         | 4      | 2    | 2        | 2
Rate limits        | 5      | 4         | 4      | 3    | 5        | 5
Data residency     | 4      | 3         | 5      | 2    | 3        | 3
SLA/credits        | 4      | 4         | 5      | 3    | 4        | 4
Finetuning         | 4      | 3         | 4      | 2    | 5        | 5
Vision             | 5      | 5         | 5      | 3    | 4        | 4
Function calling   | 5      | 5         | 4      | 4    | 4        | 4
Batch API          | 5      | 5         | 5      | 2    | 3        | 3
Streaming/logprobs | 5      | 4         | 4      | 4    | 5        | 5
Compliance         | 5      | 5         | 5      | 3    | 4        | 4

Those numbers are opinionated. Run them for your workload, not mine.

3. Weighting for your workload

A flat sum with all 12 weights set to 1 is a lie. Pick weights that match your reality.

Startup MVP on a budget: heavy weight on input/output pricing and rate limits. Less weight on compliance. Google Gemini Flash 2.5 wins this one on price alone.

Regulated B2B SaaS: heavy weight on compliance, data residency, SLA. Compliance paperwork is table stakes. OpenAI and Anthropic tie here; Google wins if you already run on GCP.

Consumer app with latency budget: heavy weight on function calling reliability, streaming, and actual p95 latency. Groq wins for pure generation; OpenAI wins for complex tool chains.

Internal tool for a 10k-person company: heavy weight on finetuning, batch API, cache reads. Anthropic with prompt caching beats OpenAI on total cost for chatbot-style system prompts; Together and Fireworks win for custom finetunes.

The weighting is the whole point. Plug in your own numbers before you trust any matrix.
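The scorecard math is a one-liner once the table is data. The sketch below reproduces the 0-5 scores from section 2; the weight profile is illustrative, and the totals are a pure weighted sum, so they will not match section 10 exactly (those numbers fold in extra judgment). Plug in your own scores and weights.

```python
# The 0-5 scores from the section 2 table, one column per provider.
DIMS = ["input_price", "output_price", "cache_reads", "rate_limits",
        "residency", "sla", "finetuning", "vision", "function_calling",
        "batch", "streaming", "compliance"]

SCORES = {
    "OpenAI":    [4, 4, 4, 5, 4, 4, 4, 5, 5, 5, 5, 5],
    "Anthropic": [3, 3, 5, 4, 3, 4, 3, 5, 5, 5, 4, 5],
    "Google":    [5, 5, 4, 4, 5, 5, 4, 5, 4, 5, 4, 5],
    "Groq":      [5, 5, 2, 3, 2, 3, 2, 3, 4, 2, 4, 3],
    "Together":  [4, 4, 2, 5, 3, 4, 5, 4, 4, 3, 5, 4],
    "Fireworks": [4, 4, 2, 5, 3, 4, 5, 4, 4, 3, 5, 4],
}

def weighted_score(provider: str, weights: dict) -> float:
    """Sum of score * weight across dimensions; unlisted weights are 1."""
    return sum(weights.get(d, 1) * s
               for d, s in zip(DIMS, SCORES[provider]))

# Regulated B2B profile: triple-weight the compliance-adjacent rows.
b2b_weights = {"compliance": 3, "residency": 3, "sla": 3}
ranking = sorted(SCORES, key=lambda p: weighted_score(p, b2b_weights),
                 reverse=True)
print(ranking)
```

Swap in a latency-first or cache-heavy weight profile and watch the ranking reorder; that reordering is the decision.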

4. Pricing in detail

As of April 14, 2026, per million tokens:

Model                    | Input  | Output | Cached input | Batch input
-------------------------|--------|--------|--------------|------------
GPT-4o                   | $2.50  | $10.00 | $1.25        | $1.25
GPT-4o mini              | $0.15  | $0.60  | $0.075       | $0.075
o3                       | $15.00 | $60.00 | $7.50        | $7.50
Claude Sonnet 4.5        | $3.00  | $15.00 | $0.30        | $1.50
Claude Opus 4            | $15.00 | $75.00 | $1.50        | $7.50
Gemini 2.5 Pro           | $1.25  | $10.00 | $0.31        | $0.63
Gemini 2.5 Flash         | $0.075 | $0.30  | $0.019       | $0.038
Llama 4 70B (Groq)       | $0.59  | $0.79  | N/A          | N/A
DeepSeek V3.1 (Together) | $0.27  | $1.10  | N/A          | N/A

Claude Sonnet 4.5 cache reads at $0.30 are the cheapest cached input on the market for a frontier-class model. If 60% of your traffic hits the cache, your effective blended price drops to about $1.38 per million input tokens.
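The blended-price arithmetic is worth having as a function, since the cache hit rate is the variable you actually control. A minimal sketch, using the Claude Sonnet 4.5 numbers from the table above ($3.00 input, $0.30 cache read):

```python
# Effective input cost under prompt caching.
def blended_input_price(input_price: float, cache_price: float,
                        hit_rate: float) -> float:
    """$ per million input tokens at a given cache hit rate in [0, 1]."""
    return hit_rate * cache_price + (1 - hit_rate) * input_price

# 60% of traffic hitting the cache on Claude Sonnet 4.5 pricing:
print(round(blended_input_price(3.00, 0.30, 0.60), 2))  # → 1.38
```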

5. Rate limits at the tier you can realistically hit

Paper tier ceilings do not matter. What matters is the tier your spend unlocks in 30-60 days.

OpenAI Tier 3 (800K TPM on GPT-4o) unlocks at $100 plus 7 days. Most production apps hit this within the first month. Tier 4 at 2M TPM wants $250 and 14 days. Tier 5 at 4M TPM wants $1,000 and 30 days.

Anthropic Scale (400K TPM) unlocks at $100 and 30 days. Scale Plus (1.5M TPM) added in March 2026 wants $1,000 and 60 days plus a dashboard request.

Groq stays flat at Prod tier (300K TPM) after a 30-day review. No auto-upgrade.

Gemini paid tier (4M TPM on Flash) is available from day one once you enable billing. This is the easiest fast path for a new team.
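To translate a TPM ceiling into something you can plan against, divide by your average tokens per request. The 3,000-token average below is an assumption for illustration; measure your own prompt-plus-completion sizes.

```python
# Back-of-envelope: how many requests per minute a TPM ceiling supports.
def sustainable_rpm(tpm_limit: int, avg_tokens_per_request: int) -> int:
    return tpm_limit // avg_tokens_per_request

# OpenAI Tier 3 (800K TPM) vs Anthropic Scale (400K TPM) at ~3K tokens:
print(sustainable_rpm(800_000, 3_000))  # → 266
print(sustainable_rpm(400_000, 3_000))  # → 133
```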

6. Data residency

EU residency is the first blocker for European customers.

OpenAI offers EU data residency via the Europe region, signed under the EU DPA template. Zero data retention (ZDR) available on Enterprise contracts.

Anthropic offers EU residency via AWS Europe deployments plus the direct API. ZDR available after SOC 2 paperwork exchange.

Google Gemini through Vertex AI has EU, UK, India, Japan, and Australia residency out of the box. The broadest coverage in 2026.

Groq is US-only as of April 2026. A Dublin region was announced but not live.

Together and Fireworks are US-primary. Both can set up EU with a signed contract but not on self-serve.

7. SLA and credit policies

The number you want is the credit you get back when they miss uptime. Not the uptime target.

OpenAI: 99.9% on Scale tier. Credits are 10% of monthly fees for each 0.1 percentage points below target, capped at 30%.

Anthropic: 99.9% on Scale and Scale Plus. Credits follow the same structure as OpenAI.

Google Cloud Vertex AI: 99.9% on Pro tier. Credits tiered by region and workload. Most aggressive SLA paperwork in the market.

Groq: No public SLA on Dev or Prod. Enterprise contracts carry one.

Together and Fireworks: 99.5% on shared, 99.9% on dedicated endpoints.
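The credit structure described above (10% of monthly fees per 0.1 percentage points of uptime below a 99.9% target, capped at 30%) is easy to sanity-check in code. Contract language varies, so treat this as a sketch, not your SLA.

```python
# Credit math implied by the OpenAI-style policy described above.
def sla_credit(monthly_fees: float, target_pct: float,
               actual_pct: float) -> float:
    """Dollar credit owed for an uptime miss, capped at 30% of fees."""
    if actual_pct >= target_pct:
        return 0.0
    steps = (target_pct - actual_pct) / 0.1
    credit_fraction = min(0.10 * steps, 0.30)
    return round(monthly_fees * credit_fraction, 2)

# A month at 99.7% against a 99.9% target on $20K of spend:
print(sla_credit(20_000, 99.9, 99.7))  # → 4000.0
```

The cap is the detail that matters: a catastrophic month and a merely bad month pay out the same once you cross 0.3 points of shortfall.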

8. Finetuning availability

OpenAI supports supervised and preference finetuning on GPT-4o mini and GPT-4.1. Full finetuning on o3 rolled out in March 2026.

Anthropic has not released a finetuning API as of April 2026. You can do custom system prompts with high cache hit rates instead.

Google supports finetuning on Gemini 2.5 Flash via Vertex AI. Gemini 2.5 Pro finetuning is in private preview.

Groq does not support finetuning; they host pretrained weights.

Together and Fireworks are the best bets for custom finetunes. Both offer LoRA, full finetuning, and dedicated hosted endpoints. Pricing is straightforward per GPU-hour.

9. Modality and tool support

Function calling reliability is the one that bites in production. My benchmark is a tool-use workload with 6 tools, 2 of them nested, run 500 times at production concurrency.

OpenAI and Anthropic come out on top and are effectively interchangeable. Gemini catches up when you use structured output mode rather than raw tool calling. Llama 4 struggles on nested tools but handles flat ones fine.
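A harness for a 500-run test at fixed concurrency might look like the sketch below. The call_model stub fakes latency and a tool-call response; swap in your provider SDK and real tool schema before trusting any numbers.

```python
# Tool-call reliability at production concurrency, not single-request
# latency. call_model is a stand-in for a real provider call.
import asyncio
import json
import random

async def call_model(prompt: str) -> str:
    await asyncio.sleep(random.uniform(0.001, 0.005))  # fake latency
    return json.dumps({"tool": "lookup", "args": {"q": prompt}})

async def run_benchmark(n_runs: int = 500, concurrency: int = 32) -> float:
    sem = asyncio.Semaphore(concurrency)  # cap in-flight requests

    async def one_call(i: int) -> bool:
        async with sem:
            try:
                parsed = json.loads(await call_model(f"request-{i}"))
                return "tool" in parsed and "args" in parsed
            except Exception:
                return False  # malformed JSON, timeout, etc.

    results = await asyncio.gather(*(one_call(i) for i in range(n_runs)))
    return sum(results) / len(results)

success_rate = asyncio.run(run_benchmark())
print(f"tool-call success rate: {success_rate:.1%}")
```

The semaphore is the point: providers that look perfect at concurrency 1 can start returning truncated or malformed tool calls at 32.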

10. Final weighted scoreboard

For a regulated B2B SaaS workload with the compliance weight at 3x, here is how the totals come out:

Provider  | Weighted score
----------|---------------
OpenAI    | 54
Anthropic | 51
Google    | 53
Groq      | 36
Together  | 45
Fireworks | 45

OpenAI and Google finish within a point of each other for regulated B2B. I would run the bakeoff on both before signing.

For a latency-first consumer app with streaming weight at 3x and compliance at 1x, Groq comes out on top. For a cost-optimized chatbot with cache read weight at 3x, Anthropic wins by a wide margin.

The matrix is the tool. The weights are the decision.

FAQ

Should I single-source or multi-source my LLM provider?

Above $10K/month of spend, keep two providers wired up. Outage Tuesday exists. Below $10K/month, single-source and save yourself the integration tax.
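The two-provider wiring is a few lines once both SDKs are integrated. Both call functions below are stand-ins; replace them with your actual provider calls (the simulated primary outage is for illustration only).

```python
# Minimal primary/secondary fallback sketch.
def call_primary(prompt: str) -> str:
    raise TimeoutError("primary provider down")  # simulated outage

def call_secondary(prompt: str) -> str:
    return f"secondary answered: {prompt}"

def complete_with_fallback(prompt: str) -> str:
    try:
        return call_primary(prompt)
    except Exception:
        # Log the failure in production; silent fallback hides outages.
        return call_secondary(prompt)

print(complete_with_fallback("ping"))  # → secondary answered: ping
```

The integration tax lives in the details this sketch skips: normalizing tool-call formats, token accounting, and keeping both prompts evaluated as models drift.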

Does OpenRouter change the math?

Yes. OpenRouter abstracts the provider layer and handles fallback for you. Adds 50-150ms of latency and about 5% pricing overhead. Worth it for small teams who do not want to maintain the plumbing.

How often should I re-score?

Every quarter. Prices drop, models ship, tiers shift. My April 2026 scorecard will look different in July.

What about open-source provider options like DeepSeek and Mistral?

DeepSeek's own API is cheap but US latency is weak until a US region lands. Mistral's La Plateforme covers EU residency well but tier limits are stingy. Treat both as secondary options in the matrix above.

Is Azure OpenAI equivalent to the OpenAI API?

Close but not identical. Model availability lags by 2-6 weeks. Rate limits are per Azure subscription, not OpenAI tier. If you are already on Azure, the integration beats a direct contract. If not, skip it.

Actionable takeaways

  1. Rank all 12 dimensions for your specific workload before scoring any provider
  2. Score on a 0-5 scale with written justification per cell; avoid gut numbers
  3. Apply a 3x weight to compliance if you are regulated; a 3x weight to cache pricing if you run a chatbot
  4. Always wire up a second provider once monthly spend crosses $10,000
  5. Re-score every 90 days; provider pricing and tiers shift that fast
  6. Test function calling at production concurrency, not at single-request latency
  7. Demand written SLA credits, not just uptime promises

Sources

  • OpenAI pricing page, openai.com/api/pricing, accessed 2026-04-14
  • Anthropic pricing, anthropic.com/pricing, accessed 2026-04-14
  • Google Vertex AI pricing, cloud.google.com/vertex-ai/pricing, accessed 2026-04-14
  • Groq pricing and limits, console.groq.com/docs, accessed 2026-04-14
  • Together AI pricing, together.ai/pricing, accessed 2026-04-14
  • Fireworks AI pricing, fireworks.ai/pricing, accessed 2026-04-14

