Author
Aniket Nigam
Founder of LLMversus. I write every comparison, benchmark writeup, and pricing note on this site.
hello@llmversus.com
Based in India · Writing since 2023
Why I built LLMversus
I started LLMversus because I kept wasting hours every week hunting for current token prices. Provider pricing pages move without notice. Benchmark leaderboards live on a dozen different sites. And anyone shipping a production feature with LLMs ends up maintaining their own spreadsheet of inputs, outputs, context windows, and rate limits. LLMversus is the spreadsheet I wanted, rebuilt as a public site.
Background
Before LLMversus I was the solo engineer on PrepAiro, an IB-focused study app that ran on a mix of OpenAI, Anthropic, and smaller open-weight models served through Groq and Together. I integrated my first GPT-3 endpoint in 2021, shipped a retrieval pipeline on GPT-4 in 2023, and burned through roughly $50,000 of my own inference budget across experiments, batch jobs, and a long stretch of agent prototyping. That experience is where the opinions on this site come from. I have felt the sting of a runaway context window and the relief of switching a workload from Claude Opus to Haiku without losing quality.
I also spent three years writing developer tooling in TypeScript and Python before any of this, so when I say a provider has a confusing streaming API or a broken tool-call schema, I have usually shipped around it in production.
How I work
Every comparison on LLMversus is written by me, not generated. I read the model cards on release day, watch the three or four benchmark leaderboards I trust (Chatbot Arena, Artificial Analysis, the official MMLU-Pro and GPQA pages), and run my own small evaluation battery on the categories I publish about: coding, SQL, summarization, and function calling. Where I quote a number, it comes from a provider page, an official leaderboard, or a test I ran myself. Where numbers shift week to week, I timestamp the page and show the last-verified date on the comparison.
I watch model releases weekly. New models from Anthropic, OpenAI, Google, Mistral, Meta, Alibaba, and DeepSeek all get the same treatment: pricing pulled from OpenRouter and the provider’s own page, benchmarks pulled from the official source, and a personal note on where I have actually used the model in anger.
What I believe
LLMs are commoditizing at the frontier and fighting a price war at the long tail. The interesting question is no longer “which model is smartest” but “which model survives my production traffic without burning a hole in my bank account.” That is the lens behind every ranking and calculator on this site.
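That cost-first lens boils down to simple arithmetic: price per million input and output tokens, multiplied by your actual traffic shape. A minimal sketch of that calculation, using placeholder model names and illustrative (not current) prices:

```python
# Per-request cost comparison. Model names and prices here are
# illustrative placeholders, not real rate cards: always check the
# provider's pricing page for current numbers (USD per 1M tokens).
PRICES = {
    "model-a": {"input": 3.00, "output": 15.00},
    "model-b": {"input": 0.25, "output": 1.25},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request in USD at the assumed per-1M-token rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example traffic shape: 4k-token prompt, 500-token completion,
# 100k requests per month.
monthly_a = 100_000 * request_cost("model-a", 4_000, 500)  # 1950.0 USD
monthly_b = 100_000 * request_cost("model-b", 4_000, 500)  # 162.5 USD
```

At these assumed rates the cheaper model is roughly 12x less expensive for the same traffic, which is the kind of gap that makes "is the quality drop acceptable" the real question.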
Get in touch
Corrections, benchmark disputes, pricing updates, or a tip about a provider quietly changing their rate card: email me at hello@llmversus.com. I read everything and I fix errors fast. If your company wants to sponsor a benchmark or have a model added, same address.
For the methodology behind the numbers you see on this site, read How LLMversus ranks models.