The State Of AI In 100 Trillion Tokens: What OpenRouter’s Data Really Says About How We Use LLMs

If you’ve ever wondered what people actually do with large language models all day, OpenRouter’s new “State of AI” report just dropped a data nuke: 100 trillion tokens of real-world usage across hundreds of models. This isn’t vibes or benchmarks—it’s how developers, gamers, writers, and enterprises are really abusing the context window in 2025.

What Is the OpenRouter “State of AI” Study?

The report is an empirical analysis of over 100 trillion tokens routed through OpenRouter’s multi-model platform, spanning more than 300 models and 60+ providers over roughly two years. It tracks what tasks people run, which models they pick, how usage shifts over time, and how much of the traffic is now “agentic”—multi-step, tool-using, reasoning-heavy workflows rather than single-shot replies.

Instead of reading raw prompts, OpenRouter works off anonymized metadata and category tags from a classifier that maps prompts into topics like Programming, Roleplay, Technology, Translation, Health, and more. That means the report can’t “spy” on conversations, but it can see the big-picture patterns of how nerds, devs, and companies are using models in the wild.

Top Findings Nerds Should Care About

Buried in the charts is a surprisingly geeky reality: a huge slice of LLM usage is not people writing emails—it’s coding, roleplaying, and running full-on AI agents. If you live inside IDEs, Discord servers, or TTRPG campaigns, this report is basically analytics for your lifestyle.

Reasoning models are now the default. In 2023, most traffic was simple text-completion; by late 2025, reasoning-oriented models (like o1-style and later-gen GPT/Claude/Gemini variants) handle over half of all tokens. The “multi-step chain-of-thought, tools, and planning” style is no longer a niche—it’s the main way people use AI.
Programming has gone absolutely vertical. The share of programming-related usage has grown from around 10–11% in early 2025 to more than 50% of total token volume in recent weeks. LLMs have become a core part of the dev toolchain: debugging, codegen, refactors, scripting, and system design.
Roleplay is a monster category. For open-source models in particular, more than half of usage is creative roleplay and character-driven interaction. Think interactive fiction, OC chats, game-like scenarios, and persona sims. The report even notes that “Roleplay / Games / Writers Resources / Adult” style workloads dominate much of the OSS demand.
Open source vs closed: a 70/30-ish equilibrium. Proprietary models still serve the majority of tokens overall, especially for enterprise and regulated workloads. But open-weight models have surged, with OSS reaching around one-third of usage by late 2025, driven heavily by Chinese-developed models like DeepSeek and Qwen.
Chinese OSS models have leveled up. They started as a rounding error in late 2024, then ramped to nearly 30% of all usage in some weeks within 2025. They’re not just for roleplay anymore—programming and technology workloads are now a majority of their traffic.
Sequences are huge now. Average total sequence length (prompt + completion) has more than tripled over ~20 months, going from under 2,000 tokens to over 5,000. Programming prompts especially are ballooning into 20K-token monsters as people paste in entire services, repos, and logs for the model to reason about.

The Multi-Model Era: No Single “Best” Model

One of the biggest meta-themes is that the LLM universe is fundamentally multi-model. There is no single “winner”; instead, OpenRouter’s traffic shows a shifting ecosystem where different models dominate different niches—reasoning, coding, roleplay, tools, low-cost bulk, etc.

The report highlights a split between:

Premium leaders (e.g., Claude Sonnet 4, Claude 3.7 Sonnet, GPT-5 Pro): higher cost per million tokens, but trusted for high-stakes, high-value workloads.
Efficient giants (e.g., Gemini Flash, DeepSeek V3): lower cost, high volume, “default workhorse” usage for many tasks and long contexts.
Open OSS contenders (Qwen, Mistral, LLaMA, GLM): aggressively priced models that steadily eat into volume with improving quality.
Premium specialists (e.g., GPT‑4, GPT‑5 Pro tiers): ultra-expensive models used sparingly when output quality or reliability dwarfs token cost.

For nerds building apps, that means: assume a multi-model stack. Route coding tasks one way, narrative tasks another, and fall back to frontier models when something really matters. OpenRouter is literally built for this “many models, one API” world via its model catalog.

Roleplay & Creative Use: LLMs as Game Engines

The report’s most delightfully nerdy reveal: roleplay isn’t just a side quest—it’s one of the main quests for open-source models. Most OSS tokens fall into Roleplay / Games / Writers Resources / Adult as primary categories.

Games & TTRPGs. Nearly 60% of “roleplay” usage is tagged under Games/Roleplaying Games. People are using LLMs as DM assistants, NPC engines, branching story generators, and procedural lore machines.
Interactive fiction & OCs. Another chunk belongs to Writers Resources and long-form, character-driven storytelling. LLMs are being used as co-authors, improv partners, and persona simulators.
Why OSS wins here. Open models are often less constrained, more customizable, and cheaper at scale. That’s perfect for fan fiction bots, custom character sims, or long-running worlds where cost and flexibility matter more than corporate safety policies.

If you’re building a narrative-heavy game, a Discord bot, or a browser-based story sandbox, this report basically validates your entire life choices. You’re not “off-label”; you are mainstream usage at scale.

Programming: LLMs as Your New Pair Programmer

On the productivity side, programming is the killer app. The “Programming” category’s share of tokens has been steadily climbing and now exceeds half of total usage across all models. That’s enormous.

Code workflows dominate long-context usage. Programming prompts are often 3–4x longer than general-purpose prompts, dragging along entire files, stack traces, configs, and logs for the model to chew on.
Anthropic leads, but competition is fierce. The report notes that Anthropic’s Claude models hold the largest share of programming spend, but OpenAI, Google, and OSS providers like Z-AI, Qwen, and Mistral are rapidly gaining share as coding quality and speed improve.
Developers are price-aware, but not price-enslaved. There’s only weak correlation between price-per-million tokens and usage. Devs are clearly willing to pay more for models that “just solve the bug” or “just refactor the service” reliably.

Practically, this means your dev stack may mix: a cheaper high-context model for browsing codebases, a mid-priced reasoning model for logic, and a top-tier model for hairy, business-critical tasks. The report frames this not as future potential, but as what’s already happening.

Agentic Inference: From Chatbot to Workflow Engine

Another huge shift: LLM usage is moving from one-off Q&A to agentic inference—multi-step, tool-using, stateful workflows where the model behaves less like a “smart autocomplete” and more like an orchestrator.

Reasoning traffic exploded. Reasoning-optimized models went from negligible share in early 2025 to the majority of tokens later in the year. These models are tuned for stepwise thinking, planning, and internal “scratchpad” computation.
Tool calls are rising steadily. The share of requests that actually invoke tools has climbed over time, spreading from a small cluster of models (like gpt‑4o‑mini, early Claude 3.x) to a broader ecosystem that supports function calling, web search, code execution, and more.
Sequence length as an “agent” proxy. Long sequences (big prompts, multi-turn conversations) map well to agent-like patterns: a model iterating on the same task, calling tools, refining output, and tracking context over many steps.

If you’re building bots that run tools, call APIs, or manage workflows, this report is basically saying: you are surfing the main trend, not a side experiment. The center of gravity is shifting toward “LLM as process engine,” not just “LLM as chatbot.”

Global Adoption & Language Trends

The State of AI report is also a world map disguised as charts. North America is still the biggest region by spend, but no longer an absolute majority for much of the time window. Europe is stable, and Asia has ramped up dramatically.

Regional spend. Roughly speaking, North America sits under 50% of spend; Asia rises toward ~30%; Europe hovers around the low 20s; other regions share a small but non-zero tail.
Language usage. English absolutely dominates at over 80% of tokens, but Simplified Chinese, Russian, Spanish, Thai, and others make up a meaningful long tail. The growth of Chinese OSS models correlates with significant Chinese-language usage.

All of this reinforces one design principle: if you’re building AI products, design them for global nerds, not just English-speaking ones. Multilingual support, region-aware deployment, and compliance aren’t a “later” problem anymore.

Retention, the Glass Slipper Effect, and the Boomerang

One of the most fascinating ideas in the report is the “Glass Slipper” effect: certain model launches create foundational cohorts of users whose workloads finally “fit” a model so well that they just never leave.

Foundational cohorts. When a new model nails a previously unsolved workload (e.g., better reasoning, more stable tools, better price/performance), the cohort that adopts it early shows unusually high, long-term retention.
Launch windows matter. GPT‑4o mini is a prime example: it captured a huge, sticky workload at launch, and later cohorts never replicated that level of retention. If you miss that initial “frontier” perception, you may never form a foundational cohort.
Boomerang effect. Some DeepSeek cohorts actually show users returning after churn. That suggests devs test alternatives, then come back when they realize DeepSeek still has the best fit for their weird, high-intensity workloads.

For product-minded geeks, this is a power concept: retention is not just about UI or pricing—it’s about being the first model that truly solves a hard workload. Once that happens, switching away becomes costly in time, infra, and brainspace.

Why Nerds Should Actually Care

The State of AI report isn’t just market theater—it’s a roadmap for where to build, what to build, and which parts of the ecosystem are still under-served.

Roleplay & gaming are under-monetized. Roleplay is massive in volume, but still feels early in terms of polished products—especially around persistent worlds, shared campaigns, and mod-friendly tools. If you’re a game dev or system designer, there’s a lot of white space here.
Programming tools are the new battleground. With programming >50% of usage, “AI-native devtools” (custom IDEs, code browsers, refactor bots) are basically the default future of software development.
Agent frameworks will eat workflows. As agentic inference grows, orchestrators, schedulers, and debugging tools for AI agents will be critical. Think logging, tracing, and “DevTools for agents.”
Specialized, cheaper models can still win. The cost–usage scatterplots show that you don’t have to be the most expensive or the biggest to matter. A well-positioned mid-sized model can dominate a niche if it pairs good-enough quality with great routing and integrations.

How to Experiment With These Trends Yourself

If you want to play with the same ecosystem this report analyzes, the easiest way is to use OpenRouter directly. Their homepage and models page let you inspect different models, prices, and capabilities from one place.

Try wiring multiple models into your app via a single OpenRouter API key, routing roleplay to an OSS model and code to a reasoning-focused model.
Use cheaper, long-context models for exploration and then escalate final “decision” calls to a premium reasoning model.
Prototype an agent that uses tools, external APIs, and long history—your behavior will literally become part of the next generation of charts.

FAQs

Is this report about one model or many?
It covers hundreds of models across both closed and open providers, all routed via OpenRouter. The whole point is to see ecosystem-wide behavior, not crown one model as king.
Does OpenRouter see my prompts?
The analysis is based on anonymized metadata and category labels from a classifier applied to a small random sample. The study explicitly doesn’t rely on reading prompt or completion text at scale.
Is open source “winning”?
Open models aren’t dominating overall, but they’ve grown to around a third of usage and absolutely dominate certain use cases like roleplay and some coding scenarios, especially where cost and control matter.
What’s the biggest surprise?
Two big ones: roleplay is enormous, and programming has quietly become the main heavy-duty workload. Together, they explain a huge chunk of token volume.
Where can I read the full report?
You can read the full “State of AI: An Empirical 100 Trillion Token Study with OpenRouter” directly at openrouter.ai/state-of-ai.

Join the Conversation

If you’re building something inspired by this report—an AI DM, a code-native agent, a weird OSS-driven workflow, or a multi-model router—drop your ideas in the comments, share this post with your favorite dev/gamer friends, or bookmark it for your next architecture debate. The next “glass slipper” workload could easily come from a side project you’re hacking on at 2 a.m.

Want more deep dives into AI usage, tools, and nerdy infrastructure trends? Stick around, share this with your crew, and keep experimenting—those 100 trillion tokens are just the prologue.

The State of AI in 100 Trillion Tokens: What OpenRouter’s Data Really Says About How We Use LLMs