The Headcount Delusion

You can’t convert developers to tokens, and anyone trying is selling you something

There’s a new kind of pitch making its way through boardrooms, and it goes something like this: stop thinking about headcount and start thinking about token volume.

One developer equals X billion tokens per month. Replace the developer, buy the tokens. The math is clean, the spreadsheet looks great, and the decision almost makes itself.

It’s also completely wrong. Not wrong in the way that reasonable people can agree to disagree. Wrong in the way that reveals you don’t understand what software developers do, or how token costs actually work, or both.


The False Equivalence of Developer Tokens (Image Assist from ChatGPT)

The token equivalence problem

Jensen Huang appeared on a popular podcast last month and told everyone they should be burning at least $250,000 in tokens per year, or they’re not doing real work. That’s not a productivity benchmark. That’s a GPU sales target dressed up as career advice. And it’s exactly the kind of thinking that leads to the headcount delusion — the idea that you can just convert a developer to a billion tokens a month and call it a replacement.

You can’t. Developers aren’t interchangeable token consumers. Software development is like medicine: there are specialists, focus areas, and different kinds of work that require different approaches. A frontend engineer optimizing render performance doesn’t have the same token needs as a backend engineer designing a distributed system. A DevOps person automating deployments doesn’t use models the same way someone architecting a data pipeline does. Anyone who shows up and says “one developer equals X tokens per month” is telling you they don’t understand the work. That’s 100% not how it works.

Token consumption varies with the task

I’ll use my own token consumption as an example. I might burn through a billion tokens in a week if I’m focused on an intense rewrite or a rearchitecture across several projects. These are the projects that often demand parallel, agentic development approaches. I might be asking a high-priced frontier model to analyze years' worth of data, conduct research, and design a new UI experience.

Next week, I might not be working on a task that requires large-scale data analysis and aggregation. I’ll still use the same tools, but I’ll dial them down to models better suited to the task at hand. I’ll still be using agents, but I might bump 80% of the work down to a lower-cost model while I concentrate on testing and quality.

The larger point I’m trying to make here is that I might go from consuming a billion tokens one week to 200 million or fewer the next. That’s why dollar spend is never a KPI you want to measure a programmer by. If you are doing that, you are measuring the wrong thing (and many companies are doing it).
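Here is a minimal Python sketch that puts rough numbers on that swing. The per-million-token prices are hypothetical placeholders, not any provider’s published rates. The sketch also assumes a single blended price per model, although real pricing usually splits input and output tokens.

```python
# Hypothetical blended prices in USD per million tokens. These are placeholders
# for illustration, not any provider's published rates.
FRONTIER_PRICE_PER_M = 10.0
SMALL_PRICE_PER_M = 1.0


def weekly_spend(total_tokens: int, small_model_share: float) -> float:
    """Dollar cost of a week's tokens when `small_model_share` of them go to the cheaper model."""
    if not 0.0 <= small_model_share <= 1.0:
        raise ValueError("small_model_share must be between 0 and 1")
    millions = total_tokens / 1_000_000
    frontier_cost = millions * (1 - small_model_share) * FRONTIER_PRICE_PER_M
    small_cost = millions * small_model_share * SMALL_PRICE_PER_M
    return frontier_cost + small_cost


# Rewrite week: about 1B tokens, all on the frontier model.
rewrite_week = weekly_spend(1_000_000_000, small_model_share=0.0)
# Quieter week: about 200M tokens, with 80% bumped down to the cheaper model.
quiet_week = weekly_spend(200_000_000, small_model_share=0.8)

print(f"rewrite week: ${rewrite_week:,.0f}")
print(f"quiet week:   ${quiet_week:,.0f}")
print(f"swing:        {rewrite_week / quiet_week:.1f}x")
```

With these placeholder prices, the same developer’s weekly spend moves by roughly 18x between the two weeks. Neither number says anything about how good the work was.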

The people pushing this framing talk about token volume like it’s a substitute for labor. It’s not. Tokens are what an AI model consumes when you ask it a question. A developer is a person who knows which questions to ask, when to ask them, and what to do with the answers. Conflating the two isn’t an oversimplification. It’s a category error.

The token cost problem

Even if you accepted the framing — and you shouldn’t — the math still doesn’t work, because token costs aren’t fixed. They vary wildly depending on which model you’re using. A prompt to Haiku costs a fraction of a cent. The same prompt to Fable 5 costs orders of magnitude more. And the models being pitched to executives as replacements for expensive developers are always the most expensive ones — the frontier models, the ones that supposedly can do what a senior engineer does.
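One quick way to see why the equivalence falls apart is to hold the token volume fixed and change only the model it is priced on. The model names and prices below are hypothetical placeholders, not real rate cards.

```python
# Hypothetical prices in USD per million tokens. Placeholders only;
# substitute the real rate card for whichever models you actually use.
PRICE_PER_M = {
    "small_model": 1.0,
    "mid_model": 5.0,
    "frontier_model": 25.0,
}

# The pitch deck's "one developer" figure, held constant across every row.
MONTHLY_TOKENS = 1_000_000_000


def monthly_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of a token volume at a single blended per-million price."""
    return tokens / 1_000_000 * price_per_million


for model, price in PRICE_PER_M.items():
    print(f"{model:>15}: ${monthly_cost(MONTHLY_TOKENS, price):>10,.0f} per month")
```

The “one developer” token figure is identical in every row. Only the price assumption changes, and the entire business case changes with it.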

Those are the models with the thinnest subsidy and the highest real cost. The $200/month subscription price doesn’t reflect what the inference actually costs. It reflects what venture capital allows the company to charge. When the subsidy ends, the real cost shows up. The token volume you budgeted at subsidized prices suddenly costs three, five, or ten times as much as you planned.
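The subsidy scenario can also be written as arithmetic. This sketch takes a token budget planned at today’s prices and re-prices it at three, five, and ten times the current rate. The $250,000 planning figure is a hypothetical placeholder.

```python
# Hypothetical annual token budget, planned at today's (assumed subsidized) prices.
PLANNED_ANNUAL_BUDGET = 250_000  # USD


def repriced_budget(planned: float, multiplier: float) -> float:
    """Cost of the same token volume after prices rise by `multiplier`."""
    return planned * multiplier


for multiplier in (3, 5, 10):
    needed = repriced_budget(PLANNED_ANNUAL_BUDGET, multiplier)
    # Fraction of the planned token volume the original budget still buys.
    covered = PLANNED_ANNUAL_BUDGET / needed
    print(f"{multiplier:>2}x: ${needed:,.0f} needed; "
          f"original budget covers {covered:.0%} of planned volume")
```

Read the coverage figure as the share of the plan you can still afford if the budget stays where it was.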

And token costs aren’t going to zero. The cheap models are getting cheaper, yes. But the frontier — the model you were told could replace your senior developer — is always going to be expensive. That’s the nature of the frontier. Today’s frontier becomes tomorrow’s commodity, but the price of being on the cutting edge stays high.

The ecosystem blind spot

There’s another problem. Most companies I talk to are using one provider. Claude only, or ChatGPT only. They’ve picked a lane, and they stay in it. And their ideas about token volume and cost are completely shaped by that one ecosystem.

When you only use Claude — Opus, Sonnet, Fable, Haiku — you’re living inside Anthropic’s reality of capabilities and costs. Move over to OpenAI, and one thing immediately shocks you: the token efficiency is different. I’ve found that GPT-5.4 and 5.5 burn roughly half as many tokens as Opus for comparable work. That’s not a small difference. That’s your token budget doubling or halving, depending on which provider you default to.
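Providers differ in how many tokens they spend on comparable work, so the number to compare is cost per task, not price per token. The sketch below uses hypothetical provider names, prices, and token counts. The halved token count mirrors the rough observation above, and you would need to measure it on your own workload.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    """A hypothetical provider profile; every field is an assumption to replace with your own measurements."""
    name: str
    price_per_million: float  # assumed blended USD price per million tokens
    tokens_per_task: int      # tokens consumed for one comparable task

    def cost_per_task(self) -> float:
        return self.tokens_per_task / 1_000_000 * self.price_per_million


# Provider B charges more per token but uses half as many tokens for the same task.
provider_a = Provider("provider_a", price_per_million=15.0, tokens_per_task=2_000_000)
provider_b = Provider("provider_b", price_per_million=20.0, tokens_per_task=1_000_000)

for p in (provider_a, provider_b):
    print(f"{p.name}: ${p.price_per_million:.2f}/M tokens, ${p.cost_per_task():.2f} per task")
```

In this made-up case, the provider with the higher per-token price comes out cheaper per task. A single-provider shop never gets to see that comparison.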

As the models evolve, you see Anthropic putting more emphasis on token efficiency. You see OpenAI optimizing differently. And then there are the newer models emerging — models from providers most companies haven’t even tried. My biggest fear is that not enough people are experimenting with them. If your entire cost model is built on one provider’s pricing, you’re not making decisions based on the market. You’re making decisions based on habit.

The people who push this

The people telling you to think in tokens instead of headcount fall into two categories: people who don’t know what software developers do, and people who are selling you something. Sometimes both.

If you’ve never built software, the idea that a developer’s output can be measured in tokens sounds reasonable. You type a prompt, you get code, the code works — what’s the difference? The difference is everything that happens before and after the prompt. Understanding the existing system. Knowing what not to change. Recognizing when the model’s output looks right but is subtly wrong in a way that will cause problems three months from now. That’s not token volume. That’s expertise.

And the people selling you something — the tool companies, the model providers, the consultants who get paid by the transformation — they have every incentive to make the comparison look simple. Simple comparisons lead to quick decisions. Quick decisions lead to signed contracts. The complexity shows up later, when the tokens are spent and the team is gone.

The restructuring you can’t undo

Here’s what makes this dangerous: the staffing decisions are being made now, based on a comparison that doesn’t hold. Companies are restructuring based on the idea that token volume can replace headcount. When the real costs show up — and they will — the people are already gone.

You can cancel a subscription. You can’t un-layoff a team. The institutional knowledge, the domain expertise, the relationships between people who’ve worked together for years — that doesn’t come back because you realized the token bill was higher than you expected.

If someone tells you to convert your developers to token volume, ask them which model they’re pricing those tokens on. Ask them whether that price reflects the real cost of inference or a VC subsidy. Ask them what happens to the math when the subsidy ends. And ask them who, exactly, will know which prompts to send when the people who understand the system are working somewhere else.

And if the answer is “we can just ask the model how to fix the system,” you are talking to someone who is about to understand the Headcount Delusion.

Over in the Slop Codex, both the Warlock of Staff Reduction and the Solo Sovereign turn up in the Executive NPC chapter of Volume 2, and the Headcount Delusion is essentially their shared scripture. The Warlock wields token math like a scythe to justify the reorg he was already planning. The Solo Sovereign cites it as proof that headcount was always the bottleneck. Neither of them is interested in the part where the tokens run out and the people are already gone.