The Token Slot Machine: Agentic AI Is Engineered to Never Let You Ship

May 24, 2026

If you have been using agentic coding tools lately, you have probably felt it.

The tool writes 95% of your feature in under 10 seconds. You are thrilled. You think you are minutes away from shipping.

Then you spend the next 4 hours prompting, correcting, chasing weird edge cases, and watching the agent rewrite the same block of code 12 times. You are no longer coding; you are playing a slot machine.

Agentic harnesses like Claude Code or Codex are brilliant at generating the first 95% of a feature because it is mostly boilerplate: standard API calls, typical UI components, and generic logic. The final 5% is where the actual complexity lives. It is the edge cases, the system integration, the non-functional requirements, and the state management. The agent cannot handle this easily because the search space explodes.

But because you are 95% of the way there, the dopamine hit keeps you hooked.

'Just one more prompt,' you tell yourself. 'The next one will fix the integration.'

It is the exact same psychological mechanism as a slot machine in a Vegas casino. The tool gets you 95% of the way there in seconds, then keeps you pulling the lever, convinced you are just one prompt away from shipping.

The leak that confirmed the casino mechanics

If this sounds like a conspiracy theory, look at what happened in early May 2026.

When the source code of a prominent agentic coding harness leaked, the tech world was busy debating prompt injection. The real story was buried deep in the orchestration engine.

The leak revealed separate code pathways for employees of the AI provider and the general public. The public pathway was designed to deliberately nerf the model outputs, forcing you to go through multiple prompt cycles and consume significantly more output tokens to achieve the exact same working build.

Why? Because it forces you to keep prompting.

More prompts mean more tokens. More tokens mean more revenue. It is a highly optimized, algorithmic revenue-generation loop designed to play on your desire to finish. By stretching out the final 5% of the build into an endless game of token-burning iterations, the provider turns a simple $0.05 task into a $15.00 token tab.

You thought you were using a developer tool. In reality, you were standing at a digital craps table designed by behavioral psychologists.

Vibe coding is a P&L cancer

This token-burning mechanic would be bad enough if it were confined to professional software engineers who can at least read the output. It is infinitely worse because companies are handing these tools to everyone.

We call it 'vibe coding', and it has become an operational and financial cancer.

When these tools went mainstream, a lot of organizations took a hands-off approach. They bought enterprise licenses for anyone who asked, hoping to encourage grassroots innovation.

What they actually got was a massive, silent leak in their P&Ls.

People with zero software engineering experience are using AI IDEs to build internal tools, dashboards, and databases. They don't understand the tech stack. They don't understand non-functional requirements like security, scalability, data compliance, and maintainability. They have no idea what the model actually generated. They are just clicking 'accept' on the agent's suggestions, racking up thousands of dollars in token bills, and creating a liability.

We have tech-savvy people scattered throughout every department at Zendesk and across the industry. That is great. But giving people who do not understand software architecture raw access to an AI IDE is like letting someone who played Microsoft Flight Simulator fly a commercial jet. They will crash the system, and the cleanup will cost 10 times more than the code was ever worth.

The playbook for AI architects

To stop the bleeding, you need a strategy that treats AI as a variable infrastructure resource, not a flat-rate SaaS tool. This is where AI architects must take the lead.

Here is the playbook we are using to get this under control.

1. Let architects, not analysts, lead the budget

Your traditional finance analyst is useless here. They are used to budgeting for software seats: $150 per month per user, flat, predictable. AI spend behaves like cloud compute with compounding usage spikes, not a fixed SaaS subscription.

AI architects understand the technical mechanics of token consumption. They understand the difference between a prompt chain that runs 5 times and one that loops indefinitely. They need to be the ones defining the budgets, building the consumption models, and setting the hard financial limits.
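To make the difference between a bounded chain and a runaway loop concrete, here is a minimal consumption-model sketch. Everything in it is an assumption for illustration: the `Pricing` numbers are hypothetical and not any provider's real rates, and the model treats every iteration as the same size, which understates real agent loops where context grows with each turn.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    # Hypothetical prices in USD per million tokens (illustrative only).
    input_per_mtok: float
    output_per_mtok: float


def chain_cost(iterations: int, input_tokens: int, output_tokens: int,
               pricing: Pricing) -> float:
    """Cost in USD of a prompt chain that runs `iterations` times.

    Simplification: every iteration sends and receives the same token counts.
    """
    per_call = (input_tokens * pricing.input_per_mtok
                + output_tokens * pricing.output_per_mtok) / 1_000_000
    return iterations * per_call


def max_iterations(budget_usd: float, input_tokens: int, output_tokens: int,
                   pricing: Pricing) -> int:
    """Largest whole number of iterations that fits inside a hard budget."""
    per_call = chain_cost(1, input_tokens, output_tokens, pricing)
    return int(budget_usd // per_call)


if __name__ == "__main__":
    pricing = Pricing(input_per_mtok=3.0, output_per_mtok=15.0)
    # A chain that stops after 5 runs versus one that is allowed to run 200 times.
    print(f"5 runs:   ${chain_cost(5, 20_000, 2_000, pricing):.2f}")
    print(f"200 runs: ${chain_cost(200, 20_000, 2_000, pricing):.2f}")
    print("runs allowed by a $10 cap:", max_iterations(10.0, 20_000, 2_000, pricing))
```

The useful output is the iteration cap, not the dollar figure: a hard budget translates directly into "how many times may this loop run", which is a limit an architect can enforce in the harness.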

2. Implement intelligent model routing

Stop using frontier flagship models for every single query. It is lazy and expensive.

A well-designed routing architecture can direct 80% of your daily prompts to smaller, cheaper, or even locally hosted models. You only route to the expensive flagship when the task requires complex reasoning or high-level architecture. A simple router can cut your token costs by 60% overnight without touching developer productivity.
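A router does not have to be sophisticated to start paying off. The sketch below is one possible shape, assuming a crude keyword-and-length heuristic; the model names, prices, and `COMPLEX_HINTS` list are all hypothetical, and a production router would more likely use a classifier or task metadata than string matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_mtok: float  # hypothetical blended price, illustrative only


# Hypothetical model tiers; substitute whatever your organization actually runs.
SMALL = Model("small-local", 0.20)
FLAGSHIP = Model("flagship", 15.00)

# Assumed signals that a prompt needs complex reasoning or architecture work.
COMPLEX_HINTS = ("architecture", "refactor", "design", "concurrency", "migration")


def route(prompt: str, max_simple_chars: int = 2000) -> Model:
    """Send long or architecture-flavored prompts to the flagship, the rest to the small model."""
    text = prompt.lower()
    if len(prompt) > max_simple_chars or any(hint in text for hint in COMPLEX_HINTS):
        return FLAGSHIP
    return SMALL


if __name__ == "__main__":
    for prompt in ("Rename this variable to snake_case",
                   "Propose an architecture for the billing service"):
        print(f"{route(prompt).name:12} <- {prompt}")
```

The default is the cheap model, and the expensive one has to be earned by the prompt. How much that saves depends entirely on your real traffic mix and price ratio, so measure before quoting a percentage.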

3. Establish strict access controls

Limit access to AI IDEs. Just because someone wants to prompt does not mean they should.

If an employee cannot explain how their code handles error states, persists data, or secures user inputs, they should not have an AI IDE license. Encourage them to use standard chat tools to brainstorm or analyze data. Save the high-power developer environments for people who actually understand what the system is doing under the hood.

4. Build token efficiency guardrails

You need operational observability on your developer environments. Track token usage per user, per session, and per repository. If a developer is running up a $300 bill on a single feature because they are stuck in a prompt loop, the system should trigger an alert or temporarily pause the session. That pause forces the developer to step back, look at the code, and solve the problem manually instead of pulling the slot machine lever again.
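A guardrail like this can be sketched in a few lines. The version below is an in-memory illustration only: the `SpendGuard` class, its method names, and the $100 alert threshold are assumptions (the $300 pause threshold mirrors the figure above), and a real deployment would read costs from your provider's usage data and persist them somewhere durable.

```python
from collections import defaultdict
from enum import Enum


class Action(Enum):
    OK = "ok"
    ALERT = "alert"
    PAUSE = "pause"


class SpendGuard:
    """Tracks cumulative spend per (user, session, repository) and flags runaway sessions."""

    def __init__(self, alert_usd: float = 100.0, pause_usd: float = 300.0):
        self.alert_usd = alert_usd  # hypothetical early-warning threshold
        self.pause_usd = pause_usd  # hard stop, matching the $300 example
        self._spend = defaultdict(float)

    def record(self, user: str, session: str, repo: str, cost_usd: float) -> Action:
        """Add one request's cost and return what the harness should do next."""
        key = (user, session, repo)
        self._spend[key] += cost_usd
        total = self._spend[key]
        if total >= self.pause_usd:
            return Action.PAUSE
        if total >= self.alert_usd:
            return Action.ALERT
        return Action.OK


if __name__ == "__main__":
    guard = SpendGuard()
    for cost in (50.0, 60.0, 200.0):
        action = guard.record("dev-1", "session-42", "billing-service", cost)
        print(f"+${cost:.0f} -> {action.value}")
```

Keying on the full (user, session, repository) tuple keeps one stuck session from pausing a developer's unrelated work, while still letting you roll the same records up per user or per repository for reporting.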

The AI strategy of the last 2 years was about speed, but the next 2 years must be about efficiency. The companies that realize this first will survive the bill, while the rest keep feeding the slot machine.
