3 Reasons Your Enterprise AI Bill is Blowing Up

May 9, 2026 · 12 min read

Something happened in enterprise AI pricing over the last 12 months that a lot of companies are still catching up to.

The flat subscription model that made AI tools feel like any other SaaS line item is giving way to usage-based pricing. Token costs are variable. Consumption scales with adoption. And the workflows teams built when everything felt like an all-you-can-eat buffet are suddenly generating invoices that make finance departments physically uncomfortable.

This isn't speculation. The shift is structural. OpenAI, Anthropic, Google, and virtually every major AI provider have moved toward granular, per-token billing, often layered on top of a base subscription fee. Enterprise plans now come with variable usage-based add-ons, credit systems, and tiered model pricing that changes depending on which model you hit, how many tokens you process, and whether you're using batch or real-time inference. Gartner predicts 70% of businesses will prefer usage-based or outcome-based pricing over flat per-seat models by the end of this year. The buffet is over.

The problem is that most companies built their AI adoption strategies during the buffet era. They greenlit workflows, stood up automations, and trained teams on tools when the cost was predictable and fixed. Now the cost is neither. And the skills required to manage variable, consumption-driven technology spending are not the same skills that manage a SaaS renewal calendar.

This pricing shift is exposing three structural problems that have been building quietly inside enterprises. All three were avoidable. None of them are being addressed fast enough.

Problem 1: Nobody knows how to forecast AI costs

I've been saying this for a while, possibly too loudly, but the reaction I keep getting tells me it hasn't landed yet.

Most finance departments do not have the capability to forecast AI costs. They don't have the models. They don't have the tooling. They don't have the roles. They are applying traditional SaaS budgeting logic (annual contracts, per-seat pricing, fixed line items) to a cost structure that behaves like cloud compute: variable, usage-driven, and capable of spiking in ways nobody anticipated when the budget was set.

The FinOps Foundation has been warning about this for over a year. Their research shows that organizations transitioning to AI-heavy workloads are experiencing the same budget volatility that hit early cloud adopters, except worse, because AI consumption patterns are harder to predict. A single agentic workflow can consume orders of magnitude more tokens than a chatbot interaction, and the difference between a well-optimized prompt chain and a poorly designed one can be a 10x cost multiplier on the same task.

Forrester predicts enterprises will defer 25% of their planned AI spending into 2027 specifically because they can't demonstrate ROI. That's not a technology problem. It's a planning problem. The technology works. The financial governance around it doesn't.

What this means in practice: companies are approving AI projects without understanding what they'll cost to run at scale. They pilot a workflow on a small dataset, the token costs look reasonable, they scale it up, and the bill arrives at 5 or 10 times the projection. Then they kill the project, which means they lose the investment in building it, which means the ROI goes negative, which means the CFO's office gets more skeptical about the next AI proposal, and the cycle continues.

This is a solvable problem but it requires investment in new planning capabilities. Dedicated AI cost forecasting roles. Real-time usage monitoring dashboards. Consumption-based budgeting models that account for variable token costs across different model tiers. Tools that can simulate the cost of a workflow before you deploy it at scale. Most companies have none of this. The ones that build it first will have a massive advantage over the ones that keep budgeting AI like it's another Jira license.
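To make the cost-simulation idea concrete, here's a minimal sketch of projecting a workflow's monthly spend before scaling it. Every number here (prices, token counts, run volumes) is an illustrative assumption, not a real vendor rate:

```python
# Illustrative sketch: estimate a workflow's monthly cost before
# scaling it to production. All prices and usage figures are
# made-up assumptions for the example.

def monthly_cost(runs_per_day, input_tokens, output_tokens,
                 price_in_per_m, price_out_per_m, days=30):
    """Project monthly spend for one workflow at a given volume."""
    per_run = (input_tokens / 1e6) * price_in_per_m \
            + (output_tokens / 1e6) * price_out_per_m
    return per_run * runs_per_day * days

# A pilot at 50 runs/day on a hypothetical flagship model
# priced at $3 per 1M input tokens and $15 per 1M output tokens.
pilot = monthly_cost(50, 8_000, 2_000, 3.00, 15.00)

# The same workflow scaled to 5,000 runs/day.
scaled = monthly_cost(5_000, 8_000, 2_000, 3.00, 15.00)

print(f"pilot:  ${pilot:,.2f}/month")    # $81.00
print(f"scaled: ${scaled:,.2f}/month")   # $8,100.00
```

The pilot number looks harmless; the scaled number is what should go in the budget request. Running this arithmetic before approval is the entire discipline, and it takes minutes.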

Problem 2: Chasing the shiny thing instead of solving the actual problem

The second issue is strategic, and it's arguably more damaging than the financial one.

A huge number of companies have no clear understanding of what they actually need AI to do for them. They know they need to 'do AI' because the board is asking about it, the CEO read an article, and competitors mentioned it on their earnings calls. So they hire or appoint someone to lead 'AI transformation,' and that person, who often has no background in AI architecture, solution design, or technology economics, does what everyone in that position does: they start chasing the news.

They chase the latest model release. They chase whatever tool got written up in The Verge this week. They chase benchmarks. They deploy the most expensive flagship LLM for every use case because it's 'the best one,' without ever asking whether the use case actually requires a $25-per-million-output-token model or whether a $1 model would do the same job.

80% of AI projects fail to deliver intended business value, according to RAND Corporation analysis. 42% of companies scrapped most of their AI initiatives in 2025, more than double the 17% who did so the previous year, per S&P Global. 90% of AI pilots never reach production. These aren't technology failures. These are strategy failures: companies deploying AI without a clear connection to specific business outcomes, without understanding which model fits which task, and without knowing what 'success' even looks like for the initiative.

The right approach is boring and nobody wants to hear it: start with the business problem. Identify the specific workflow. Define the success metric. Then pick the cheapest, most reliable model that meets the requirement. Not the one topping the benchmarks. Not the one your AI transformation lead saw in a demo last Tuesday. The one that actually fits.

A long-running document analysis pipeline doesn't need GPT-5 Pro or Opus 4.6. It needs a cheap, reliable model that can process large volumes of text consistently over hours without dropping context or hallucinating. That might be a smaller model. It might be a local model running on your own hardware. It might cost 90% less than the flagship you're currently burning tokens on.

But making that decision requires understanding the tradeoffs between speed, cost, reliability, and capability for each specific use case. It requires knowing the difference between a $0.10-per-million-token model and a $25-per-million-token model and when each one is appropriate. Most AI transformation leads don't have that knowledge. They're not expected to have it. And that's the problem.
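The arithmetic behind that gap is worth spelling out. Using hypothetical rates at the two ends of the range mentioned above, for a pipeline producing a billion output tokens a month:

```python
# Back-of-envelope: the same workload on two hypothetical price tiers.
tokens_per_month = 1_000_000_000   # 1B output tokens/month (assumed)

small_model = 0.10    # $ per 1M output tokens (hypothetical)
flagship = 25.00      # $ per 1M output tokens (hypothetical)

cost_small = tokens_per_month / 1e6 * small_model      # $100/month
cost_flagship = tokens_per_month / 1e6 * flagship      # $25,000/month

print(f"ratio: {cost_flagship / cost_small:.0f}x")     # 250x
```

A 250x multiplier on the same task is the kind of number that should appear in every model-selection decision, and almost never does.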

Problem 3: Vibe coding is generating negative value

This is the one that makes me genuinely angry, and I'm increasingly not alone.

When AI coding tools went mainstream, a lot of CEOs made what seemed like a reasonable bet: hand out licenses for Cursor, Claude Code, Codex, and similar tools. Let employees experiment. Encourage AI adoption from the ground up. The logic was intuitive. Get people comfortable with AI by letting them build things.

What actually happened is that companies are now sitting on mountains of internally vibe-coded applications that nobody asked for, nobody can maintain, nobody can scale, and nobody can secure. Employees across non-engineering functions are spending significant portions of their workday prompting AI agents to build apps, dashboards, and tools that serve no business purpose, were never scoped against a real requirement, and were never reviewed by anyone who understands software engineering.

I know a company that burned through $7 million in API tokens. $7 million. The output was millions of lines of AI-generated code scattered across dozens of internal projects that shipped no measurable business value. Not low value. Zero.

The term for this phenomenon, 'vibe coding,' was coined by Andrej Karpathy, the former OpenAI co-founder, in February 2025. It was Collins Dictionary's Word of the Year for 2025. A year later, Karpathy himself is publicly describing AI-generated code as 'bloaty,' 'brittle,' and 'gross,' and distinguishing between vibe coding (accepting whatever the AI produces without understanding it) and what he now calls 'agentic engineering' (using AI as a tool while maintaining human comprehension and judgment at every step).

Boris Cherny, the head of Claude Code at Anthropic, said at Anthropic's Code with Claude conference on May 6, 2026, that he's 'sick of' the term vibe coding. He called it 'a bit glib' given the scale of what AI coding tools are actually doing, and he's actively looking for alternative terminology. When the guy running one of the most prominent AI coding tools is publicly distancing himself from the label, you know something has shifted.

The problem with enterprise vibe coding isn't that AI generates bad code. Sometimes it does, sometimes it doesn't. The problem is that people who can't read code are generating enormous volumes of it, accepting it uncritically, deploying it without review, and burning through tokens at a rate that nobody is tracking.

A vibe coder can't tell the difference between an agent writing 100 lines for a task that needs 2 and an efficient solution. They can't evaluate whether the architecture is maintainable. They can't identify security vulnerabilities. They can't assess whether the code will scale. They just keep prompting, keep accepting, keep deploying, and keep consuming tokens. The output looks like productivity. It isn't.

The Veracode 2025 report found that 45% of AI-generated code introduces security vulnerabilities. CodeRabbit's analysis found 1.7x more defects in AI-generated code compared to human-written code across logic, maintainability, and performance categories. GitClear's analysis of 211 million lines of code found an 8x increase in duplicated code blocks since AI coding tools went mainstream.

What's actually happening inside these companies is a triple cost: the direct token costs of generating the code, the productivity loss from employees spending work time on unscoped projects instead of their actual jobs (the work week didn't expand when AI arrived), and the future cost of cleaning up, securing, or decommissioning the code they produced. That's not zero value. That's negative value. The company spent money, lost productive hours, and created a liability. All three at once.

What to do about it

If you're running AI transformation, or if you're a CEO wondering why the AI budget keeps climbing while the results stay flat, here's where to start.

Learn to pick the right tool for the job. Not every task requires a frontier model. Map your use cases to model capabilities and price points. Use cheaper, smaller models for high-volume, low-complexity tasks. Use local models running on your own hardware for long-haul workflows where latency isn't critical and data privacy matters. Reserve the expensive flagship models for the tasks that actually require them. A well-designed model routing strategy can cut your token costs by 60-80% without sacrificing output quality.
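One way to implement a routing strategy like that is a simple policy table: map each task's requirements to the cheapest tier that satisfies them. The tier names, prices, and capability scores below are illustrative assumptions, not real products:

```python
# Sketch of a model routing policy: pick the cheapest tier that
# meets a task's requirements. All tiers, prices, and capability
# scores are illustrative assumptions.

TIERS = {
    # name: (price per 1M output tokens, rough capability score 1-10)
    "small-local": (0.00, 3),    # runs on own hardware, data stays in-house
    "small-hosted": (0.40, 5),
    "mid": (3.00, 7),
    "flagship": (25.00, 10),
}

def route(required_capability, needs_local=False):
    """Return the cheapest tier that satisfies the task."""
    candidates = [
        (price, name) for name, (price, cap) in TIERS.items()
        if cap >= required_capability
        and (not needs_local or name == "small-local")
    ]
    if not candidates:
        raise ValueError("no tier meets the requirement")
    return min(candidates)[1]

print(route(4))                     # small-hosted: cheap beats flagship
print(route(3, needs_local=True))   # small-local: sensitive data
print(route(9))                     # flagship: only when truly required
```

The point isn't the ten lines of code; it's that routing decisions become explicit, auditable, and cheap to revisit when prices or capabilities change.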

Build real AI cost forecasting capabilities. This means dedicated roles, not a finance analyst who also looks at the AI bill. Hire or develop people who understand token economics, consumption-based budgeting, and the cost profiles of different model tiers. Deploy real-time monitoring tools that track token usage by workflow, team, and project. Run cost simulations before scaling pilots to production. Treat AI spend the way mature organizations treat cloud spend: with FinOps discipline, continuous optimization, and executive visibility.

Optimize AI usage for specific objectives, not generic 'performance.' Every AI workflow has tradeoffs between speed, cost, reliability, and capability. Learn to make those tradeoffs intentionally. If a workflow runs overnight and nobody cares if it takes 4 hours instead of 40 minutes, use a cheaper, slower model. If a workflow requires extremely high reliability over long multi-step chains, prioritize models with strong long-context performance and low hallucination rates, even if they don't top the benchmarks. If the data is sensitive, run it locally. If the task is simple, use the smallest model that can handle it. This is engineering discipline applied to AI operations and almost nobody is doing it yet.

Get vibe coding under control. This doesn't mean banning AI coding tools. It means governing them. Require that AI-generated code goes through the same review, testing, and approval processes as any other code. Track token consumption by user and project. Set budgets and alerts. Make sure anyone generating code with AI has either the skills to review it themselves or access to someone who does. Kill the projects that have no business case. And stop pretending that employees building random apps on company time is 'AI adoption.' It's not. It's waste with a narrative attached.
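The tracking-and-alerting piece doesn't require much machinery. Here's a minimal sketch of per-user token budgets with a warning threshold; the budget, threshold, and usage numbers are illustrative, and a real setup would feed this from the provider's usage reporting rather than manual calls:

```python
# Sketch of per-user token budgets with alerting. Budget sizes and
# thresholds are illustrative assumptions; real usage data would come
# from a metering pipeline or the provider's usage reporting.

from collections import defaultdict

BUDGET_TOKENS = 5_000_000   # monthly per-user budget (assumed)
ALERT_AT = 0.8              # warn at 80% consumed

usage = defaultdict(int)

def record(user, tokens):
    """Track consumption; return an alert string when thresholds trip."""
    usage[user] += tokens
    used = usage[user]
    if used > BUDGET_TOKENS:
        return f"{user}: over budget ({used:,} tokens), block further runs"
    if used >= ALERT_AT * BUDGET_TOKENS:
        return f"{user}: {used / BUDGET_TOKENS:.0%} of budget consumed"
    return None

print(record("alice", 3_000_000))   # None: within budget
print(record("alice", 1_500_000))   # 90% warning
print(record("alice", 1_000_000))   # over budget, block
```

Twenty lines of governance is the difference between discovering a $7 million token bill at invoice time and catching it at the first threshold.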

Stop hiring AI transformation leads who don't understand the technology. The biggest structural problem across all of this is that the people making AI strategy decisions frequently lack the technical depth to make them well. An AI transformation lead who can't evaluate model tradeoffs, can't read a token usage report, and can't assess whether a workflow is architecturally sound is going to default to the hype cycle. That's not their fault. It's the fault of the organization that put them in the role without the training or support to succeed.

The companies that figure out AI cost governance in the next 12 months are going to have a structural advantage that compounds over time. The ones that don't are going to keep writing checks they didn't plan for, chasing models they don't need, and generating code nobody can maintain.

The AI is not the problem. The way we're managing it is.

---

Sources:

- Gartner AI spending forecast ($2.52 trillion, 2026): Gartner, Inc.

- Forrester: enterprises deferring 25% of AI spend to 2027 due to unclear ROI: Forrester Research

- S&P Global: 42% of companies scrapped most AI initiatives in 2025 (vs. 17% in 2024): S&P Global Market Intelligence

- RAND Corporation: 80% of AI projects fail to deliver intended business value: RAND Corporation

- 90% of AI pilots never reach production: Gartner

- Veracode 2025: 45% of AI-generated code introduces security vulnerabilities: Veracode State of Software Security Report, 2025

- CodeRabbit: AI-generated code produces 1.7x more defects: CodeRabbit Analysis, 2025

- GitClear: 8x increase in duplicated code blocks since AI tools went mainstream: GitClear, 211M lines analyzed

- Karpathy: coined 'vibe coding' (Feb 2025), called AI code 'bloaty,' 'brittle,' 'gross' (April 2026): Business Insider

- Boris Cherny ('sick of' vibe coding term, May 6, 2026): Business Insider, Anthropic Code with Claude conference

- Collins Dictionary Word of the Year 2025: 'vibe coding': Collins English Dictionary

- FinOps Foundation: AI cost management volatility comparable to early cloud adoption: FinOps.org

- Gartner: 70% of businesses to prefer usage-based pricing by 2026: Gartner, Inc.
