AI isn't magic. It's metered.
Every prompt consumes compute. Bigger models consume more. Longer inputs consume more. And when you chain steps together — draft, retrieve, verify, rewrite, check policy, try again — you're not buying "an answer." You're buying a workflow.
Here's the clean mental model: AI costs scale with (1) how much you send, (2) how many times you send it, and (3) how many people send it at once.
What's Actually Driving the Cost
Most teams focus on "model pricing" and miss the real multipliers:
- Context length: the more text you attach, the more you pay.
- Number of calls: retries, tool calls, and multi-step flows add up fast.
- Concurrency: a pilot is one thing; a Monday morning rollout is another.
- Enterprise overhead: logging, security, policy checks, audits — necessary and not free.
Why "Agents" Raise the Stakes
Agents are just workflows that chain multiple AI steps and tools together. They're useful — and they can quietly turn one interaction into many.
One Call vs. Many
A simple summary might be one call. An "agent" that summarizes, retrieves documents, drafts a response, checks policy, rewrites for tone, then verifies citations can be a pile of calls before anyone notices.
Multiply that by a team, and the meter starts sprinting.
Agents aren't bad. Unengineered agents are expensive.
The Only Strategy That Actually Works
Don't pay for the best model when the job doesn't require it. Most work naturally falls into tiers:
- Routine tasks (summaries, extraction, formatting, first drafts): smaller, cheaper models often do fine — sometimes local models when policy requires it.
- High-stakes tasks (complex reasoning, sensitive material, critical decisions): escalate to more capable models when needed.
Route, Govern, Measure
The future is routed, governed systems that do one simple thing well: use the cheapest model that meets the quality and risk bar — and only escalate when necessary.
And measure the right number: cost per outcome, not cost per prompt.
If you can do that, AI becomes leverage — not a line item that gets euthanized at the next budget review.
Peter Cellino — I'm pro-AI and pro-efficiency. I'm also pro-not-setting-money-on-fire. Now excuse me while I spend $6 on coffee to experience "metered consumption" in analog form.
Peter Cellino is the publisher of The Charlotte Mercury and the creator of Mercury Local — a platform rebuilding local news into something that actually serves residents and the local businesses who need to reach them, without the ad-tech middlemen.