AI isn't magic. It's metered. Every prompt consumes compute. Bigger models consume more. Longer inputs consume more. And when you chain steps together—draft, retrieve, verify, rewrite, check policy, try again—you're not buying "an answer." You're buying a workflow. Here's the clean mental model: AI costs scale with (1) how much you send, (2) how many times you send it, and (3) how many people send it at once. ## What's actually driving the cost Most teams focus on "model pricing" and miss the real multipliers: - Context length: the more text you attach, the more you pay. - Number of calls: retries, tool calls, and multi-step flows add up fast. - Concurrency: a pilot is one thing; a Monday morning rollout is another. - Enterprise overhead: logging, security, policy checks, audits—necessary and not free. ## Why "agents" raise the stakes Agents are just workflows that chain multiple AI steps and tools together. They're useful—and they can quietly turn one interaction into many. A simple summary might be one call. An "agent" that summarizes, retrieves documents, drafts a response, checks policy, rewrites for tone, then verifies citations can be a pile of calls before anyone notices. Multiply that by a team, and the meter starts sprinting. Agents aren't bad. Unengineered agents are expensive. ## The only strategy that actually works Don't pay for the best model when the job doesn't require it. Most work naturally falls into tiers: - Routine tasks (summaries, extraction, formatting, first drafts): smaller/cheaper models often do fine—sometimes local models when policy requires it. - High-stakes tasks (complex reasoning, sensitive material, critical decisions): escalate to more capable models when needed. The future is routed, governed systems that do one simple thing well: Use the cheapest model that meets the quality and risk bar—and only escalate when necessary. And measure the right number: Cost per outcome, not cost per prompt. If you can do that, AI becomes leverage—not a line item that gets euthanized at the next budget review. Peter Cellino I'm pro-AI and pro-efficiency. I'm also pro-not-setting-money-on-fire. Now excuse me while I spend $6 on coffee to experience "metered consumption" in analog form.
AI Has a Meter. Here's How Not to Get the Bill.
AI isn't magic. It's metered. Every prompt consumes compute. Bigger models consume more.