The Economics of AI — ROI Frameworks and Cost Structures
Understand token economics well enough to approve AI budgets that are accurate, not 10–100x off.
Sarah's $81,000 Surprise
Sarah is VP of Finance at a 2,000-person SaaS company. Her engineering team wanted to launch an AI chatbot for 100,000 users. They estimated the cost at $10,000/month. They put it on a board slide. The board nodded.
Then Sarah actually ran the numbers.
$81,000 per month.
Eight times what engineering promised. The slide was already in the board deck. The project was already approved. And nobody had done the math.
This chapter makes sure you never get blindsided like that. By the end, you'll be able to calculate any AI bill on the back of a napkin — and you'll actually understand what you're paying for.
AI Costs Work Like a Phone Bill
Seriously. If you understand your phone bill, you understand AI pricing. Here's the analogy:
| Phone bill | AI bill |
|---|---|
| How much you talk (minutes used) | How much text you send in (input tokens) |
| How much you listen (minutes received) | How much text the AI sends back (output tokens) |
| Your plan (basic vs. premium) | Your model tier (cheap vs. smart) |
That's it. Three things control your AI bill:
- Input tokens — the text you send to the AI (your question, background info, instructions)
- Output tokens — the text the AI sends back (its answer)
- Model tier — which AI "brain" you pick (cheap and fast, or expensive and smart)
Wait, what's a token?
A token is a small chunk of text — roughly three-quarters of a word. A short common word like "the" or "cat" is 1 token; a long word like "uncharacteristically" can be 5–6 tokens. The sentence "What is the weather today?" is around 6 tokens.
Every token costs money. Input tokens and output tokens have different prices. And those prices change dramatically based on which model you pick.
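The whole napkin formula fits in a few lines. Here's a minimal Python sketch (the prices are the illustrative figures used throughout this chapter, not live quotes):

```python
# Cost of a single AI query. Provider prices are quoted per 1M tokens,
# so divide token counts by 1,000,000 before multiplying.
def query_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Dollar cost of one query at the given per-1M-token prices."""
    return (input_tokens / 1e6) * input_price_per_m + \
           (output_tokens / 1e6) * output_price_per_m

# One chatbot query: 800 tokens in, 200 tokens out, at illustrative $3/$15 per 1M
print(round(query_cost(800, 200, 3.00, 15.00), 4))  # 0.0054 -- about half a cent
```

Half a cent per query sounds harmless. The rest of this chapter is about what happens when you multiply it by 500,000 queries a day.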
There Are No Dumb Questions
Q: Why do output tokens cost more than input tokens? A: Because generating new text is harder work for the computer than reading text. Reading is cheap. Writing is expensive. Just like in real life — reading a book costs you nothing, but hiring someone to write one costs a fortune.
Q: Why are there different model tiers? Can't I just use the best one? A: You can, but it's like hiring a brain surgeon to put on a Band-Aid. The expensive model is smarter, but most tasks don't need that much brainpower. You'd be burning money for no reason.
The Price Menu
Before you look at the numbers: Guess how much more expensive the most powerful AI model is compared to the cheapest one, for the same task. Is it 2x? 5x? 20x? Write down your estimate — the real number tends to surprise people.
Here's what the three model tiers actually cost (using Anthropic's Claude as our example):
| Model | What it's good at | Input price (per 1M tokens) | Output price (per 1M tokens) |
|---|---|---|---|
| Claude Haiku (fast tier) | Simple stuff: FAQs, routing, summaries | $0.25 | $1.25 |
| Claude Sonnet (balanced) | Most business tasks: analysis, writing, coding | $3.00 | $15.00 |
| Claude Opus (premium tier) | Hard stuff: complex reasoning, research | $15.00 | $75.00 |
Prices are illustrative. AI token costs have dropped significantly and change frequently — verify current rates at provider pricing pages.
Look at those numbers. For input tokens, the premium tier costs 60x more than the fast tier. Same question, same answer format, 60x the price. (Model names and pricing tiers change; use the provider's current pricing page for the exact lineup.) That's the single biggest cost lever in your entire AI budget, and it's an architecture decision your engineering team might make in a five-minute Slack thread.
Let's Do Sarah's Math — Step by Step
(Using the illustrative prices from the table above.)
No skipping ahead. We're going to walk through Sarah's calculation the way she should have before that board meeting.
The setup:
- 100,000 users
- Each user asks 5 questions per day
- Each question sends ~800 input tokens and gets back ~200 output tokens
- Engineering picked Sonnet for everything
Step 1: How many queries per day?
| | |
|---|---|
| Users | 100,000 |
| Queries per user per day | x 5 |
| Total queries per day | = 500,000 |
Step 2: How many tokens per day?
| Token type | Tokens per query | Queries per day | Total tokens/day |
|---|---|---|---|
| Input | 800 | 500,000 | 400,000,000 (400M) |
| Output | 200 | 500,000 | 100,000,000 (100M) |
Step 3: What does Sonnet charge?
| Token type | Tokens/day | Price per 1M tokens | Daily cost |
|---|---|---|---|
| Input | 400M | $3.00 | $1,200 |
| Output | 100M | $15.00 | $1,500 |
| Daily total | | | $2,700 |
Step 4: Monthly bill
| | |
|---|---|
| Daily cost | $2,700 |
| Days in a month | x 30 |
| Monthly bill | = $81,000 |
Engineering estimated $10,000. The real number is $81,000. That's an 8.1x miss.
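Sarah's four steps collapse into a few lines of arithmetic. A sketch under the assumptions above (illustrative Sonnet prices, 30-day month):

```python
# Sarah's back-of-napkin bill, using the chapter's illustrative Sonnet prices
USERS = 100_000
QUERIES_PER_USER_PER_DAY = 5
INPUT_TOKENS_PER_QUERY = 800
OUTPUT_TOKENS_PER_QUERY = 200
INPUT_PRICE_PER_M = 3.00    # dollars per 1M input tokens
OUTPUT_PRICE_PER_M = 15.00  # dollars per 1M output tokens

queries_per_day = USERS * QUERIES_PER_USER_PER_DAY                 # 500,000
input_tokens_per_day = queries_per_day * INPUT_TOKENS_PER_QUERY    # 400M
output_tokens_per_day = queries_per_day * OUTPUT_TOKENS_PER_QUERY  # 100M

daily_cost = (input_tokens_per_day / 1e6) * INPUT_PRICE_PER_M + \
             (output_tokens_per_day / 1e6) * OUTPUT_PRICE_PER_M
monthly_cost = daily_cost * 30

print(f"${daily_cost:,.0f}/day -> ${monthly_cost:,.0f}/month")  # $2,700/day -> $81,000/month
```

Swap in your own user counts and token sizes; the structure of the calculation never changes.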
There Are No Dumb Questions
Q: How can engineers be that far off? A: They usually estimate based on their own testing — maybe 100 test queries. They forget to multiply by 100,000 users, or they forget that output tokens cost 5x more than input tokens. Small oversights compound into massive budget misses at scale.
Q: Should I just reject any project that costs $81K/month? A: Not necessarily. The question isn't "is this expensive?" — it's "is this expensive relative to the value it creates?" An AI chatbot that replaces 15 support agents at $5K/month each saves $75K/month. Now $81K looks a lot more reasonable. But you need to know the real number first.
Your turn: redo Steps 3 and 4 with Haiku's prices ($0.25 input, $1.25 output per 1M tokens) instead of Sonnet's, for the same 400M input and 100M output tokens per day.
The answer: Haiku costs $6,750/month for the same workload. That's 12x cheaper than Sonnet.
But here's the catch — Haiku is less capable. It might fumble your complex queries. So you can't just swap everything to Haiku and call it a day. You need a smarter approach.
Model Routing = Call Center Staffing
Think about how a well-run call center works:
- Simple questions ("What's my balance?" "How do I reset my password?") go to junior agents. They're fast, cheap, and perfectly capable.
- Complex questions ("I was double-charged on a disputed transaction across two accounts") go to senior agents. They cost more, but they get it right.
You wouldn't pay senior-agent rates for every single call. That would be insane.
Model routing works the same way:
| Query type | % of traffic | Model | Why |
|---|---|---|---|
| Simple (FAQs, lookups, summaries) | 70% | Haiku (junior agent) | Fast, cheap, good enough |
| Complex (analysis, edge cases, nuance) | 30% | Sonnet (senior agent) | Smarter, handles hard stuff |
The blended cost math
Let's recalculate Sarah's bill with a 70/30 routing split:
Haiku handles 70% of queries (350,000/day):
| Token type | Tokens/day | Price per 1M | Daily cost |
|---|---|---|---|
| Input | 280M | $0.25 | $70 |
| Output | 70M | $1.25 | $87.50 |
| Haiku daily | | | $157.50 |
Sonnet handles 30% of queries (150,000/day):
| Token type | Tokens/day | Price per 1M | Daily cost |
|---|---|---|---|
| Input | 120M | $3.00 | $360 |
| Output | 30M | $15.00 | $450 |
| Sonnet daily | | | $810 |
Blended monthly cost: ($157.50 + $810) x 30 = $29,025, roughly $29,000/month
That's a 64% reduction from the all-Sonnet bill, with quality preserved where it matters.
| Strategy | Monthly cost | vs. all-Sonnet |
|---|---|---|
| All Sonnet | $81,000 | baseline |
| All Haiku | $6,750 | -92% (but quality drops) |
| 70/30 routing | $29,000 | -64% (quality preserved) |
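The routing math generalizes to any split. A sketch using the same illustrative prices (the 70/30 split is an assumption you'd tune from your real traffic mix):

```python
# Daily cost of sending some share of traffic to a given model tier
def tier_daily_cost(queries, in_tokens, out_tokens, in_price_m, out_price_m):
    return (queries * in_tokens / 1e6) * in_price_m + \
           (queries * out_tokens / 1e6) * out_price_m

TOTAL_QUERIES_PER_DAY = 500_000
HAIKU_SHARE = 0.70  # assumed split; measure your actual traffic before relying on it

haiku_daily = tier_daily_cost(TOTAL_QUERIES_PER_DAY * HAIKU_SHARE,
                              800, 200, 0.25, 1.25)        # $157.50
sonnet_daily = tier_daily_cost(TOTAL_QUERIES_PER_DAY * (1 - HAIKU_SHARE),
                               800, 200, 3.00, 15.00)      # $810.00
blended_monthly = (haiku_daily + sonnet_daily) * 30

print(f"${blended_monthly:,.0f}/month")  # $29,025/month
```

Try different shares: the savings scale almost linearly with how much traffic the cheap tier can safely absorb, which is why measuring your real simple-vs-complex mix matters.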
There Are No Dumb Questions
Q: How does the system know which queries are "simple" vs. "complex"? A: A tiny classifier (usually running on Haiku itself, costing almost nothing) reads each query and routes it. Think of it as a receptionist deciding which agent picks up the call. Your engineering team can build this in one sprint.
Q: What if the cheap model gets a hard question wrong? A: Good routing systems include confidence checks. If Haiku isn't confident, it escalates to Sonnet automatically — just like a junior agent transferring a tricky call.
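To make the receptionist idea concrete, here's a deliberately toy router. Real systems classify with a small model rather than keyword rules, and the model names below are placeholders, not real model IDs:

```python
# Toy router: keyword heuristics stand in for the small classifier model
# a production system would use. Model names are illustrative placeholders.
CHEAP_MODEL = "fast-tier"
SMART_MODEL = "premium-tier"

COMPLEX_SIGNALS = ("dispute", "double-charged", "escalate", "legal", "refund")

def route(query: str) -> str:
    """Send queries showing complexity signals to the smarter, pricier model."""
    q = query.lower()
    # Long queries or queries with escalation keywords go to the senior agent
    if any(signal in q for signal in COMPLEX_SIGNALS) or len(q.split()) > 40:
        return SMART_MODEL
    return CHEAP_MODEL

print(route("How do I reset my password?"))                     # fast-tier
print(route("I was double-charged on a disputed transaction"))  # premium-tier
```

The point of the sketch: routing is cheap to decide and expensive to skip. The decision logic runs in microseconds; the cost difference it controls is 60x.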
The Secret Weapon: Prompt Caching
There's one more trick that can slash your bill — and it requires almost no engineering effort.
Think of it like a photocopier. Every time your chatbot answers a question, it re-reads the same set of instructions (the "system prompt"). That's like typing the same cover letter from scratch every time you apply for a job. Ridiculous, right?
Prompt caching stores a copy of those instructions so the AI doesn't re-read them every time. The cached tokens cost 90% less.
Here's a worked example, using the illustrative Sonnet prices:
| | Without caching | With caching |
|---|---|---|
| System prompt size | 2,000 tokens | 2,000 tokens |
| Queries per day | 500,000 | 500,000 |
| Tokens re-read daily | 1 billion | 1 billion |
| Cost of re-reading (Sonnet) | $3,000/day | $300/day |
| Monthly savings | — | $81,000/month |
Your engineering team can enable this in a single sprint. It's free money.
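The caching arithmetic, as a sketch (the 90% discount matches the figure quoted above; check your provider's actual cache pricing before budgeting on it):

```python
# Savings from caching a fixed system prompt instead of re-reading it per query
SYSTEM_PROMPT_TOKENS = 2_000
QUERIES_PER_DAY = 500_000
INPUT_PRICE_PER_M = 3.00   # illustrative Sonnet-tier input price
CACHE_DISCOUNT = 0.90      # cached tokens assumed to cost 90% less

tokens_reread_daily = SYSTEM_PROMPT_TOKENS * QUERIES_PER_DAY      # 1 billion
uncached_daily = tokens_reread_daily / 1e6 * INPUT_PRICE_PER_M    # $3,000
cached_daily = uncached_daily * (1 - CACHE_DISCOUNT)              # $300
monthly_savings = (uncached_daily - cached_daily) * 30

print(f"${monthly_savings:,.0f}/month saved")  # $81,000/month saved
```

Notice the savings scale with system prompt size times query volume, which is why caching matters most for high-traffic apps with long, fixed instructions.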
The Cost Diagram — How It All Fits Together
Three levers, one bill: input tokens, output tokens, and model tier. You control all of them.
The Board-Ready Playbook
Sarah revised her board slide, and engineering added routing logic in one sprint, cutting the projected cost from $81K to $29K/month. Here's what she learned — and what you should do before approving any AI budget:
Step 1: Get the real numbers — users, queries/day, tokens per query, model tier. Do the multiplication yourself. It takes ten minutes.
Step 2: Ask engineering about model routing. If they're using one model for everything, that's like paying senior-agent rates for password resets.
Step 3: Ask about prompt caching. If the system prompt is longer than 1,000 tokens and they haven't enabled caching, you're leaving money on the table.
Step 4: Stage the investment with gates. Don't release the full budget based on pilot results — tie funding to production milestones with kill criteria at each gate.
One more lens before Sarah's finale: everything in this chapter assumes you're buying AI capability through an API rather than building your own model. The comparison explains why almost every company should:
| Build your own model | Use frontier models via API |
|---|---|
| Full control over data and weights | Immediate access to frontier capability |
| $10M–$1B+ to train a frontier model | $0.00025–$0.06 per 1K tokens (verify current rates — prices have dropped significantly since 2023) |
| Requires a world-class ML team | Requires engineering, not ML research |
| Appropriate for: Google, Meta, OpenAI | Appropriate for: 99.9% of companies |
Back to Sarah
She walked back into the board meeting with new numbers.
"Engineering estimated $10,000 a month. I ran the actual model — at our user volume, with the responses that system generates, the real cost is $81,000. I'd like to propose we approve the project with three conditions: we route routine queries to the cheaper model tier, we add prompt caching for the system prompt, and we set a $20,000 monthly hard cap while we validate usage in the first 90 days."
The board approved it. The project shipped. First-month costs: $18,400.
The $81,000 surprise turned into a $63,000 saving — not because the engineering team was wrong about AI, but because someone ran the numbers before the commitment was made.
Key takeaways
- You can prevent 10–100x budget surprises by modeling input tokens, output tokens, and model tier before committing to any AI architecture.
- Routing simple queries to a cheaper model can cut AI spend by more than half (64% in Sarah's case) while preserving quality where it matters.
- You can cut input costs by up to 90% for repeated instructions by enabling prompt caching on any app with a large, fixed system prompt.
Knowledge Check
1. Your CFO asks you to justify a $2M AI investment. You have promising pilot results but no production numbers yet. What is the most credible way to structure the business case?
2. A vendor quotes $500K for an enterprise AI platform. Before approving, your finance team should model additional cost categories. Which set of categories is most complete?
3. What is the key difference in measuring ROI for an automation use case versus a decision-augmentation use case, and why does it matter?
4. Industry experience consistently shows that poor data quality is a leading cause of AI project failure after launch. How should this risk shape how you stage AI investments?