The Economics of AI — ROI Frameworks and Cost Structures
Understand token economics well enough to approve AI budgets that are accurate, not 10–100x off.
Sarah's $81,000 Surprise
Sarah is VP of Finance at a 2,000-person SaaS company. Her engineering team wanted to launch an AI chatbot for 100,000 users. They estimated the cost at $10,000/month. They put it on a board slide. The board nodded.
Then Sarah actually ran the numbers.
$81,000 per month.
Eight times what engineering promised. The slide was already in the board deck. The project was already approved. And nobody had done the math.
This chapter makes sure you never get blindsided like that. By the end, you'll be able to calculate any AI bill on the back of a napkin — and you'll actually understand what you're paying for.
AI Costs Work Like a Phone Bill
Seriously. If you understand your phone bill, you understand AI pricing. Here's the analogy:
| Phone bill | AI bill |
|---|---|
| How much you talk (minutes used) | How much text you send in (input tokens) |
| How much you listen (minutes received) | How much text the AI sends back (output tokens) |
| Your plan (basic vs. premium) | Your model tier (cheap vs. smart) |
That's it. Three things control your AI bill:
- Input tokens — the text you send to the AI (your question, background info, instructions)
- Output tokens — the text the AI sends back (its answer)
- Model tier — which AI "brain" you pick (cheap and fast, or expensive and smart)
Wait, what's a token?
A token is a small chunk of text — roughly three-quarters of a word. A short common word like "the" or "cat" is 1 token; a long word like "uncharacteristically" can be 5–6 tokens. The sentence "What is the weather today?" is around 6 tokens.
Every token costs money. Input tokens and output tokens have different prices. And those prices change dramatically based on which model you pick.
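The whole napkin formula fits in a few lines. Here's a minimal Python sketch (the prices are the illustrative figures used throughout this chapter, not live quotes):

```python
# Cost of a single AI query. Provider prices are quoted per 1M tokens,
# so divide token counts by 1,000,000 before multiplying.
def query_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Dollar cost of one query at the given per-1M-token prices."""
    return (input_tokens / 1e6) * input_price_per_m + \
           (output_tokens / 1e6) * output_price_per_m

# One chatbot query: 800 tokens in, 200 tokens out, at illustrative $3/$15 per 1M
print(round(query_cost(800, 200, 3.00, 15.00), 4))  # 0.0054 -- about half a cent
```

Half a cent per query sounds harmless. The rest of this chapter is about what happens when you multiply it by 500,000 queries a day.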
There Are No Dumb Questions
Q: Why do output tokens cost more than input tokens? A: Because generating new text is harder work for the computer than reading text. Reading is cheap. Writing is expensive. Just like in real life — reading a book costs you nothing, but hiring someone to write one costs a fortune.
Q: Why are there different model tiers? Can't I just use the best one? A: You can, but it's like hiring a brain surgeon to put on a Band-Aid. The expensive model is smarter, but most tasks don't need that much brainpower. You'd be burning money for no reason.
The Price Menu
Before you look at the numbers: Guess how much more expensive the most powerful AI model is compared to the cheapest one, for the same task. Is it 2x? 5x? 20x? Write down your estimate — the real number tends to surprise people.
Here's what the three model tiers actually cost (using Anthropic's Claude as our example):
| Model | What it's good at | Input price (per 1M tokens) | Output price (per 1M tokens) |
|---|---|---|---|
| Claude Haiku (fast tier) | Simple stuff: FAQs, routing, summaries | $0.25 | $1.25 |
| Claude Sonnet (balanced) | Most business tasks: analysis, writing, coding | $3.00 | $15.00 |
| Claude Opus (premium tier) | Hard stuff: complex reasoning, research | $15.00 | $75.00 |
Prices are illustrative. AI token costs have dropped significantly and change frequently — verify current rates at provider pricing pages.
Look at those numbers. For input tokens, the premium tier costs 60x more than the fast tier. Same question, same answer format, 60x the price. (Model names and pricing tiers change; use the provider's current pricing page for the exact lineup.) That's the single biggest cost lever in your entire AI budget, and it's an architecture decision your engineering team might make in a five-minute Slack thread.
Let's Do Sarah's Math — Step by Step
(Using the illustrative prices from the table above.)
No skipping ahead. We're going to walk through Sarah's calculation the way she should have before that board meeting.
The setup:
- 100,000 users
- Each user asks 5 questions per day
- Each question sends ~800 input tokens and gets back ~200 output tokens
- Engineering picked Sonnet for everything
Step 1: How many queries per day?
| | |
|---|---|
| Users | 100,000 |
| Queries per user per day | x 5 |
| Total queries per day | = 500,000 |
Step 2: How many tokens per day?
| Token type | Tokens per query | Queries per day | Total tokens/day |
|---|---|---|---|
| Input | 800 | 500,000 | 400,000,000 (400M) |
| Output | 200 | 500,000 | 100,000,000 (100M) |
Step 3: What does Sonnet charge?
| Token type | Tokens/day | Price per 1M tokens | Daily cost |
|---|---|---|---|
| Input | 400M | $3.00 | $1,200 |
| Output | 100M | $15.00 | $1,500 |
| Daily total | | | $2,700 |
Step 4: Monthly bill
| | |
|---|---|
| Daily cost | $2,700 |
| Days in a month | x 30 |
| Monthly bill | = $81,000 |
Engineering estimated $10,000. The real number is $81,000. That's an 8.1x miss.
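Sarah's four steps collapse into a few lines of arithmetic. A sketch under the assumptions above (illustrative Sonnet prices, 30-day month):

```python
# Sarah's back-of-napkin bill, using the chapter's illustrative Sonnet prices
USERS = 100_000
QUERIES_PER_USER_PER_DAY = 5
INPUT_TOKENS_PER_QUERY = 800
OUTPUT_TOKENS_PER_QUERY = 200
INPUT_PRICE_PER_M = 3.00    # dollars per 1M input tokens
OUTPUT_PRICE_PER_M = 15.00  # dollars per 1M output tokens

queries_per_day = USERS * QUERIES_PER_USER_PER_DAY                 # 500,000
input_tokens_per_day = queries_per_day * INPUT_TOKENS_PER_QUERY    # 400M
output_tokens_per_day = queries_per_day * OUTPUT_TOKENS_PER_QUERY  # 100M

daily_cost = (input_tokens_per_day / 1e6) * INPUT_PRICE_PER_M + \
             (output_tokens_per_day / 1e6) * OUTPUT_PRICE_PER_M
monthly_cost = daily_cost * 30

print(f"${daily_cost:,.0f}/day -> ${monthly_cost:,.0f}/month")  # $2,700/day -> $81,000/month
```

Swap in your own user counts and token sizes; the structure of the calculation never changes.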
There Are No Dumb Questions
Q: How can engineers be that far off? A: They usually estimate based on their own testing — maybe 100 test queries. They forget to multiply by 100,000 users, or they forget that output tokens cost 5x more than input tokens. Small oversights compound into massive budget misses at scale.
Q: Should I just reject any project that costs $81K/month? A: Not necessarily. The question isn't "is this expensive?" — it's "is this expensive relative to the value it creates?" An AI chatbot that replaces 15 support agents at $5K/month each saves $75K/month. Now $81K looks a lot more reasonable. But you need to know the real number first.
Your turn: redo Steps 3 and 4 with Haiku's prices ($0.25 input, $1.25 output per 1M tokens) instead of Sonnet's, for the same 400M input and 100M output tokens per day.
The answer: Haiku costs $6,750/month for the same workload. That's 12x cheaper than Sonnet.
But here's the catch — Haiku is less capable. It might fumble your complex queries. So you can't just swap everything to Haiku and call it a day. You need a smarter approach.
Model Routing = Call Center Staffing
Think about how a well-run call center works:
- Simple questions ("What's my balance?" "How do I reset my password?") go to junior agents. They're fast, cheap, and perfectly capable.
- Complex questions ("I was double-charged on a disputed transaction across two accounts") go to senior agents. They cost more, but they get it right.
You wouldn't pay senior-agent rates for every single call. That would be insane.
Model routing works the same way:
| Query type | % of traffic | Model | Why |
|---|---|---|---|
| Simple (FAQs, lookups, summaries) | 70% | Haiku (junior agent) | Fast, cheap, good enough |
| Complex (analysis, edge cases, nuance) | 30% | Sonnet (senior agent) | Smarter, handles hard stuff |
The blended cost math
Let's recalculate Sarah's bill with a 70/30 routing split:
Haiku handles 70% of queries (350,000/day):
| Token type | Tokens/day | Price per 1M | Daily cost |
|---|---|---|---|
| Input | 280M | $0.25 | $70 |
| Output | 70M | $1.25 | $87.50 |
| Haiku daily | | | $157.50 |
Sonnet handles 30% of queries (150,000/day):
| Token type | Tokens/day | Price per 1M | Daily cost |
|---|---|---|---|
| Input | 120M | $3.00 | $360 |
| Output | 30M | $15.00 | $450 |
| Sonnet daily | | | $810 |
Blended monthly cost: ($157.50 + $810) x 30 = $29,025, roughly $29,000/month
That's a 64% reduction from the all-Sonnet bill, with quality preserved where it matters.
| Strategy | Monthly cost | vs. all-Sonnet |
|---|---|---|
| All Sonnet | $81,000 | baseline |
| All Haiku | $6,750 | -92% (but quality drops) |
| 70/30 routing | $29,000 | -64% (quality preserved) |
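The routing math generalizes to any split. A sketch using the same illustrative prices (the 70/30 split is an assumption you'd tune from your real traffic mix):

```python
# Daily cost of sending some share of traffic to a given model tier
def tier_daily_cost(queries, in_tokens, out_tokens, in_price_m, out_price_m):
    return (queries * in_tokens / 1e6) * in_price_m + \
           (queries * out_tokens / 1e6) * out_price_m

TOTAL_QUERIES_PER_DAY = 500_000
HAIKU_SHARE = 0.70  # assumed split; measure your actual traffic before relying on it

haiku_daily = tier_daily_cost(TOTAL_QUERIES_PER_DAY * HAIKU_SHARE,
                              800, 200, 0.25, 1.25)        # $157.50
sonnet_daily = tier_daily_cost(TOTAL_QUERIES_PER_DAY * (1 - HAIKU_SHARE),
                               800, 200, 3.00, 15.00)      # $810.00
blended_monthly = (haiku_daily + sonnet_daily) * 30

print(f"${blended_monthly:,.0f}/month")  # $29,025/month
```

Try different shares: the savings scale almost linearly with how much traffic the cheap tier can safely absorb, which is why measuring your real simple-vs-complex mix matters.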
There Are No Dumb Questions
Q: How does the system know which queries are "simple" vs. "complex"? A: A tiny classifier (usually running on Haiku itself, costing almost nothing) reads each query and routes it. Think of it as a receptionist deciding which agent picks up the call. Your engineering team can build this in one sprint.
Q: What if the cheap model gets a hard question wrong? A: Good routing systems include confidence checks. If Haiku isn't confident, it escalates to Sonnet automatically — just like a junior agent transferring a tricky call.
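To make the receptionist idea concrete, here's a deliberately toy router. Real systems classify with a small model rather than keyword rules, and the model names below are placeholders, not real model IDs:

```python
# Toy router: keyword heuristics stand in for the small classifier model
# a production system would use. Model names are illustrative placeholders.
CHEAP_MODEL = "fast-tier"
SMART_MODEL = "premium-tier"

COMPLEX_SIGNALS = ("dispute", "double-charged", "escalate", "legal", "refund")

def route(query: str) -> str:
    """Send queries showing complexity signals to the smarter, pricier model."""
    q = query.lower()
    # Long queries or queries with escalation keywords go to the senior agent
    if any(signal in q for signal in COMPLEX_SIGNALS) or len(q.split()) > 40:
        return SMART_MODEL
    return CHEAP_MODEL

print(route("How do I reset my password?"))                     # fast-tier
print(route("I was double-charged on a disputed transaction"))  # premium-tier
```

The point of the sketch: routing is cheap to decide and expensive to skip. The decision logic runs in microseconds; the cost difference it controls is 60x.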
The Secret Weapon: Prompt Caching
There's one more trick that can slash your bill — and it requires almost no engineering effort.
Think of it like a photocopier. Every time your chatbot answers a question, it re-reads the same set of instructions (the "system prompt"). That's like typing the same cover letter from scratch every time you apply for a job. Ridiculous, right?
Prompt caching stores a copy of those instructions so the AI doesn't re-read them every time. The cached tokens cost 90% less.
Here's a worked example, using the illustrative Sonnet prices:
| | Without caching | With caching |
|---|---|---|
| System prompt size | 2,000 tokens | 2,000 tokens |
| Queries per day | 500,000 | 500,000 |
| Tokens re-read daily | 1 billion | 1 billion |
| Cost of re-reading (Sonnet) | $3,000/day | $300/day |
| Monthly savings | — | $81,000/month |
Your engineering team can enable this in a single sprint. It's free money.
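The caching arithmetic, as a sketch (the 90% discount matches the figure quoted above; check your provider's actual cache pricing before budgeting on it):

```python
# Savings from caching a fixed system prompt instead of re-reading it per query
SYSTEM_PROMPT_TOKENS = 2_000
QUERIES_PER_DAY = 500_000
INPUT_PRICE_PER_M = 3.00   # illustrative Sonnet-tier input price
CACHE_DISCOUNT = 0.90      # cached tokens assumed to cost 90% less

tokens_reread_daily = SYSTEM_PROMPT_TOKENS * QUERIES_PER_DAY      # 1 billion
uncached_daily = tokens_reread_daily / 1e6 * INPUT_PRICE_PER_M    # $3,000
cached_daily = uncached_daily * (1 - CACHE_DISCOUNT)              # $300
monthly_savings = (uncached_daily - cached_daily) * 30

print(f"${monthly_savings:,.0f}/month saved")  # $81,000/month saved
```

Notice the savings scale with system prompt size times query volume, which is why caching matters most for high-traffic apps with long, fixed instructions.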
The Cost Diagram — How It All Fits Together
Three levers, one bill: input tokens, output tokens, and model tier. You control all of them.
The Board-Ready Playbook
Sarah revised her board slide, and engineering added routing logic in one sprint, cutting the projected cost from $81K to $29K/month. Here's what she learned — and what you should do before approving any AI budget:
Step 1: Get the real numbers — users, queries/day, tokens per query, model tier. Do the multiplication yourself. It takes ten minutes.
Step 2: Ask engineering about model routing. If they're using one model for everything, that's like paying senior-agent rates for password resets.
Step 3: Ask about prompt caching. If the system prompt is longer than 1,000 tokens and they haven't enabled caching, you're leaving money on the table.
Step 4: Stage the investment with gates. Don't release the full budget based on pilot results — tie funding to production milestones with kill criteria at each gate.
One more lens before Sarah's finale: everything in this chapter assumes you're buying AI capability through an API rather than building your own model. The comparison explains why almost every company should:
| Build your own model | Use frontier models via API |
|---|---|
| Full control over data and weights | Immediate access to frontier capability |
| $10M–$1B+ to train a frontier model | $0.00025–$0.06 per 1K tokens (verify current rates — prices have dropped significantly since 2023) |
| Requires a world-class ML team | Requires engineering, not ML research |
| Appropriate for: Google, Meta, OpenAI | Appropriate for: 99.9% of companies |
Back to Sarah
She walked back into the board meeting with new numbers.
"Engineering estimated $10,000 a month. I ran the actual model — at our user volume, with the responses that system generates, the real cost is $81,000. I'd like to propose we approve the project with three conditions: we route routine queries to the cheaper model tier, we add prompt caching for the system prompt, and we set a $20,000 monthly hard cap while we validate usage in the first 90 days."
The board approved it. The project shipped. First-month costs: $18,400.
The $81,000 surprise turned into a $63,000 saving — not because the engineering team was wrong about AI, but because someone ran the numbers before the commitment was made.
Key takeaways
- You can prevent 10–100x budget surprises by modeling input tokens, output tokens, and model tier before committing to any AI architecture.
- Routing simple queries to a cheaper model can cut AI spend by more than half (64% in Sarah's case) while preserving quality where it matters.
- You can cut input costs by up to 90% for repeated instructions by enabling prompt caching on any app with a large, fixed system prompt.
Knowledge Check
1. Your CFO asks you to justify a $2M AI investment. You have promising pilot results but no production numbers yet. What is the most credible way to structure the business case?
2. A vendor quotes $500K for an enterprise AI platform. Before approving, your finance team should model additional cost categories. Which set of categories is most complete?
3. What is the key difference in measuring ROI for an automation use case versus a decision-augmentation use case, and why does it matter?
4. Industry experience consistently shows that poor data quality is a leading cause of AI project failure after launch. How should this risk shape how you stage AI investments?