AI Product Design Patterns
Choose between automation, augmentation, and inspiration modes using a stakes-and-reversibility framework.
The most expensive PM mistake in AI: automating the wrong thing
A team shipped an AI feature that automatically sent refund confirmation emails on behalf of support agents. No human review. The AI drafted it, the AI sent it. Efficient, right?
Then the AI approved a $4,800 refund for a customer who was only owed $48. By the time anyone noticed, the money was gone.
The mistake wasn't the AI. The mistake was automating something that needed human oversight. And it's the most common PM error in AI product design — because automation is exciting, and "the AI handles it" sounds great in a demo.
(Illustrative scenario. Incidents of AI systems processing incorrect refund or payment amounts due to insufficient human oversight have been documented across multiple industries.)
The automation spectrum: three modes
Every AI feature lives somewhere on this spectrum:

- Automation: the AI acts on its own; humans may audit after the fact.
- Augmentation: the AI drafts; a human reviews and approves before anything happens.
- Inspiration: the AI contributes ideas or options; the human makes every decision and produces the final output.
None of these is better than the others. Each fits a different context. The PM's job is to pick the right mode for each feature — and the decision comes down to two questions.
Beyond the automation spectrum, most AI product features fall into one of four underlying patterns — and knowing the pattern tells you what the core UX challenge will be:
Four core AI product patterns
- Search & Retrieval (find): surface the right thing from a large corpus. Core challenge: relevance.
- Generation (create): produce new content. Core challenge: quality and voice.
- Classification (organise): sort inputs into categories. Core challenge: accuracy and edge cases.
- Recommendation (discover): suggest next actions. Core challenge: trust and explainability.
The two questions that decide everything
(In the previous module, you learned that hallucination is a structural property of LLMs — it can't be eliminated, only mitigated. That's exactly why the stakes-and-reversibility framework matters: some mistakes can be caught and fixed before anyone is harmed; others can't. The mode you choose determines how much risk you're building in.)
For every AI action in your spec, ask:
Question 1: What happens if the AI gets this wrong?
- "The ticket gets a wrong tag" → Low stakes (easily fixed)
- "The customer gets a wrong refund" → High stakes (money is gone)
Question 2: Can the mistake be undone?
- "An agent re-tags the ticket in two clicks" → Highly reversible
- "The refund email was already sent" → Not reversible
Now plot it:
| | Easily reversible | Hard to reverse |
|---|---|---|
| Low stakes | AUTOMATION — let the AI do it | AUGMENTATION — AI drafts, human approves |
| High stakes | AUGMENTATION — AI drafts, human approves | HUMAN REVIEW REQUIRED — AI suggests, human decides and acts |
The default: When in doubt, start with augmentation. It's the safe middle ground. Earn trust with data (high accuracy over time), then consider moving toward automation.
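The matrix is mechanical enough to sketch in code. A minimal illustration of the decision logic (the mode names come from the matrix above; the function itself is hypothetical):

```python
def choose_mode(stakes: str, reversible: bool) -> str:
    """Pick an interaction mode from the stakes-and-reversibility matrix.

    stakes: "low" or "high". reversible: can the mistake be undone cheaply?
    """
    if stakes == "low" and reversible:
        return "automation"        # let the AI act on its own
    if stakes == "high" and not reversible:
        return "human review"      # AI suggests; human decides and acts
    return "augmentation"          # AI drafts; human approves (the safe default)

# The four quadrants of the matrix:
print(choose_mode("low", True))    # automation
print(choose_mode("low", False))   # augmentation
print(choose_mode("high", True))   # augmentation
print(choose_mode("high", False))  # human review
```

Note that two of the four quadrants land on augmentation, which is exactly why it works as the default.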
There Are No Dumb Questions
"When should I use inspiration mode?"
When the task is open-ended or creative, and the human will rephrase whatever the AI generates anyway. "Give me 10 possible responses to this complaint" — the agent will pick one and rewrite it. The AI contributes ideas, not final output.
"Can a feature move from augmentation to automation?"
Absolutely — but only with data. If agents accept the AI draft without editing 95% of the time, AND the error rate is below your threshold, AND mistakes are cheap and reversible — then you have a case. Without those numbers, you're guessing.
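Those three conditions make a clean promotion checklist. A sketch of how they might be encoded (the 95% acceptance threshold is from the answer above; the error-rate threshold is a product decision, shown here with an illustrative value):

```python
def ready_for_automation(accept_rate: float,
                         error_rate: float,
                         reversible: bool,
                         error_threshold: float = 0.01) -> bool:
    """All three conditions must hold before promoting a feature
    from augmentation to full automation."""
    return (accept_rate >= 0.95            # agents accept drafts unedited
            and error_rate <= error_threshold
            and reversible)                # mistakes are cheap to undo

print(ready_for_automation(0.97, 0.005, True))   # True: case for automation
print(ready_for_automation(0.97, 0.005, False))  # False: irreversible action
```

The point of writing it down is the AND: any single failing condition blocks the promotion, no matter how good the other numbers look.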
Real example: Maya's three workflows
Maya, a support ops PM at an e-commerce company, handles 4,000 tickets/week. Agents spend 40% of their time on tagging, drafting, and routing. She wants to reclaim that time — but not by introducing new errors.
Workflow 1: Auto-categorise tickets → Automation
Stakes: Low — a miscategorised ticket gets re-routed within minutes, customer never notices. Reversible? Yes — an agent fixes the tag in two clicks. Mode: Full automation. The AI tags every ticket without human review.
Workflow 2: Refund confirmation emails → Augmentation
Stakes: High — this is a financial transaction with legal implications. Reversible? No — an incorrect email cannot be unsent. Mode: Augmentation. The AI drafts the email. The agent reviews it and clicks Send.
Result: Average handle time dropped from 8 minutes to 3 minutes per ticket. Refund error rate: zero. The agent isn't writing from scratch anymore — they're reviewing and editing. Huge time savings, no new risk.
The turning point: An engineer flagged that auto-sending refund emails would mean the AI could commit the company to a payout without any human review. That single conversation changed the spec — exactly the kind of decision that belongs in a requirements doc, not a post-launch retro.
Workflow 3: Complex complaint responses → Inspiration
Stakes: Medium — a bad response damages the customer relationship. Reversible? Sort of — you can send a follow-up, but first impressions matter. Mode: Inspiration. The AI generates a list of 10 possible responses. The agent picks one, rephrases it in their own words, and sends it.
Why inspiration, not augmentation? The agent hasn't seen this complaint type before and needs ideas, not a draft. The AI contributes options; the human makes every decision.
Confidence-aware UI: show uncertainty when it matters
One more pattern PMs need to know: when the AI is less confident, the UI should reflect that.
| Model confidence | UI treatment | Example |
|---|---|---|
| High (>90%) | Show as a confident default | "Your order ships in 2-3 days." |
| Medium (60-90%) | Show with hedging or alternatives | "Based on your order history, shipping is likely 2-3 days — but check the tracking page for updates." |
| Low (<60%) | Show as a suggestion, not a statement | "I'm not sure about this one. Here are some possible answers: ..." |
The trap: Showing every response with the same confidence level. A 95%-confident answer and a 50%-confident answer should NOT look the same to the user. If they do, users either over-trust the uncertain answers or under-trust the confident ones. This is the core of what's called automation surprise — when a system produces a confidently wrong answer and the user has no signal that it failed. Automation surprise is prevented by explicitly specifying graceful degradation paths in your spec and never allowing silent substitution of lower-quality outputs.
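The table maps directly to a presentation rule. A sketch (the confidence bands come from the table above; the treatment labels are illustrative):

```python
def ui_treatment(confidence: float) -> str:
    """Map model confidence to the UI treatment from the table above."""
    if confidence > 0.90:
        return "confident default"         # state the answer plainly
    if confidence >= 0.60:
        return "hedged with alternatives"  # "likely 2-3 days, but check tracking"
    return "suggestion only"               # "I'm not sure; possible answers: ..."

print(ui_treatment(0.95))  # confident default
print(ui_treatment(0.75))  # hedged with alternatives
print(ui_treatment(0.40))  # suggestion only
```

The exact thresholds matter less than the principle: the rule must exist in the spec, so a 50%-confident answer can never render identically to a 95%-confident one.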
One more framing worth a line in the spec: is AI a layer on top of an existing workflow, or is the product AI-native?

AI as a layer:
- Existing workflow + AI layer
- Human does core task, AI assists
- Easy to add, easy to remove
- Lower risk, lower upside

AI-native:
- AI does core task, human reviews
- Workflow redesigned around AI capability
- Requires rethinking UX from scratch
- Higher risk, higher upside
Key takeaways
- Two questions decide the mode: "What happens if the AI is wrong?" and "Can the mistake be undone?" Low stakes + reversible = automation. Everything else = augmentation or inspiration.
- Start with augmentation. It's the safe default. Move toward automation only when you have data showing high accuracy, low error rate, and high reversibility.
- Financial, legal, or irreversible actions always need a human in the loop. No exceptions, no matter how good the model is.
- Match UI confidence to model confidence. Don't show uncertain answers the same way you show confident ones.
Knowledge Check
1. Your AI feature returns a recommendation with 60% model confidence. How should you design the UI differently compared to a 90% confidence output?
2. What is the "automation surprise" failure pattern, and what is the most effective way to prevent it in a product spec?
3. When should an AI product proactively show its reasoning to users?
4. A user corrects an AI output in your product. Which two design approaches best turn that correction into a product improvement?