AI Product Design Patterns
Choose between automation, augmentation, and inspiration modes using a stakes-and-reversibility framework.
The most expensive PM mistake in AI: automating the wrong thing
A team shipped an AI feature that automatically sent refund confirmation emails on behalf of support agents. No human review. The AI drafted it, the AI sent it. Efficient, right?
Then the AI approved a $4,800 refund for a customer who was only owed $48. By the time anyone noticed, the money was gone.
The mistake wasn't the AI. The mistake was automating something that needed human oversight. And it's the most common PM error in AI product design — because automation is exciting, and "the AI handles it" sounds great in a demo.
(Illustrative scenario. Incidents of AI systems processing incorrect refund or payment amounts due to insufficient human oversight have been documented across multiple industries.)
The automation spectrum: three modes
Every AI feature lives somewhere on this spectrum:

- Automation: the AI acts on its own; humans may audit after the fact.
- Augmentation: the AI drafts; a human reviews and approves before anything happens.
- Inspiration: the AI contributes ideas or options; the human makes every decision and produces the final output.
None of these is better than the others. Each fits a different context. The PM's job is to pick the right mode for each feature — and the decision comes down to two questions.
Beyond the automation spectrum, most AI product features fall into one of four underlying patterns — and knowing the pattern tells you what the core UX challenge will be:
Four core AI product patterns
- Search & Retrieval (find): surface the right thing from a large corpus. Core challenge: relevance.
- Generation (create): produce new content. Core challenge: quality and voice.
- Classification (organise): sort inputs into categories. Core challenge: accuracy and edge cases.
- Recommendation (discover): suggest next actions. Core challenge: trust and explainability.
The two questions that decide everything
(In the previous module, you learned that hallucination is a structural property of LLMs — it can't be eliminated, only mitigated. That's exactly why the stakes-and-reversibility framework matters: some mistakes can be caught and fixed before anyone is harmed; others can't. The mode you choose determines how much risk you're building in.)
For every AI action in your spec, ask:
Question 1: What happens if the AI gets this wrong?
- "The ticket gets a wrong tag" → Low stakes (easily fixed)
- "The customer gets a wrong refund" → High stakes (money is gone)
Question 2: Can the mistake be undone?
- "An agent re-tags the ticket in two clicks" → Highly reversible
- "The refund email was already sent" → Not reversible
Now plot it:
| | Easily reversible | Hard to reverse |
|---|---|---|
| Low stakes | AUTOMATION — let the AI do it | AUGMENTATION — AI drafts, human approves |
| High stakes | AUGMENTATION — AI drafts, human approves | HUMAN REVIEW REQUIRED — AI suggests, human decides and acts |
The default: When in doubt, start with augmentation. It's the safe middle ground. Earn trust with data (high accuracy over time), then consider moving toward automation.
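The matrix is mechanical enough to sketch in code. A minimal illustration of the decision logic (the mode names come from the matrix above; the function itself is hypothetical):

```python
def choose_mode(stakes: str, reversible: bool) -> str:
    """Pick an interaction mode from the stakes-and-reversibility matrix.

    stakes: "low" or "high". reversible: can the mistake be undone cheaply?
    """
    if stakes == "low" and reversible:
        return "automation"        # let the AI act on its own
    if stakes == "high" and not reversible:
        return "human review"      # AI suggests; human decides and acts
    return "augmentation"          # AI drafts; human approves (the safe default)

# The four quadrants of the matrix:
print(choose_mode("low", True))    # automation
print(choose_mode("low", False))   # augmentation
print(choose_mode("high", True))   # augmentation
print(choose_mode("high", False))  # human review
```

Note that two of the four quadrants land on augmentation, which is exactly why it works as the default.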
There Are No Dumb Questions
"When should I use inspiration mode?"
When the task is open-ended or creative, and the human will rephrase whatever the AI generates anyway. "Give me 10 possible responses to this complaint" — the agent will pick one and rewrite it. The AI contributes ideas, not final output.
"Can a feature move from augmentation to automation?"
Absolutely — but only with data. If agents accept the AI draft without editing 95% of the time, AND the error rate is below your threshold, AND mistakes are cheap and reversible — then you have a case. Without those numbers, you're guessing.
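Those three conditions make a clean promotion checklist. A sketch of how they might be encoded (the 95% acceptance threshold is from the answer above; the error-rate threshold is a product decision, shown here with an illustrative value):

```python
def ready_for_automation(accept_rate: float,
                         error_rate: float,
                         reversible: bool,
                         error_threshold: float = 0.01) -> bool:
    """All three conditions must hold before promoting a feature
    from augmentation to full automation."""
    return (accept_rate >= 0.95            # agents accept drafts unedited
            and error_rate <= error_threshold
            and reversible)                # mistakes are cheap to undo

print(ready_for_automation(0.97, 0.005, True))   # True: case for automation
print(ready_for_automation(0.97, 0.005, False))  # False: irreversible action
```

The point of writing it down is the AND: any single failing condition blocks the promotion, no matter how good the other numbers look.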
Real example: Maya's three workflows
Maya, a support ops PM at an e-commerce company, handles 4,000 tickets/week. Agents spend 40% of their time on tagging, drafting, and routing. She wants to reclaim that time — but not by introducing new errors.
Workflow 1: Auto-categorise tickets → Automation
Stakes: Low — a miscategorised ticket gets re-routed within minutes, customer never notices. Reversible? Yes — an agent fixes the tag in two clicks. Mode: Full automation. The AI tags every ticket without human review.
Workflow 2: Refund confirmation emails → Augmentation
Stakes: High — this is a financial transaction with legal implications. Reversible? No — an incorrect email cannot be unsent. Mode: Augmentation. The AI drafts the email. The agent reviews it and clicks Send.
Result: Average handle time dropped from 8 minutes to 3 minutes per ticket. Refund error rate: zero. The agent isn't writing from scratch anymore — they're reviewing and editing. Huge time savings, no new risk.
The turning point: An engineer flagged that auto-sending refund emails would mean the AI could commit the company to a payout without any human review. That single conversation changed the spec — exactly the kind of decision that belongs in a requirements doc, not a post-launch retro.
Workflow 3: Complex complaint responses → Inspiration
Stakes: Medium — a bad response damages the customer relationship. Reversible? Sort of — you can send a follow-up, but first impressions matter. Mode: Inspiration. The AI generates a list of 10 possible responses. The agent picks one, rephrases it in their own words, and sends it.
Why inspiration, not augmentation? The agent hasn't seen this complaint type before and needs ideas, not a draft. The AI contributes options; the human makes every decision.
Confidence-aware UI: show uncertainty when it matters
One more pattern PMs need to know: when the AI is less confident, the UI should reflect that.
| Model confidence | UI treatment | Example |
|---|---|---|
| High (>90%) | Show as a confident default | "Your order ships in 2-3 days." |
| Medium (60-90%) | Show with hedging or alternatives | "Based on your order history, shipping is likely 2-3 days — but check the tracking page for updates." |
| Low (<60%) | Show as a suggestion, not a statement | "I'm not sure about this one. Here are some possible answers: ..." |
The trap: Showing every response with the same confidence level. A 95%-confident answer and a 50%-confident answer should NOT look the same to the user. If they do, users either over-trust the uncertain answers or under-trust the confident ones. This is the core of what's called automation surprise — when a system produces a confidently wrong answer and the user has no signal that it failed. Automation surprise is prevented by explicitly specifying graceful degradation paths in your spec and never allowing silent substitution of lower-quality outputs.
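The table maps directly to a presentation rule. A sketch (the confidence bands come from the table above; the treatment labels are illustrative):

```python
def ui_treatment(confidence: float) -> str:
    """Map model confidence to the UI treatment from the table above."""
    if confidence > 0.90:
        return "confident default"         # state the answer plainly
    if confidence >= 0.60:
        return "hedged with alternatives"  # "likely 2-3 days, but check tracking"
    return "suggestion only"               # "I'm not sure; possible answers: ..."

print(ui_treatment(0.95))  # confident default
print(ui_treatment(0.75))  # hedged with alternatives
print(ui_treatment(0.40))  # suggestion only
```

The exact thresholds matter less than the principle: the rule must exist in the spec, so a 50%-confident answer can never render identically to a 95%-confident one.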
One more framing worth a line in the spec: is AI a layer on top of an existing workflow, or is the product AI-native?

AI as a layer:
- Existing workflow + AI layer
- Human does core task, AI assists
- Easy to add, easy to remove
- Lower risk, lower upside

AI-native:
- AI does core task, human reviews
- Workflow redesigned around AI capability
- Requires rethinking UX from scratch
- Higher risk, higher upside
Key takeaways
- Two questions decide the mode: "What happens if the AI is wrong?" and "Can the mistake be undone?" Low stakes + reversible = automation. Everything else = augmentation or inspiration.
- Start with augmentation. It's the safe default. Move toward automation only when you have data showing high accuracy, low error rate, and high reversibility.
- Financial, legal, or irreversible actions always need a human in the loop. No exceptions, no matter how good the model is.
- Match UI confidence to model confidence. Don't show uncertain answers the same way you show confident ones.
Knowledge Check
1. Your AI feature returns a recommendation with 60% model confidence. How should you design the UI differently compared to a 90% confidence output?
2. What is the "automation surprise" failure pattern, and what is the most effective way to prevent it in a product spec?
3. When should an AI product proactively show its reasoning to users?
4. A user corrects an AI output in your product. Which two design approaches best turn that correction into a product improvement?