The real ROI
[Figure: ROI at year 1, ROI at year 3, and share with positive ROI; values not present in the text]
The right metrics
Standard metrics: FCR (First Contact Resolution), AHT (Average Handle Time), CSAT (Customer Satisfaction), NPS (Net Promoter Score). AI-specific metrics: deflection rate (share of cases resolved without a human), escalation rate (share of cases handed to a human), containment time (how long the AI holds a case before escalating).
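As a rough illustration, the three AI-specific metrics can be computed from a log of closed cases. This is a minimal sketch, not a standard definition: the `Case` record and its field names are assumptions made up for the example, and a real helpdesk export will name things differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    """Hypothetical record for one closed support case."""
    resolved_by_ai: bool   # closed without any human involvement
    escalated: bool        # handed off to a human agent
    seconds_before_escalation: Optional[float] = None  # set only when escalated


def ai_metrics(cases):
    """Return deflection rate, escalation rate, and average containment time."""
    total = len(cases)
    if total == 0:
        raise ValueError("need at least one case")

    deflected = sum(1 for c in cases if c.resolved_by_ai and not c.escalated)
    escalated = [c for c in cases if c.escalated]
    times = [
        c.seconds_before_escalation
        for c in escalated
        if c.seconds_before_escalation is not None
    ]

    return {
        "deflection_rate": deflected / total,
        "escalation_rate": len(escalated) / total,
        # None when no escalated case carries a timing value
        "avg_containment_seconds": sum(times) / len(times) if times else None,
    }


sample = [
    Case(resolved_by_ai=True, escalated=False),
    Case(resolved_by_ai=True, escalated=False),
    Case(resolved_by_ai=False, escalated=True, seconds_before_escalation=60),
    Case(resolved_by_ai=False, escalated=True, seconds_before_escalation=120),
]
metrics = ai_metrics(sample)
```

Running the same function on a pre-deployment period and a post-deployment period gives the "before" and "after" numbers needed to show improvement.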
Auto vs human levels
Level 1 (FAQ): hours, prices, simple processes. AI auto-resolves 80-90%. Level 2 (specific): account status, transactions, modifications. AI auto-resolves 60-77%. Level 3 (complex): disputes, special cases. Human required.
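The three levels can be sketched as a routing rule: classify the request, let the AI attempt Levels 1 and 2, and send Level 3 straight to a human. The keyword lists below are placeholders invented for the example, and substring matching is deliberately crude; a production system would more likely use an intent classifier.

```python
from enum import Enum


class Tier(Enum):
    FAQ = 1        # hours, prices, simple processes
    SPECIFIC = 2   # account status, transactions, modifications
    COMPLEX = 3    # disputes, special cases


# Hypothetical keyword lists, checked from most to least complex.
TIER_KEYWORDS = {
    Tier.COMPLEX: ("dispute", "chargeback", "complaint", "exception"),
    Tier.SPECIFIC: ("account", "transaction", "modify", "change"),
}


def classify(message: str) -> Tier:
    """Return the highest tier whose keywords appear in the message."""
    text = message.lower()
    for tier in (Tier.COMPLEX, Tier.SPECIFIC):
        if any(keyword in text for keyword in TIER_KEYWORDS[tier]):
            return tier
    return Tier.FAQ


def route(message: str) -> str:
    """Level 3 goes to a human; Levels 1 and 2 start with the AI."""
    return "human" if classify(message) is Tier.COMPLEX else "ai"
```

Checking the complex tier first means a message such as "I want to dispute a transaction" is routed to a human even though it also matches a Level 2 keyword.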
Common mistakes
(1) Pretending it's human: users detect AI within 3 messages, so it is better to disclose it up front. (2) Endless loops: if the AI can't resolve the case, escalate fast. (3) Lacking context: a human who inherits a case from the AI without a summary loses time re-asking. (4) No baseline: you can't demonstrate improvement without a "before" measurement.
Real cases
Klarna: AI assistant reported to do the work of 700 full-time agents, customer satisfaction on par with human agents, an estimated $40M annual profit improvement. Bank of America's Erica: 50M users, handles balance queries and simple transfers. SMB clients of VuraOS: 60-80% L1 deflection, $2K-15K/month savings depending on size.
Escalation design
The escalation path is the most critical part of the UX. Best practices: auto-detect emotional intensity (curse words, capitalization, repetition), transfer the full context to the human, make the handoff explicit ("I'm connecting you with John from the team"), and route to a specialist (the right human for the case).
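The three intensity signals and the handoff can be sketched with simple heuristics. Everything here is an assumption for illustration: the word list, the 60% uppercase cutoff, the minimum of 8 letters, and the threshold of two signals are arbitrary starting points to be tuned against real transcripts.

```python
import re

# Hypothetical placeholder list; a real deployment would use a maintained lexicon.
PROFANITY = {"damn", "hell", "crap"}


def intensity_score(messages):
    """Count how many of the three signals fire on the latest customer message."""
    if not messages:
        return 0
    latest = messages[-1]
    score = 0

    # Signal 1: curse words
    words = re.findall(r"[A-Za-z']+", latest)
    if any(word.lower() in PROFANITY for word in words):
        score += 1

    # Signal 2: capitalization (mostly uppercase, ignoring very short messages)
    letters = [ch for ch in latest if ch.isalpha()]
    if len(letters) >= 8 and sum(ch.isupper() for ch in letters) / len(letters) > 0.6:
        score += 1

    # Signal 3: repetition (same text already sent earlier, ignoring case)
    normalized = latest.strip().lower()
    if any(m.strip().lower() == normalized for m in messages[:-1]):
        score += 1

    return score


def should_escalate(messages, threshold=2):
    """Escalate once at least `threshold` signals fire."""
    return intensity_score(messages) >= threshold


def handoff(agent_name, messages):
    """Bundle the customer-facing notice with the full context for the human."""
    return {
        "customer_notice": f"I'm connecting you with {agent_name} from the team.",
        "transcript": list(messages),
        "intensity": intensity_score(messages),
    }


history = ["Where is my refund?", "WHERE IS MY REFUND?"]
packet = handoff("John", history) if should_escalate(history) else None
```

Passing the whole transcript along with the score means the human agent does not have to ask the customer to repeat the problem.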
Conclusion
Customer service AI works, but not by magic. Companies that win measure a baseline before deploying, design escalation seriously, train the team to work alongside the AI, and iterate based on metrics. Those that "deploy and pray" are in the 10% that don't see ROI.