Designing AI agents that escalate well
The hardest part of an autonomous agent is teaching it when not to be autonomous.
Most AI agent failures we see in production aren't the agent doing the wrong thing — they're the agent doing something when it should have stopped to ask.
Good escalation design starts with confidence scoring. Every agent decision should produce a confidence number, and you should know empirically what threshold separates "just do it" from "check with a human".
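As a minimal sketch, the gate can be a single comparison. Everything here is illustrative: the `Decision` type, the `route` function, and the 0.85 threshold are assumptions, not a real API, and the threshold should come from your own calibration data.

```python
from dataclasses import dataclass

# Assumed value for illustration; calibrate against historical decisions.
CONFIDENCE_THRESHOLD = 0.85


@dataclass
class Decision:
    action: str        # what the agent wants to do
    confidence: float  # agent's self-reported confidence, 0.0-1.0


def route(decision: Decision) -> str:
    """Act autonomously above the threshold, escalate below it."""
    if decision.confidence >= CONFIDENCE_THRESHOLD:
        return "execute"
    return "escalate"
```

The point of making this a single, named threshold is that it becomes the one tunable knob you revisit as the calibration data changes.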
We typically calibrate this against historical human decisions. Run the agent in shadow mode over last quarter's tickets, measure where it agreed and disagreed with the humans who actually handled them, and pick the lowest threshold that keeps agreement above your safety target while maximizing how much the agent handles on its own.
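That calibration step can be sketched as a simple search over candidate thresholds. This is a hypothetical implementation under stated assumptions: each shadow-mode record is a `(confidence, agreed_with_human)` pair, and `min_agreement` is a safety target you choose (0.95 here is an example, not a recommendation).

```python
def calibrate_threshold(records, min_agreement=0.95):
    """Pick a confidence threshold from shadow-mode data.

    records: list of (confidence, agreed) pairs, where `agreed` is
    whether the agent's shadow decision matched the human's.
    Returns the lowest threshold whose auto-handled slice agrees with
    humans at least `min_agreement` of the time (lowest = most autonomy).
    """
    candidates = sorted({conf for conf, _ in records})
    for threshold in candidates:
        subset = [agreed for conf, agreed in records if conf >= threshold]
        if subset and sum(subset) / len(subset) >= min_agreement:
            return threshold
    return 1.01  # no safe threshold exists: escalate everything
```

Returning a sentinel above 1.0 when no threshold is safe makes the failure mode explicit: the agent escalates everything rather than guessing.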
Beyond confidence, design escalation paths that respect human time. Bad escalations are vague ("this needs review"); good escalations include the agent's reasoning, the alternatives it considered, and a recommended action with one click to accept.
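A structured escalation like that can be captured in a small payload type. The shape below is an assumption for illustration; the field names and the plain-text renderer are hypothetical, and in practice the "one click to accept" would map `recommended` onto a button in your review UI.

```python
from dataclasses import dataclass


@dataclass
class Escalation:
    summary: str             # one line: what the agent is asking
    reasoning: str           # why the agent is unsure
    alternatives: list[str]  # options it considered and rejected
    recommended: str         # the action a reviewer can accept in one click
    confidence: float        # the score that triggered the escalation


def render(e: Escalation) -> str:
    """Compact, reviewable text form of an escalation."""
    alts = ", ".join(e.alternatives)
    return (
        f"{e.summary}\n"
        f"Reasoning: {e.reasoning}\n"
        f"Alternatives considered: {alts}\n"
        f"Recommended (confidence {e.confidence:.2f}): {e.recommended}"
    )
```

Making the payload a typed object rather than free text keeps escalations uniform, so reviewers learn to scan them quickly.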