How do you make a non-human actor’s behavior legible at a glance?
Just dove into this whole AI agent space (Google’s A2A following Anthropic’s MCP), and I’m honestly mind-blown at what’s possible with such simple code. Wanted to capture my thoughts while they’re fresh…
I define an agent as:
agent = AI + tools + autonomy to reach goals & decide when to stop
The coolest part is watching it solve problems on its own:
USER: "can you check if anyone's in the bedroom?"
SMART HOME AGENT: *thinking*
1. Need to see bedroom
2. It's dark in there
3. Should turn on lights first
↓
SMART HOME AGENT: *turns on bedroom lights*
*activates camera*
"The bedroom is empty."
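That whole flow is a surprisingly small loop. A minimal sketch in Python (the model call is a scripted stub, and none of these names come from a real framework):

```python
# Minimal agent = AI + tools + autonomy to decide when to stop.
# `call_model` stands in for an LLM call; here it scripts the bedroom example.

def call_model(goal, history):
    # A real implementation would send goal + history to a model and get back
    # either a tool request or a final answer. This stub scripts two steps.
    if len(history) == 0:
        return {"tool": "turn_on_lights", "args": {"room": "bedroom"}}
    if len(history) == 1:
        return {"tool": "check_camera", "args": {"room": "bedroom"}}
    return {"answer": "The bedroom is empty."}

TOOLS = {
    "turn_on_lights": lambda room: f"lights on in {room}",
    "check_camera": lambda room: f"{room}: no people detected",
}

def run_agent(goal, max_steps=10):
    history = []
    for _ in range(max_steps):                  # autonomy, but bounded
        step = call_model(goal, history)
        if "answer" in step:                    # the agent decides to stop
            return step["answer"], history
        result = TOOLS[step["tool"]](**step["args"])
        history.append((step["tool"], result))  # feed tool results back in
    return "gave up", history
```

The emergent part lives entirely inside `call_model`: the loop never mentions lights or cameras, the model chains them on its own.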
What blows my mind is how SIMPLE the code flow is to get this kind of emergent behavior. That’s the behavior I’m obsessed with, but it also brings a new set of UX problems:
SIMPLE CODE ──> COMPLEX BEHAVIOR
│ │
│ ▼
│ ┌──────────┐
│ │UNEXPECTED│
│ │SOLUTIONS │
│ └──────────┘
│ │
▼ ▼
┌──────────┐ ┌──────────┐
│ HOW TO │ │ HOW TO │
│UNDERSTAND│ │ CONTROL? │
└──────────┘ └──────────┘
Now, say an AI agent is embedded in an electric stove and wants to troubleshoot it… that raises huge UX questions:
USER INTENT
│
▼
┌─────────┐
│ CONFIRM │◀───┐
│ REPAIR │ │
└────┬────┘ │
│ │
▼ │
┌─────────┐ │
│ VISIBLE │ │
│ ACTIONS │────┘
└────┬────┘
│
▼
┌─────────┐
│ STOVE │
│ AGENT │
└─────────┘
like…
- How does the user actually SEE what the agent is planning?
- How do you let users veto or modify plans?
- What’s the interaction model for “wait, don’t do that”?
- What happens when the agent makes a bad decision with a physical device?
Since OpenClaw
OpenClaw and similar stacks normalized persistent memory, multi-agent routing, and autonomous actions across messaging, files, and devices. Same pattern across domains… Agentic commerce is one row in the table:
INTENT → PLAN (editable) → CONSENT / SCOPE → LOG / RECEIPT

"fix the bug"    diff + steps     branch + secrets    merge + audit
"reply to X"     draft            send-as + tone      sent + thread
"book + pay"     basket + fees    limits + tokens     order + trail
Same obligations everywhere… plan visibility, authority boundaries, audit and replay, plain-language failure, visible handoffs, off-by-default for irreversible or physical actions.
Agent-Native UX Missed Opportunities
Agents traverse stacks… MCP for tools, A2A for coordination, plus domain layers: git, calendar, inbox, identity, and newer agent-commerce rails (ACP, AP2, x402, MPP, to name a few). The failure mode is the same whether it’s a card charge or a pull-request deploy… outcome in, invisible chain, opaque error out.
[ THE ABSTRACTION GAP ]
"buy groceries under $75"
│
┌──▼─────────────────────────────────────────────────────────┐
│ [ THE AGENT PROTOCOL STACK ] │
│ │
│ 1. MCP ──> discovers tools (GrocerMart API) │
│ 2. A2A ──> coordinates sub-agents (Pricing, Shopping) │
│ 3. ACP ──> standardizes checkout (Agent Commerce Protocol)│
│ 4. AP2 ──> proves trust (Agent Payments Protocol)         │
└────────────────────────────────────────────────────────────┘
│
▼
✓ done. receipt in email.
That creates new UX obligations:
- Plan visibility… a live “what I’m about to budget” outline you can edit before execution
- Consent and authority boundaries… valid substitutions, scoped tokens, sandboxed/timeboxed access, and a real kill‑switch
- Audit + Replay… readable transcripts and action logs with diff/pr views for cart changes
- Failure surfaces… partial fulfillment, alternatives, and “why I can’t” reasons in plain language
- Cross‑agent handoffs… visible handover to payment agent, accountability, and a way to pull work back
- Off‑by‑default for physical actions; explicit confirmation for irreversible steps
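“Authority boundaries” can be made concrete as a scoped, timeboxed token checked on every action. A minimal sketch; the class and method names are invented for illustration, not drawn from ACP/AP2 or any real stack:

```python
import time

class KillSwitch:
    """A global stop the user can flip at any moment."""
    def __init__(self):
        self.tripped = False
    def trip(self):
        self.tripped = True

class ScopedToken:
    """Authority boundary: spend cap, allowed actions, expiry, kill switch."""
    def __init__(self, max_spend, allowed_actions, ttl_seconds, kill_switch):
        self.max_spend = max_spend
        self.allowed = set(allowed_actions)
        self.expires_at = time.time() + ttl_seconds   # timeboxed access
        self.kill = kill_switch
        self.spent = 0.0

    def authorize(self, action, amount=0.0):
        # Every check returns a plain-language reason, not just a boolean.
        if self.kill.tripped:
            return False, "kill switch tripped"
        if time.time() > self.expires_at:
            return False, "token expired"
        if action not in self.allowed:
            return False, f"action '{action}' out of scope"
        if self.spent + amount > self.max_spend:
            return False, f"would exceed ${self.max_spend:.2f} cap"
        self.spent += amount
        return True, "ok"
```

The reason strings matter as much as the booleans: they are the raw material for the plain-language failure surfaces above.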
Terminal-native tools (Claude Code, Codex, OpenClaw-style setups) expose the obvious: agents are log-producing processes. The user sees none of it until something breaks. And when it breaks, the error says “payment failed.” From which layer? Which hop? Which agent?
Nobody is designing the failure surface for multi-protocol chains. We’re building the happy path and hoping the error messages sort themselves out.
$ agent "book cheapest flight NYC→SF Apr 1"
tool: flights_search(...)
tool: flights_filter(...)
... (more tools)
tool: book_flight({...})
✓ booked. confirmation: AA-7X2K9
After nine tool calls, the booking happened. No gate. No confirmation. The action was irreversible, and the interface gave you a receipt, not a choice.
What we actually need is an interface that treats the terminal not as a raw log, but as a parsed surface:
╭─────────────────────────────────────────────────────────────╮
│ ● TASK // book cheapest flight NYC→SF Apr 1 │
├─────────────────────────────────────────────────────────────┤
│ │
│ [✓] search_flights(JFK→SFO, 2026-04-01) │
│ ↳ 3 options found: $312 · $445 · $520 │
│ │
│ [✓] select_flight(AA1234) │
│ ↳ $312 · 7h 20m · nonstop │
│ │
│ [!] EXECUTION HALTED // IRREVERSIBLE ACTION │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ GATE: book_flight() │ │
│ │ ─────────────────────────────────────────────────── │ │
│ │ amount: $312.00 card: ****1234 │ │
│ │ policy: non-refundable agent: claude-v1 │ │
│ └───────────────────────────────────────────────────────┘ │
│ │
│ ❯❯ Action required: [Y] Confirm [N] Cancel [D] Diff │
╰─────────────────────────────────────────────────────────────╯
Most terminal-native agent platforms are still designing around the assumption that users want to see tool calls. They don’t. They want to see decisions. What gets surfaced isn’t every tool call: it’s decisions, gates before irreversible steps, and anomalies against the stated goal.
[ LAYER 01 : AMBIENT ] ── always visible, never interrupts
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
● shopping-agent [active] 4m 12s
↳ goal: groceries ≤ $75 ↳ current: $64.99
[ LAYER 02 : SURFACED ] ── state changes & autonomous decisions
▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
[decide] swapped rice brand A → B (saved $1.50)
[status] 12 items · 1 substitution · on track
[ LAYER 03 : INTERRUPT ] ── the irreversible gate
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
⚠ CHECKOUT REQUIRED
GrocerMart · $64.99 total
[ APPROVE ] [ EDIT CART ] [ CANCEL ]
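The three layers reduce to a small routing function over event metadata. A sketch in Python; the event taxonomy here is my own invention, not any protocol’s:

```python
AMBIENT, SURFACED, INTERRUPT = "ambient", "surfaced", "interrupt"

def route(event):
    # Gates always interrupt: irreversible actions never run silently.
    if event.get("irreversible"):
        return INTERRUPT
    # Autonomous decisions, state changes, and anomalies get surfaced.
    if event["kind"] in ("decide", "status_change", "anomaly"):
        return SURFACED
    # Everything else (searches, reads, reversible tool calls) stays ambient.
    return AMBIENT

events = [
    {"kind": "tool_call", "name": "search"},
    {"kind": "decide", "name": "substitute_rice"},
    {"kind": "tool_call", "name": "checkout", "irreversible": True},
]
layers = [route(e) for e in events]
```

Note the asymmetry: the layer is a property of the event, not a user preference. The user tunes how loudly each layer renders, not what lands where.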
RAW LOG (what happened) GOAL VIEW (what matters)
─────────────────────────── ───────────────────────────
14:23:01 search("milk") ▸ buy groceries
14:23:02 search("eggs") $64.99 · 12 items
14:23:03 cart_add("milk") 1 substitution
14:23:04 cart_add("eggs") next: checkout
14:23:05 price_check()
14:23:06 compare_brands() ▸ [expand 48 events ↓]
14:23:07 substitute("rice")
...48 more events
Group by goal, not timestamp… the raw log is for debugging. Surface only decisions (the agent chose between options), gates (before irreversible actions), and anomalies (behavior deviating from the stated goal).
When to surface: reversibility is the primary signal… not urgency, not frequency.
REVERSIBLE ◀────────────────────────────────────────▶ IRREVERSIBLE
search plan add-to-cart pay send ship
✓ ✓ ✓ ⚠ ⚠ ✗
silent silent silent CONFIRM CONFIRM done
Everything right of center triggers a gate. Anything left of it runs silently. Simple rule. Most interfaces don’t implement it.
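The rule is implementable in a few lines. A sketch; the action sets are illustrative, and a real system would tag tools with reversibility at registration time:

```python
# Reversibility as the gating signal, mirroring the spectrum above.
REVERSIBLE = {"search", "plan", "add_to_cart"}
IRREVERSIBLE = {"pay", "send", "ship"}

def needs_gate(action):
    """Gate iff the action cannot be undone; everything else runs silently."""
    return action in IRREVERSIBLE

def execute(action, do, confirm):
    # `confirm` is the human (or policy) in the loop; only consulted at gates.
    if needs_gate(action) and not confirm():
        return "cancelled"
    return do()
```

Usage: `execute("pay", do=charge_card, confirm=ask_user)` blocks on the human, while `execute("search", …)` never does.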
AUDIT · session grocery-2026-03-25
────────────────────────────────────────────────────
14:23:01 [search] "weekly groceries near me"
14:23:04 [plan] cart drafted · $66.50 · 13 items
14:23:08 [decide] ↺ rice brand A→B · -$1.50 ← why?
14:23:08 [decide] ✗ soda · out of budget scope ← why?
14:23:10 [gate] PAYMENT · awaiting human
14:23:42 [human] ✓ approved · $64.99
14:23:43 [action] order #GRM-2847 · placed
14:31:00 [done] delivered · receipt logged
────────────────────────────────────────────────────
[step-through] [export] [share with agent]
The “← why?” links are the key. Each decision should carry its rationale, inspectable on demand… not in the ambient view, but always available. This is what turns a log into a real audit trail.
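A rationale-carrying log entry could look something like this (field names are made up for illustration, not from any real agent framework):

```python
from dataclasses import dataclass

@dataclass
class AuditEntry:
    """One log line that can expand into a full rationale on demand."""
    ts: str
    kind: str            # search / plan / decide / gate / action / done
    summary: str         # what the ambient view shows
    rationale: str = ""  # the "← why?" payload, hidden until asked

def why(entries, index):
    # The inspect-on-demand path: the ambient view never renders rationale,
    # but it is always one lookup away.
    return entries[index].rationale or "(no rationale recorded)"

log = [
    AuditEntry("14:23:08", "decide", "rice brand A → B, -$1.50",
               rationale="brand B is the same size, $1.50 cheaper, in stock"),
    AuditEntry("14:23:08", "decide", "dropped soda",
               rationale="out of budget scope: cart would exceed $75"),
]
```

The design choice is that `rationale` is captured at decision time, not reconstructed later: a post-hoc explanation is a story, a recorded one is evidence.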
Now to the accountability problem in multi-agent chains:
▼ TRACE: ORDER #GRM-2847
│
├─[ EXECUTION ]── GrocerMart checkout API
│ status: 500 (Insufficient Funds)
│
├─[ DELEGATION ]─ Payment-Agent (ACP+AP2)
│ token: #8b2c [scoped: $65.00]
│ action: applied payment policy
│
├─[ DELEGATION ]─ Shopping-Agent (A2A)
│ token: #3f9a [scoped: $75.00 max]
│ action: built cart, requested checkout
│
└─[ ROOT AUTH ]── user@gktk.in
device: verified terminal
time: 14:21:00 UTC
> DIAGNOSTIC: Shopping-Agent allowed $75, but Payment-Agent
token #8b2c was hard-capped at $65 during earlier session.
Conflict found at hop 2.
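The diagnostic above becomes mechanical once the delegation chain is data. A sketch, assuming each hop carries a spend cap; the names and caps mirror the trace, but nothing here is a real protocol API:

```python
def diagnose(chain, attempted):
    """chain: list of (agent, cap) hops from root authority to execution.
    The effective spend limit is the tightest cap anywhere in the chain;
    report the hop that imposed it when the attempt exceeds it."""
    agent, cap = min(chain, key=lambda hop: hop[1])
    if attempted > cap:
        hop = chain.index((agent, cap))
        return f"blocked at hop {hop}: {agent} token capped at ${cap:.2f}"
    return "within every scope"

chain = [
    ("user@gktk.in", 75.00),     # root authorization
    ("Shopping-Agent", 75.00),   # inherits the full scope
    ("Payment-Agent", 65.00),    # tighter cap granted in an earlier session
]
conflict = diagnose(chain, 70.00)
```

The point isn’t this particular check; it’s that scope conflicts across hops are only findable if every delegation is recorded as structured data rather than buried in free-text logs.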
Right now, most handoffs are invisible. The agent acts. The human sees the outcome. The chain of delegation disappears.
This is the design work nobody is shipping… making the chain legible at a glance and fully reconstructable when something breaks, whether the inspector is a human or another agent.
When things break down… UX for Negative AI Experiences
I keep thinking about how terrible we are at handling the negative spaces in AI interfaces. Like, we’ve all seen those “I’m sorry, I can’t do that” messages that explain nothing and solve nothing.
USER AGENT
│ │
│ REQUEST │
│──────────────────────▶ │
│ │
│ │
│ ┌────────────────┐ │
│◀─┤sorry, i can't │ │
│ │do that because │ │
│ │[generic reason]│ │
│ └────────────────┘ │
│ │
│ FRUSTRATION │
│──────────────────────▶ │
│ │
The deeper UX questions nobody’s solving:
- How do we show users what happened when the context window is exceeded? It’s such an abstract concept, but it breaks their experience completely.
- What’s the right visual metaphor for “I understood what you asked but I’m not allowed to do it”? Right now it’s this weird deflection that makes users feel gaslit.
- How do we design graceful degradation for AI systems? They don’t degrade gradually… they just hit walls and stop.
- In a multi-agent chain, when something fails, which agent do you blame? How does that attribution surface to the user? Right now it doesn’t.
┌──────────────────────────┐
│ NEGATIVE SPACE │
│ │
│ ┌─────┐ ┌────────┐ │
│ │LIMIT│─────>│BOUNDARY│ │
│ └─────┘ └────┬───┘ │
│ │ │
│ ▼ │
│ ┌─────────┐ │
│ │USER │ │
│ │RECOVERY │ │
│ └─────────┘ │
└──────────────────────────┘
Research Directions I’m Obsessed With Right Now…
Multiplayer human–AI, agents acting on the world, tool discovery at scale, text beyond prompting, ambient intelligence… and legibility: devtools optimize spans; humans need decisions and gates; agents need diffs and policy hooks.
+-------------------+
| my research zones |
+---------+---------+
|
+-----------------+----------------+
| | |
+---------v--------+ +------v------+ +-------v----------+
| human-ai teams | |simple agents| | text beyond |
+---------+--------+ +-----+-------+ | prompting |
| | +--------+---------+
+---------v--------+ +-----v-------+ |
| multiplayer | |ai that acts | |
| experiences | |on the world | +--------v---------+
+------------------+ +-------------+ | ubiquitous |
| intelligence |
+--------+---------+
|
+--------v---------+
| embodiment & |
| physical ai |
+------------------+
What’s Next for Me?
I’m looking for meaty projects, freelance or a role… something that could justify pulling together a small team for 3+ months, ideally with public outcomes.
My experience shows that focused prototypes with tangible outputs lead to:
PROTOTYPE ────> NEW PRODUCT IDEAS
│
├─────> INTERESTING UX CHALLENGES
│
└─────> DRIVING TECHNICAL RESEARCH
Maybe it’s not a typical client project, though? Open to other approaches, or even starting something completely new.
Honestly just putting this out there to see what resonates. This feels like such a fertile space right now, and I’m itching to build something that matters, specifically on one of these UX problems:
- agent transparency and control… live plan surfaces + editable steps + audit logs
- multi‑agent coordination… sessions/routing ui, visible handoffs, conflict resolution
- physical interfaces… authority scopes, consent gates, kill‑switch semantics
- text as interactive medium
- agent legibility… the ambient/surface/interrupt stack — and what auditability means when the auditor might be an agent
note to self: reach out to folks in agentic product and multiplayer software — natural fit