Streaming tool calls without tears | ajo_agent

Streaming AI responses look magical in a demo. Then the user opens the app on their phone in a tunnel, switches networks twice, and pulls to refresh mid-stream. That's where most chat UIs fall apart. Here's the architecture we landed on after the third complete rewrite.

The shape of a robust chat stream

Three layers. Don't merge them.

Transport — WebSocket connection, reconnect logic, queueing.
Protocol — typed events that survive disconnects: message_start, token, tool_call, tool_result, pending_approval, message_end.
Reducer — pure state machine that applies events to the UI store.
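To make the protocol layer concrete, here is a minimal sketch of the event types as a TypeScript discriminated union. Only the six event names come from the list above; every field name (id, messageId, callId, and so on) and the parseEvent helper are assumptions made for illustration.

```typescript
// Hypothetical event shapes. Only the `type` values come from the protocol list;
// all other field names are assumptions.
type ToolCall = { callId: string; name: string; args: unknown };

type ChatEvent =
  | { id: number; type: "message_start"; messageId: string }
  | { id: number; type: "token"; text: string }
  | { id: number; type: "tool_call"; call: ToolCall }
  | { id: number; type: "tool_result"; callId: string; ok: boolean; output: unknown }
  | { id: number; type: "pending_approval"; callId: string }
  | { id: number; type: "message_end" };

const EVENT_TYPES = new Set<string>([
  "message_start",
  "token",
  "tool_call",
  "tool_result",
  "pending_approval",
  "message_end",
]);

// Parse one raw wire frame into a typed event, or null if it is not recognised.
// Only the envelope (numeric id, known type) is validated; per-type field checks are omitted.
function parseEvent(raw: string): ChatEvent | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;
  const candidate = data as { id?: unknown; type?: unknown };
  if (typeof candidate.id !== "number" || typeof candidate.type !== "string") return null;
  if (!EVENT_TYPES.has(candidate.type)) return null;
  return data as ChatEvent;
}
```

Because every event carries a `type` tag, a `switch` in the reducer narrows each case to the right shape, and the transport never needs to know what the fields mean.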

The trick is: each layer can be replaced without rewriting the others.

Resist the urge to mix transport with state. If your useWebSocket hook is also setting React state, you've coupled retry logic to render logic. That's how stuck spinners are born.
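As a rough sketch of that separation, the transport below only knows how to connect, back off, and hand raw frames to a callback. The names Transport, SocketLike, sink, and schedule are all hypothetical, and the backoff numbers are arbitrary; the point is that nothing in this class touches UI state.

```typescript
// Minimal socket surface so the sketch runs without a browser.
// A thin wrapper around a real WebSocket could expose the same shape.
interface SocketLike {
  onmessage: ((ev: { data: string }) => void) | null;
  onclose: (() => void) | null;
}

type Schedule = (fn: () => void, ms: number) => void;

class Transport {
  private attempts = 0;
  private readonly connect: () => SocketLike;
  private readonly sink: (raw: string) => void;
  private readonly schedule: Schedule;

  constructor(
    connect: () => SocketLike,
    sink: (raw: string) => void,
    // Injectable so tests can run without real timers.
    schedule: Schedule = (fn, ms) => {
      setTimeout(fn, ms);
    },
  ) {
    this.connect = connect;
    this.sink = sink;
    this.schedule = schedule;
  }

  start(): void {
    const socket = this.connect();
    socket.onmessage = (ev) => {
      this.attempts = 0; // a delivered frame means the link is healthy again
      this.sink(ev.data); // hand off the frame; no UI state is touched here
    };
    socket.onclose = () => {
      // Exponential backoff capped at 30 s (both numbers are arbitrary choices).
      const delay = Math.min(1000 * 2 ** this.attempts, 30000);
      this.attempts += 1;
      this.schedule(() => this.start(), delay);
    };
  }
}
```

The sink would typically parse the frame and dispatch it to the reducer. A React hook then subscribes to the store the reducer writes to, not to the socket.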

Reconnect with idempotent resume

When the socket drops mid-message, the server needs to know what the client already saw. We send a last_event_id on reconnect; the server replays from there. The client reducer is idempotent on duplicate events — applying the same token event twice is a no-op.
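On the server side, replay can be as simple as a per-conversation log that hands back everything after a given id. This is a minimal in-memory sketch, assuming integer event ids that increase by one per conversation; EventLog, append, and replayAfter are hypothetical names, and a real server would also need to bound or persist the log.

```typescript
type LoggedEvent = { id: number; type: string };

// In-memory log for a single conversation (an assumption; the post does not say how events are stored).
class EventLog {
  private events: LoggedEvent[] = [];
  private nextId = 1;

  // Assign the next id and remember the event so it can be replayed later.
  append(type: string): LoggedEvent {
    const event: LoggedEvent = { id: this.nextId, type };
    this.nextId += 1;
    this.events.push(event);
    return event;
  }

  // Everything the client has not seen yet, in original order.
  // A client that has seen nothing sends 0 and receives the full log.
  replayAfter(lastEventId: number): LoggedEvent[] {
    return this.events.filter((e) => e.id > lastEventId);
  }
}
```

If the server replays slightly too much, for example because the client's last_event_id was stale, the idempotent reducer absorbs the overlap.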

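// Reducer is the source of truth. Transport just drops events into it.
function reduce(state: ChatState, event: ChatEvent): ChatState {
  // Idempotent: ignore anything at or below the last applied id.
  if (event.id <= state.lastAppliedEventId) return state;
  // Record the id so a replayed copy of this event is skipped next time.
  const next = { ...state, lastAppliedEventId: event.id };
  switch (event.type) {
    case "token":
      return { ...next, draftMessage: state.draftMessage + event.text };
    case "tool_call":
      return { ...next, toolCalls: [...state.toolCalls, event.call] };
    // ...
    default:
      return next; // event types not handled here still advance the cursor
  }
}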

Tool calls are first-class UI

Don't hide tool calls behind "thinking…". Show what the agent did, with status: pending, running, success, error. Collapsed by default for success, expanded for the others.

This serves two purposes:

Trust. Users see the agent's work. Citations and tool transparency are the difference between magic and a black box.
Debugging. When something goes wrong, the tool log is the first place to look.
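The status rule from this section fits in a few lines. The four status values and the collapse rule come from the text above; the ToolCallView shape and the function names are assumptions for illustration.

```typescript
type ToolStatus = "pending" | "running" | "success" | "error";

// Hypothetical view model for one row in the tool log.
type ToolCallView = { callId: string; name: string; status: ToolStatus };

// Successful calls start collapsed; anything still in flight or failed starts expanded.
function isCollapsedByDefault(status: ToolStatus): boolean {
  return status === "success";
}

// Return a new list with one call's status changed and every other call untouched.
function setStatus(calls: ToolCallView[], callId: string, status: ToolStatus): ToolCallView[] {
  return calls.map((c) => (c.callId === callId ? { ...c, status } : c));
}
```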

Human-in-the-loop without UX whiplash

Some tools should wait for approval (writing files, sending messages, charging cards). The pattern: server sends pending_approval, the client renders an inline confirmation card, the user approves or rejects, the agent resumes via a resume_decisions event.

The key UX move: don't modal-dialog this. The conversation is the workflow. Approval should feel like a paragraph break, not a popup interrupting flow.
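One way to model the approval round trip is as plain data: each inline card holds a decision or null, and the client builds a resume_decisions message once every card is answered. The payload shape, the "first decision wins" rule, and the choice to wait for all cards are assumptions made for this sketch, not details from the protocol.

```typescript
type Decision = "approve" | "reject";

// Hypothetical state for one inline confirmation card.
type ApprovalCard = { callId: string; toolName: string; decision: Decision | null };

// Hypothetical payload for resume_decisions; the field names are assumptions.
type ResumeDecisions = {
  type: "resume_decisions";
  decisions: { callId: string; decision: Decision }[];
};

// Record the user's choice. The first decision wins; later clicks on the same card are ignored.
function decide(cards: ApprovalCard[], callId: string, decision: Decision): ApprovalCard[] {
  return cards.map((c) =>
    c.callId === callId && c.decision === null ? { ...c, decision } : c,
  );
}

// Build the resume message only once every card has a decision; otherwise return null.
function buildResume(cards: ApprovalCard[]): ResumeDecisions | null {
  if (cards.length === 0) return null;
  const decisions: { callId: string; decision: Decision }[] = [];
  for (const card of cards) {
    if (card.decision === null) return null; // still waiting on the user
    decisions.push({ callId: card.callId, decision: card.decision });
  }
  return { type: "resume_decisions", decisions };
}
```

Keeping the cards in the same store as the messages is what lets them render inline in the conversation instead of in a modal.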

What we removed

Auto-scroll on every token. Brutal on long messages — users can't read what's scrolling away. Only scroll if they were already near the bottom.
Optimistic message UI for the assistant. The user's message is optimistic; the assistant's isn't. Wait for message_start.
Streaming markdown rendering on every token. Re-rendering markdown 60 times a second is expensive. Render plain text while streaming, then swap to rendered markdown on message_end.
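Two of these rules reduce to small pure functions. This sketch assumes the caller reads the scroll metrics from the message container before appending the new token; the 80 px threshold and all names here are arbitrary choices for illustration.

```typescript
// The three numbers a scroll container exposes (same names as the DOM properties).
type ScrollMetrics = { scrollTop: number; clientHeight: number; scrollHeight: number };

// True when the user is already close enough to the bottom that following the stream is welcome.
// Measure BEFORE appending the token: afterwards scrollHeight has grown and the check is skewed.
function isNearBottom(m: ScrollMetrics, thresholdPx: number = 80): boolean {
  const distanceFromBottom = m.scrollHeight - (m.scrollTop + m.clientHeight);
  return distanceFromBottom <= thresholdPx;
}

// Plain text while tokens are arriving, rendered markdown once the message has ended.
function renderMode(messageEnded: boolean): "plain" | "markdown" {
  return messageEnded ? "markdown" : "plain";
}
```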

What's left

Mobile sidebar drawer state machine, accessibility for the streaming cursor, and a way to "continue" a truncated response. We'll keep iterating — but the core, where user input enters and structured events come back, is finally stable.