bug(web-ui): in-progress response not restored after laptop sleep / Chrome backgrounding (manual refresh required) #132

@lidge-jun

Symptom

When the laptop lid is closed for a while, or the Chrome tab/window is backgrounded for an extended period, returning to the Web UI leaves it in a broken state during an in-progress response:

  • Status still shows running (or stays stuck mid-stream).
  • The user's most recent message (sent before the suspend) is missing from the chat history.
  • The currently-running tool / streaming text is not restored.
  • Only a manual page refresh recovers the state. Subsequent WS events arrive normally.

Repro

  1. Send a long-running prompt that takes ~30s+ (e.g. a multi-step orchestration or a heavy boss reply).
  2. Close the laptop lid (or switch Chrome to a different app and leave it idle for several minutes).
  3. Wait until the OS suspends or Chrome heavily throttles the tab.
  4. Resume — focus the Web UI tab again.

Expected: the in-progress response and its tool log resume from the snapshot; latest user message bubble is visible.
Actual: chat is frozen on pre-suspend state, latest user bubble missing, tool not progressing, requires F5.

Suspected root cause (client side)

public/js/ws.ts registers visibility/focus/pageshow restore hooks that call syncOrchestrateSnapshot(reason) without hydrateRun: true, so:

  • hydrateActiveRun(snap.activeRun) is not invoked on re-focus → in-progress agent bubble & tool log are not rebuilt.
  • loadMessages() is not called → the latest user message (which was queued/streamed in just before suspend) is not re-fetched from /api/messages.

Both paths run only inside state.ws.onopen (i.e. only after a real WebSocket reconnect). If the OS/browser keeps the WS object in the OPEN state across the suspend (no onclose fires), the restore hook performs only the light sync, and nothing indicates that the full restore was skipped.

Additionally, there is no WS ping/pong heartbeat: neither the server-side WebSocketServer in server.ts nor src/core/bus.ts contains any ping/pong/isAlive/terminate logic. After a long suspend the TCP socket can be silently dead while the client still believes the WS is open, so reconnect-driven recovery never fires.
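For reference, the missing heartbeat could look roughly like the sketch below. It is not taken from the codebase: `HeartbeatSocket` and `sweepSockets` are hypothetical names, and the interface only mirrors the `ping()` / `terminate()` methods that sockets from the `ws` package expose, plus an `isAlive` flag the server would attach itself.

```typescript
// Hypothetical structural type: sockets from the `ws` package provide ping()
// and terminate(); `isAlive` is a flag the server code would add on its own.
interface HeartbeatSocket {
  isAlive: boolean;
  ping(): void;
  terminate(): void;
}

// One heartbeat sweep. A socket that has not answered the previous ping
// (isAlive still false) is terminated; every other socket is marked
// unconfirmed and pinged again. Returns how many sockets were terminated.
function sweepSockets(sockets: Iterable<HeartbeatSocket>): number {
  let terminated = 0;
  for (const socket of sockets) {
    if (!socket.isAlive) {
      socket.terminate();
      terminated += 1;
      continue;
    }
    socket.isAlive = false; // expected to be set back to true on 'pong'
    socket.ping();
  }
  return terminated;
}
```

In a real server this would presumably run from a `setInterval` over the connected clients, with a `'pong'` listener per socket that sets `isAlive = true`. One caveat: browsers answer protocol-level ping frames automatically and do not expose them to page scripts, so this sweep only cleans up the server side. Detecting a dead socket from the client would need an application-level message or a forced reset on resume.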

Server-side data is already there

/api/orchestrate/snapshot already returns activeRun populated by getLiveRun(scope) in src/agent/live-run-state.ts — text, toolLog, cli, running flag — so a client-side fix can rehydrate without protocol changes.
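To make the rehydration input concrete, here is a rough sketch of the snapshot shape. Only the field names (activeRun, text, toolLog, cli, running) come from this issue; the field types, the `OrchestrateSnapshot` wrapper, and the `runToRehydrate` helper are assumptions for illustration and may not match src/agent/live-run-state.ts.

```typescript
// Assumed shape: field names are from the issue text, types are guesses.
interface ActiveRun {
  text: string;        // streamed response text so far
  toolLog: unknown[];  // entry shape is not specified in the issue
  cli: string;
  running: boolean;
}

// Hypothetical wrapper for the /api/orchestrate/snapshot payload.
interface OrchestrateSnapshot {
  activeRun: ActiveRun | null;
}

// Hypothetical helper: return the run that should be rebuilt in the UI,
// or null when there is nothing in progress.
function runToRehydrate(snap: OrchestrateSnapshot): ActiveRun | null {
  const run = snap.activeRun;
  return run !== null && run.running ? run : null;
}
```

A restore hook could pass the non-null result to hydrateActiveRun and skip the call otherwise.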

Proposed direction (to be confirmed in devlog _plan)

  1. On visibilitychange/focus/pageshow/resume, additionally call loadMessages() and hydrateActiveRun (i.e. pass hydrateRun: true and reload history) when the page was hidden long enough.
  2. Add WS keepalive: server-side ping interval + client pong handler with stale-socket termination, so a dead WS triggers onclose and the existing reconnect path runs.
  3. Detect long visibility gap (Date.now() - lastVisibleAt > N) and force-reset WS via state.ws.close() to deterministically take the reconnect path.
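Items 1 and 3 could share one decision step, sketched below under stated assumptions. `VisibilityGapTracker`, `planResume`, the action names, and both thresholds are hypothetical; the issue leaves the actual value of N open. The clock is injectable so the logic can be checked without a browser.

```typescript
type ResumeAction = "light-sync" | "full-hydrate" | "force-reconnect";

// Tracks how long the page was hidden. Wall-clock time (Date.now) keeps
// advancing across an OS suspend, which is what makes the gap measurable.
class VisibilityGapTracker {
  private lastVisibleAt: number | null = null;
  private readonly now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  // Call when the page becomes hidden; repeated calls keep the first timestamp.
  markHidden(): void {
    if (this.lastVisibleAt === null) this.lastVisibleAt = this.now();
  }

  // Call when the page becomes visible again; returns the hidden duration in ms.
  markVisible(): number {
    const gap = this.lastVisibleAt === null ? 0 : this.now() - this.lastVisibleAt;
    this.lastVisibleAt = null;
    return gap;
  }
}

// Picks the recovery path for a given gap. Both thresholds are placeholders.
function planResume(
  hiddenForMs: number,
  hydrateAfterMs: number,
  reconnectAfterMs: number,
): ResumeAction {
  if (hiddenForMs >= reconnectAfterMs) return "force-reconnect";
  if (hiddenForMs >= hydrateAfterMs) return "full-hydrate";
  return "light-sync";
}
```

In the restore hooks, "full-hydrate" would correspond to loadMessages() plus a snapshot sync with hydrateRun: true, and "force-reconnect" to state.ws.close() so that the existing onopen path does the full restore. The handlers would call markHidden() and markVisible() from the visibilitychange event, depending on document.visibilityState.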

Out of scope

  • Tool-call / orchestration semantics on the server.
  • Boss/employee dispatch behavior.
  • Manager dashboard refresh (already separate).

Devlog plan to follow under devlog/_plan/260428_web_ui_resume_recovery/.
