perf: PTY spawn serializes on concurrent pane creation (~10ms × N stall) #121

@AgentU-asaf

Summary

Opening multiple terminal panes quickly produces a noticeable UI stall. Each new pane triggers a controllerresync RPC that takes ~10ms on the backend (PTY + shell spawn). When N panes are opened concurrently, these calls serialize, producing an N × ~10ms stall before any terminal becomes interactive.

Timing logs from v0.31.120 (4 panes opened in a burst):

Pane 1: 12.73 ms  (14:23:33.588)
Pane 2:  9.66 ms  (14:23:33.708)
Pane 3:  9.43 ms  (14:23:33.845)
Pane 4:  9.83 ms  (14:23:35.963)
──────────────────────────────────
Total:  ~41.65 ms serialized — perceived as sluggishness
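
As a quick check of the arithmetic, the sketch below sums the four logged durations (the serialized case) and takes their maximum (the floor if all four spawns overlapped completely). The constant and function names are illustrative and not part of the codebase.

```rust
/// Per-pane controllerresync durations in milliseconds, copied from the log above.
const PANE_MS: [f64; 4] = [12.73, 9.66, 9.43, 9.83];

/// Stall when the spawns run one after another: the sum of all durations.
fn serialized_ms(durations: &[f64]) -> f64 {
    durations.iter().sum()
}

/// Best case when the spawns overlap completely: the slowest single spawn.
fn concurrent_floor_ms(durations: &[f64]) -> f64 {
    durations.iter().copied().fold(0.0, f64::max)
}

fn main() {
    println!("serialized: {:.2} ms", serialized_ms(&PANE_MS));
    println!("concurrent floor: {:.2} ms", concurrent_floor_ms(&PANE_MS));
}
```

For the logged burst this gives 41.65 ms serialized against a floor of 12.73 ms, which is the gap the fixes below aim to close.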

Root Cause

1. spawn_command() runs inline on the async executor

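In shell.rs::start(), pair.slave.spawn_command(cmd) (i.e., fork() + exec()) is a blocking OS call taking 3–10ms. It runs inline in a Tokio async task, far exceeding the common guideline that an async task should spend no more than roughly 100µs between .await points. While it runs, other tasks scheduled on the same worker thread are starved.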

2. Frontend awaits the full spawn before displaying the pane

In termwrap.ts::init():

// Subscribe to PTY data ← happens BEFORE resync
this.mainFileSubject.subscribe(this.handleNewFileSubjectData.bind(this));
// ...
await this.resyncController("init"); // ← awaits full fork+exec (~10ms)

The PTY data subscription is already registered before resyncController is called, so the pane will receive output regardless. For a new (empty) pane there is no correctness reason to await. That await is the only thing serializing the four concurrent open operations on the frontend side.

3. fork() page table cost scales with parent VSS

As the app grows (more modules loaded, more memory mapped), fork() latency increases because Linux must copy the parent's page tables even with copy-on-write (CoW). Research shows this can reach 10–50ms in large processes (Rust issue #87764, fork latency analysis).


Proposed Fixes

Fix A — Drop await on resyncController for new panes (1 line, highest ROI)

frontend/app/view/term/termwrap.ts

// Before:
await this.resyncController("init");

// After:
this.resyncController("init"); // fire-and-forget; subscription already active

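All N panes now fire their resyncs concurrently. Provided the backend processes them in parallel (each on its own Tokio task), total perceived latency drops from N × ~10ms to ~10ms, largely independent of N. Because spawn_command() still blocks its worker thread, that parallelism is bounded by the number of Tokio worker threads until Fix B is applied.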

Risk: Low. Errors are already caught and logged inside resyncController. Other callers (reconnect, forcerestart) should retain await.

Fix B — Wrap spawn_command in spawn_blocking (~20 lines)

agentmuxsrv-rs/src/backend/blockcontroller/shell.rs

// Current — blocks async executor for ~10ms:
let mut child = pair.slave.spawn_command(cmd)?;

// Proposed — offload fork+exec to blocking thread pool:
let slave = pair.slave;
let mut child = tokio::task::spawn_blocking(move || slave.spawn_command(cmd))
    .await
    .map_err(|e| format!("join error: {e}"))??;

Frees the Tokio executor during the fork. Multiple concurrent controllerresync handlers can all reach spawn_blocking simultaneously; the blocking pool runs them concurrently (Tokio's default limit is 512 blocking threads).
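
The effect of moving a blocking call off the calling thread can be illustrated without Tokio. The sketch below is a std-only analogue, not the proposed patch: fake_spawn is a hypothetical stand-in for spawn_command() that only sleeps, and plain OS threads stand in for Tokio's blocking pool.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a blocking fork+exec: sleeps, then returns the pane id.
fn fake_spawn(pane: usize, cost: Duration) -> usize {
    thread::sleep(cost);
    pane
}

/// Runs every spawn inline on the caller, one after another.
fn spawn_inline(panes: usize, cost: Duration) -> Vec<usize> {
    (0..panes).map(|p| fake_spawn(p, cost)).collect()
}

/// Hands each spawn to its own thread, then joins them in pane order.
fn spawn_offloaded(panes: usize, cost: Duration) -> Vec<usize> {
    let handles: Vec<_> = (0..panes)
        .map(|p| thread::spawn(move || fake_spawn(p, cost)))
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("spawn thread panicked"))
        .collect()
}

fn main() {
    let cost = Duration::from_millis(10);

    let start = Instant::now();
    let inline = spawn_inline(4, cost);
    println!("inline:    {:?} in {:?}", inline, start.elapsed());

    let start = Instant::now();
    let offloaded = spawn_offloaded(4, cost);
    println!("offloaded: {:?} in {:?}", offloaded, start.elapsed());
}
```

On an otherwise idle machine the inline run should take roughly four times the per-spawn cost and the offloaded run roughly one; exact timings will vary.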

Fix C — --norc --noprofile for cmd controller (non-interactive shells)

Agent-driven (cmd type) shells do not need rcfile loading. On a typical developer machine, .bashrc takes 40–500ms to source. Passing --norc --noprofile (bash options; other shells would need their own equivalents) reduces spawn time to the ~3ms bare floor.

Should be opt-in via cmd:bare: true block meta initially.
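
A minimal sketch of the opt-in, assuming the shell is launched through something shaped like std::process::Command (the real code may use a different PTY command builder). build_shell_command and its bare parameter are hypothetical names that mirror the proposed cmd:bare meta key.

```rust
use std::path::Path;
use std::process::Command;

/// Builds the command used to launch a shell.
/// `bare` mirrors the proposed `cmd:bare` block meta (an assumption here).
fn build_shell_command(shell_path: &str, bare: bool) -> Command {
    let mut cmd = Command::new(shell_path);
    // --noprofile and --norc are bash options, so only add them for bash.
    let is_bash = Path::new(shell_path)
        .file_name()
        .and_then(|name| name.to_str())
        == Some("bash");
    if bare && is_bash {
        cmd.arg("--noprofile").arg("--norc");
    }
    cmd
}

fn main() {
    // Only constructs and prints the command; nothing is executed.
    let cmd = build_shell_command("/bin/bash", true);
    println!("{:?}", cmd);
}
```

Gating on the shell name keeps the flags from being passed to shells that would reject them; non-bash shells fall through unchanged in this sketch.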

Fix D — Pre-forked PTY pool (future, highest complexity)

Pre-fork N lightweight placeholder processes at startup before the parent VSS grows. Hand them out on demand — each exec() replaces the placeholder. Replenish pool asynchronously. Brings perceived spawn latency to ~0ms.

No terminal multiplexer (tmux, WezTerm, Zellij) currently uses this technique. Deferred until A+B+C are validated.
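
The hand-out and replenish bookkeeping could look roughly like the sketch below. It does not fork anything: WarmPool is a hypothetical name, and the generic T stands in for a pre-forked placeholder process together with its PTY.

```rust
use std::collections::VecDeque;

/// A pool of pre-created items kept at a target size.
/// `T` stands in for a pre-forked placeholder process plus its PTY.
struct WarmPool<T, F: FnMut() -> T> {
    ready: VecDeque<T>,
    target: usize,
    make: F,
}

impl<T, F: FnMut() -> T> WarmPool<T, F> {
    /// Creates the pool and fills it to `target` up front (the startup step).
    fn new(target: usize, mut make: F) -> Self {
        let ready = (0..target).map(|_| make()).collect();
        WarmPool { ready, target, make }
    }

    /// Hands out a warm item if one is ready, otherwise creates one on the spot.
    fn take(&mut self) -> T {
        match self.ready.pop_front() {
            Some(item) => item,
            None => (self.make)(),
        }
    }

    /// Tops the pool back up to `target`; a real version would do this off the hot path.
    fn replenish(&mut self) {
        while self.ready.len() < self.target {
            let item = (self.make)();
            self.ready.push_back(item);
        }
    }

    /// Number of items currently warm.
    fn ready_len(&self) -> usize {
        self.ready.len()
    }
}

fn main() {
    let mut next_id = 0u32;
    let mut pool = WarmPool::new(2, move || {
        next_id += 1;
        next_id
    });
    let first = pool.take();
    println!("took {first}, {} still warm", pool.ready_len());
    pool.replenish();
    println!("after replenish: {} warm", pool.ready_len());
}
```

Falling back to on-demand creation when the pool is empty means a burst larger than the pool degrades to today's behavior rather than failing.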


Research Basis

Full spec with all research, code examples, and success criteria: perf-spec-pty-spawn-latency.md in repo root.


Success Criteria

  • 4 panes opened simultaneously: all interactive within 15ms wall time (vs current ~42ms serial)
  • No regression on single-pane open
  • Session reconnect (forcerestart: true) continues to await completion
  • controllerresync handler never blocks Tokio async executor >100µs inline

Implementation Order

Priority  Fix                               Effort
1         Drop await in termwrap.ts         1 line
2         spawn_blocking for spawn_command  ~20 lines
3         --norc for cmd controller         ~10 lines
4         Pre-fork pool                     ~200 lines, new module
