Skip to content

feat(F.5): pool-respawn-on-promote saga#634

Merged
a5af merged 13 commits intomainfrom
agenta/step6-f5-pool-respawn-saga
May 1, 2026
Merged

feat(F.5): pool-respawn-on-promote saga#634
a5af merged 13 commits intomainfrom
agenta/step6-f5-pool-respawn-saga

Conversation

@AgentA-asaf
Copy link
Copy Markdown
Contributor

Summary

Step 6 PR 1 of the architecture-completeness plan: F.5 pool-respawn-on-promote saga. First real consumer of the launcher-side saga coordinator (previously a framework-stub from E.1a). Formalizes today's implicit "promote → refill" flow as an explicit cross-process saga so the renderer sees SagaStarted/SagaCompleted brackets and can buffer related events atomically.

Closes (partial): reducer-architecture-gaps §3 cross-process saga.

Spec: docs/specs/SPEC_PHASE_F_HOST_REDUCER_2026-05-01.md §7.1.
Plan: docs/retro/next-steps-architecture-completeness-2026-05-01.md step 6.

What changed

  • New saga: agentmux-launcher/src/saga/pool_respawn.rs (~300 LOC including tests).
  • Reorganised agentmux-launcher/src/saga.rssaga/mod.rs to host the submodule.
  • New wire variants in agentmux-common::ipc:
    • Command::ReportPoolWindowPromoted { label } (host→launcher).
    • Command::SpawnPoolWindow (launcher→host, framework target — see caveat).
    • Event::PoolWindowPromoted { label, version } (launcher→subscribers).
  • Launcher coordinator now actively starts sagas on Event::PoolWindowPromoted, routes events into in-flight sagas via Saga::on_event, and emits SagaStarted / SagaCompleted / SagaFailed brackets.
  • Host-side: agentmux-cef::commands::window_pool::promote_pool_window calls report_pool_window_promoted between the existing report_pool_window_removed + report_window_opened pair (only on the validated-HWND path — pre-promote destroy paths don't trigger the saga).
  • Compensation: none — failure to refill logs and retries on next promote (per F-spec §7.1).
  • Misrouted-command map gains arms for the new variants (consistent with srv-pipe misroute pattern).
  • Test-only extract_version helper updated to match the now-larger Event variant set (was already missing TabMoved/BlockMoved/FocusedNodeChanged/MagnifiedNodeChanged from prior PRs).

Cross-process dispatch caveat (scoped down per F-spec §7.1)

The saga issues SagaAction::IssueCmd { target: Host, cmd: SpawnPoolWindow } for state-machine completeness. The launcher coordinator currently logs the dispatch — there is no launcher→host command pipe yet (host's launcher_ipc.rs is wired host→launcher only). The saga relies on the host's existing implicit spawn_pool_window call inside promote_pool_window to produce the matching Event::PoolWindowAdded. F.6 (or the cross-process-dispatch follow-up) replaces the log with a real wire-level send. This is acceptable per the F-spec §7.1 final paragraph: "If the cross-process plumbing is too heavy a lift for one PR, scope down: implement the saga shape + the launcher-side coordinator entry, even if the actual cross-process dispatch is initially in-process."

What's NOT in this PR

Test plan

  • cargo check --workspace clean (only pre-existing warnings).
  • cargo test -p agentmux-launcher79/79 passing (was 69; +10 saga tests).
    • 5 unit tests in pool_respawn::tests (state machine: start, complete, ignore unrelated, first-wins, end-to-end coordinator integration).
    • 5 in saga::tests (trigger matching, lifecycle-event filter, monotonic id, drain-without-trigger).
  • Smoke: tear off a tab in a real build, verify SagaStarted name=pool_respawn_on_promote + SagaCompleted brackets visible in --diag wrr output around the implicit pool refill.

🤖 Generated with Claude Code

@a5af
Copy link
Copy Markdown
Member

a5af commented May 1, 2026

@codex review

1 similar comment
@a5af
Copy link
Copy Markdown
Member

a5af commented May 1, 2026

@codex review

Copy link
Copy Markdown

@reagentx-workflow reagentx-workflow Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReAgent Diagnostics
Field Value
ReAgent Version 5.12.11
Trigger PR opened
Project Context CLAUDE.md loaded
Model claude-opus-4-6
Effort high
Ref Repos Disabled
Merge Analysis Clean
Review Time 175.9s
Timestamp 2026-05-01T08:25:51Z
Repository agentmuxai/agentmux
PR #634

LGTM

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f86be0d444

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +172 to +177
(Phase::WaitingForRefill, Event::PoolWindowAdded { label, .. }) => {
// The refill produced a new label. Record + complete.
// (Spec § 7.1: Step 3 is just `Done`; no further
// dispatch.)
self.refilled_label = Some(label.clone());
SagaAction::Done
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Correlate respawn completion to the triggering promote

This saga marks itself Done on the first PoolWindowAdded it sees, but run_coordinator broadcasts every event to every in-flight saga (agentmux-launcher/src/saga/mod.rs), so concurrent promotes are mis-correlated: if two promotes are in flight, the first refill event will complete both sagas. That produces an early SagaCompleted for at least one promote, which breaks the intended buffering bracket semantics and leaves the later refill event unbracketed.

Useful? React with 👍 / 👎.

@a5af
Copy link
Copy Markdown
Member

a5af commented May 1, 2026

@codex review

Copy link
Copy Markdown

@reagentx-workflow reagentx-workflow Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReAgent Diagnostics
Field Value
ReAgent Version 5.12.11
Trigger Re-review (new commits)
Project Context CLAUDE.md loaded
Model claude-opus-4-6
Effort high
Ref Repos Disabled
Merge Analysis Clean
Review Time 54.7s
Timestamp 2026-05-01T08:40:06Z
Repository agentmuxai/agentmux
PR #634

LGTM

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 11a7ab12a8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

// before refill actually finished — the renderer's buffered
// state would flush prematurely.
match (&self.phase, event) {
(Phase::WaitingForRefill, Event::PoolWindowAdded { label, .. }) => {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Correlate pool refill event before marking saga complete

PoolRespawn transitions to Done on the first PoolWindowAdded event it sees, without checking that the add belongs to this promote. Because the coordinator broadcasts each event to every in-flight saga (run_coordinator fan-out), overlapping promotes (or any unrelated pool add emitted while a promote saga is waiting) can cause one add event to complete multiple sagas early, leaving the real later refill outside the expected SagaStarted/SagaCompleted bracket for that promote.

Useful? React with 👍 / 👎.

@a5af
Copy link
Copy Markdown
Member

a5af commented May 1, 2026

@codex review

Copy link
Copy Markdown

@reagentx-workflow reagentx-workflow Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReAgent Diagnostics
Field Value
ReAgent Version 5.12.11
Trigger Re-review (new commits)
Project Context CLAUDE.md loaded
Model claude-opus-4-6
Effort high
Ref Repos Disabled
Merge Analysis Clean
Review Time 51.5s
Timestamp 2026-05-01T08:51:31Z
Repository agentmuxai/agentmux
PR #634

LGTM

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 995ad30c72

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +380 to +383
crate::log(&format!(
"[saga] dropping {} trigger — same-kind saga already in flight (codex P1 #634 serialization fallback)",
new_kind,
));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Do not permanently drop promote triggers after one stalled saga

This branch drops every new PoolWindowPromoted trigger whenever one pool_respawn_on_promote saga is still in flight, but PoolRespawn has no timeout/failure path for missing PoolWindowAdded and can wait indefinitely. In that case (e.g., refill failure or a missed terminal event), the first stalled saga blocks all future promote sagas for the rest of the process lifetime, so retries on later promotes never happen and renderer saga brackets are permanently suppressed.

Useful? React with 👍 / 👎.

@a5af
Copy link
Copy Markdown
Member

a5af commented May 1, 2026

@codex review

Copy link
Copy Markdown

@reagentx-workflow reagentx-workflow Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReAgent Diagnostics
Field Value
ReAgent Version 5.12.11
Trigger Re-review (new commits)
Project Context CLAUDE.md loaded
Model claude-opus-4-6
Effort high
Ref Repos Disabled
Merge Analysis Clean
Review Time 159.3s
Timestamp 2026-05-01T11:39:45Z
Repository agentmuxai/agentmux
PR #634

LGTM

AgentA and others added 13 commits May 1, 2026 04:40
First real consumer of the launcher-side saga coordinator (previously
a framework-stub from E.1a). Formalizes today's implicit
"promote → refill" flow as an explicit saga so the renderer sees
SagaStarted/SagaCompleted brackets and can buffer related events
atomically.

Spec: docs/specs/SPEC_PHASE_F_HOST_REDUCER_2026-05-01.md §7.1.

What changed
- New saga: agentmux-launcher/src/saga/pool_respawn.rs.
- New wire: Command::ReportPoolWindowPromoted, Command::SpawnPoolWindow,
  Event::PoolWindowPromoted.
- saga.rs reorganised into saga/mod.rs to host pool_respawn submodule.
- Launcher coordinator now actively starts sagas on trigger events
  (Event::PoolWindowPromoted), routes events to in-flight sagas, and
  emits SagaStarted/SagaCompleted/SagaFailed on the bus.
- Host's promote_pool_window emits the new ReportPoolWindowPromoted
  between the existing remove + open pair so the launcher's reducer
  has unambiguous evidence of a promote (vs pre-promote destroy).
- Launcher reducer's misrouted-command map gains arms for the new
  variants (consistent with srv-pipe misroute pattern).
- Test-only extract_version helper updated to match the now-larger
  Event variant set (was already missing TabMoved/BlockMoved/
  FocusedNodeChanged/MagnifiedNodeChanged from prior PRs).

Cross-process dispatch caveat
The saga issues SagaAction::IssueCmd { target: Host, cmd: SpawnPoolWindow }
for state-machine completeness. The launcher coordinator currently
LOGS the dispatch — there is no launcher→host command pipe yet
(host's launcher_ipc.rs is wired host→launcher only). The saga relies
on the host's existing implicit spawn_pool_window call inside
promote_pool_window to produce the matching Event::PoolWindowAdded.
F.6 (or the cross-process-dispatch follow-up) replaces the log with a
real wire-level send. Per F-spec §7.1's scope-down guidance.

Tests
cargo test -p agentmux-launcher: 79/79 passing (was 69; +10 saga tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n limitation

Codex caught that the pool_respawn saga relies on the coordinator's
broadcast-to-all routing, so with two promotes in flight the first
PoolWindowAdded completes BOTH sagas — early SagaCompleted for the
second promote, later refill event left unbracketed.

Proper fix requires either FIFO routing in run_coordinator or
per-saga sequence-number correlation. Both are non-trivial and depend
on the cross-process dispatch landing first (so the saga's IssueCmd
is actually causal). Closed in F.6/F.7 alongside the launcher→host
command pipe.

This PR ships the limitation observable:
- Doc comment in pool_respawn.rs detailing the limitation, when it
  triggers, the renderer-visible consequence, and the closure plan.
- Runtime WARN log in run_coordinator's saga insertion path when
  starting a saga of a kind already in flight. Operators can spot
  the pattern in --diag wrr / launcher logs.

Reducer + SQLite state remain correct under all paths — the
limitation is a renderer-side bracketing inconsistency only.

79/79 launcher tests pass.
Round 2's doc-as-mitigation didn't satisfy codex (re-flagged round 3).
Implementing the actual fix as a serialization gate: the coordinator
drops a PoolWindowPromoted trigger if a same-kind saga is already in
flight. Falls back to legacy pre-saga semantics for the second
concurrent promote (no bracket emitted) while the first proceeds
correctly.

Trade-off vs full FIFO routing / per-event saga_id correlation:
- Pros: small, local change; preserves first-promote bracket; reducer
  + SQLite state correct under all paths; no architecture refactor.
- Cons: rapid back-to-back tear-offs lose the bracket on the 2nd
  promote (drops to pre-saga implicit flow).

Proper fix (FIFO event routing or per-event saga_id correlation)
deferred to F.6/F.7 alongside cross-process dispatch.

79/79 launcher tests pass.
Round 3's silent-drop-on-conflict had a permanent-deadlock failure
mode: a stalled saga (refill never arrived, e.g. host-side spawn
failure) would block ALL future promote sagas for the rest of the
process lifetime — renderer brackets permanently suppressed.

Fix: evict-and-replace. New PoolWindowPromoted trigger evicts any
in-flight pool_respawn saga (emitting SagaFailed for the evicted
saga + removing from registry) before starting the new saga.

Trade-offs:
- Pros: no permanent block; each new promote gets a fresh bracket;
  reducer + SQLite stay correct under all paths.
- Cons: when two healthy promotes happen in quick succession, the
  first promote's bracket is cut short with SagaFailed (the late
  refill event for that promote completes the *new* saga instead).
  Renderer-visible: one premature SagaFailed + one correct
  SagaCompleted.

Proper FIFO routing or per-event saga_id correlation deferred to
F.6/F.7 alongside cross-process dispatch landing.

79/79 launcher tests pass.
@AgentA-asaf AgentA-asaf force-pushed the agenta/step6-f5-pool-respawn-saga branch from 424689c to d2034bf Compare May 1, 2026 11:41
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 424689c158

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +191 to +196
(Phase::WaitingForRefill, Event::PoolWindowAdded { label, .. }) => {
// The refill produced a new label. Record + complete.
// (Spec § 7.1: Step 3 is just `Done`; no further
// dispatch.)
self.refilled_label = Some(label.clone());
SagaAction::Done
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Correlate pool-respawn to the actual refill event

Complete the saga only after identifying the refill caused by this promote, not on any PoolWindowAdded. As written, the first PoolWindowAdded after start ends the saga, so a background add (for example, startup/top-up adds that can still be in flight while a promote happens) can emit SagaCompleted before the promote’s own refill arrives. That produces premature bracket closure in subscribers and leaves the real refill event outside the intended saga boundary.

Useful? React with 👍 / 👎.

@a5af
Copy link
Copy Markdown
Member

a5af commented May 1, 2026

@codex review

@a5af a5af merged commit f67a76a into main May 1, 2026
Copy link
Copy Markdown

@reagentx-workflow reagentx-workflow Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReAgent Diagnostics
Field Value
ReAgent Version 5.12.11
Trigger Re-review (new commits)
Project Context CLAUDE.md loaded
Model claude-opus-4-6
Effort high
Ref Repos Disabled
Merge Analysis Clean
Review Time 60.7s
Timestamp 2026-05-01T11:42:30Z
Repository agentmuxai/agentmux
PR #634

LGTM

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Another round soon, please!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants