fix(bounty): multi-store atomicity, saga settlement, reconciler framework by windoliver · Pull Request #254 · windoliver/grove

windoliver · 2026-04-15T01:36:04Z

Summary

Closes #240. Addresses the three codex-flagged correctness bugs in bounty operations and adds infrastructure to prevent recurrence.

- Saga-based settlement: new pending_settlement pivot state ensures settleBountyOperation never enters a terminal state before capture() confirms. Resumable from any intermediate state (pending_settlement, completed) by both the operation and the background reconciler.
- SweepReconciler framework: pluggable sweep strategies run on a 60s timer in all three runtimes (HTTP server, stdio MCP, HTTP MCP). Ships with BountyIndexSweep (dual-write index repair), SettlementSweep (resume stalled settlements), and HandoffSweep (detect orphaned contributions).
- Claim renewal: the same agent can extend a bounty claim lease without reopening the bounty, preventing long-running work from getting stranded on lease expiry.
- State machine hardening: claimed → completed bypass removed; all settlement must go through pending_settlement. validateBountyTransition enforced in NexusBountyStore before every CAS write.
- Input safety: amount > 0, non-empty title, pre-flight status checks, frozen fulfillment CID on retry, orphaned claim release on pre-commit failure.
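
The mandatory pivot can be pictured as a transition table. This is a minimal sketch, not the repository's code: the status names come from this PR, but the `ALLOWED` table, the exact edge set (restricted here to transitions the PR mentions, including the same-agent claimed → claimed rebind), and this signature for `validateBountyTransition` are assumptions.

```typescript
type BountyStatus = "open" | "claimed" | "pending_settlement" | "completed" | "settled";

// Assumed edge set: only transitions described in this PR. The real state
// machine may carry more states and edges (for example expiry or cancellation).
const ALLOWED: Record<BountyStatus, readonly BountyStatus[]> = {
  open: ["claimed"],
  claimed: ["claimed", "pending_settlement"], // no direct claimed -> completed
  pending_settlement: ["completed"],
  completed: ["settled"],
  settled: [],
};

// Hypothetical signature: throws when the requested edge is not in the table.
function validateBountyTransition(from: BountyStatus, to: BountyStatus): void {
  if (!ALLOWED[from].includes(to)) {
    throw new Error(`invalid bounty transition: ${from} -> ${to}`);
  }
}
```

Running a check of this shape before every CAS write means a caller holding a stale view of the bounty is rejected by the table rather than by a later side effect.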

What changed

| Area | Files | Change |
|---|---|---|
| Saga settlement | `bounty.ts`, `bounty-logic.ts`, `bounty-store.ts`, both store impls | `pending_settlement` state, `beginSettlement()`, 3-step settle flow |
| Reconciler | `sweep-reconciler.ts`, `bounty-index-sweep.ts`, `settlement-sweep.ts`, `handoff-sweep.ts` | Framework + 3 strategies |
| Runtime wiring | `server/serve.ts`, `mcp/serve.ts`, `mcp/serve-http.ts` | Reconciler started + stopped in all entry points |
| Claim renewal | `operations/bounty.ts` | Same-agent renewal path in `claimBountyOperation` |
| Schema propagation | `schemas.ts`, `mcp/tools/bounties.ts` | `pending_settlement` added to all Zod enums |
| Tests | `bounty.test.ts`, `sweep-reconciler.test.ts`, `failing-bounty-store.ts` | 120 tests: acceptance criteria, validation matrix, conflict scenarios, sweep strategies |

Adversarial review

6 rounds of Codex adversarial review. 12 findings fixed (1 critical, 8 high, 1 medium). Key fixes:

- Frozen fulfillment CID prevents non-deterministic settlements on retry
- Process-local bounty cache removed; mutations always read fresh from VFS
- completed bounties recoverable (capture already happened, just advance to settled)
- SettlementSweep hard-fails on escrowed bounties without CreditsService
- State machine enforces pending_settlement as mandatory pivot

Test plan

Follow-up

Production CreditsService (NexusPay integration): #253

…work (#240)

Addresses the three codex-flagged correctness bugs in bounty operations and adds infrastructure to prevent recurrence:

- Add pending_settlement saga pivot state so settleBountyOperation never enters a terminal state before capture() confirms
- Add pre-flight status checks in claim/settle to prevent wasted side effects
- Add input validation (amount > 0, non-empty title) at operation boundary
- Add LRU doc cache + ETag-forwarding in NexusBountyStore to reduce VFS round-trips in multi-transition flows
- Add SweepReconciler framework with pluggable strategies for periodic consistency repair
- Add BountyIndexSweep (dual-write index repair), SettlementSweep (resume stalled pending_settlement), HandoffSweep (detect orphans)
- Add FailingBountyStore test wrapper for partial-failure injection
- Add lazy eviction of expired reservations in InMemoryCreditsService

115 tests pass across 4 test files including all 3 acceptance criteria from Issue #240.

1. Freeze fulfillment CID after saga pivot: when resuming a pending_settlement bounty, reject attempts to change the contribution CID. Prevents non-deterministic settlements.
2. Remove stale cache from transitionBounty: mutations always read fresh from VFS to get a valid ETag. Cache is still used for read-only getBounty() pre-flight checks.
3. Wire SweepReconciler into server startup: BountyIndexSweep and SettlementSweep now run on a 60s timer with graceful shutdown. Closes the "recovery not wired" gap.
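
The CID freeze in item 1 might look like the guard below. This is an illustrative sketch only: the `fulfillmentCid` field name, the `BountyRecord` shape, and the `resolveFulfillmentCid` helper are hypothetical and not taken from the repository.

```typescript
interface BountyRecord {
  status: string;
  fulfillmentCid?: string; // hypothetical field: the CID recorded at the saga pivot
}

// Returns the CID the settlement must use, or throws if a retry tries to
// swap in a different contribution after the pivot was written.
function resolveFulfillmentCid(bounty: BountyRecord, requestedCid: string): string {
  const resuming = bounty.status === "pending_settlement";
  if (!resuming || bounty.fulfillmentCid === undefined) {
    return requestedCid; // first attempt: nothing frozen yet
  }
  if (bounty.fulfillmentCid !== requestedCid) {
    throw new Error(`fulfillment CID is frozen at ${bounty.fulfillmentCid}`);
  }
  return bounty.fulfillmentCid;
}
```

Because the frozen value wins on every retry, a resumed settlement pays out for the same contribution regardless of which caller resumes it.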

1. Remove process-local bounty cache entirely: mutable objects must not be cached without cross-process invalidation. getBounty() now always reads fresh from VFS. Add validateBountyTransition() call in transitionBounty() to reject stale state before CAS write.
2. Extend settlement recovery to handle "completed" status: if capture succeeded and completeBounty committed but settleBounty failed, the operation and SettlementSweep can now resume from "completed" state. Prevents stranded post-capture bounties.
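
The resumable flow described in item 2 can be sketched as a function that re-enters at whatever state it finds. This is a sketch under stated assumptions: the `Store` and `Credits` ports and the `resumeSettlement` name are hypothetical stand-ins for the real BountyStore and CreditsService, and `capture` is assumed to be idempotent per reservation so that a crash between capture and the `completed` write can be retried safely.

```typescript
type Status = "claimed" | "pending_settlement" | "completed" | "settled";

interface Bounty {
  readonly status: Status;
  readonly reservationId?: string;
}

// Hypothetical minimal ports; write() stands in for a validated CAS write.
interface Store {
  read(): Promise<Bounty>;
  write(next: Status): Promise<Bounty>;
}
interface Credits {
  capture(reservationId: string): Promise<void>; // assumed idempotent
}

async function resumeSettlement(store: Store, credits?: Credits): Promise<Bounty> {
  let bounty = await store.read();
  if (bounty.status === "claimed") {
    // Step 1: persist the pivot before any funds move.
    bounty = await store.write("pending_settlement");
  }
  if (bounty.status === "pending_settlement") {
    // Step 2: capture escrowed funds, then record that capture happened.
    if (bounty.reservationId !== undefined) {
      if (credits === undefined) throw new Error("escrowed bounty needs a CreditsService");
      await credits.capture(bounty.reservationId);
    }
    bounty = await store.write("completed");
  }
  if (bounty.status === "completed") {
    // Step 3: capture is already done, so only the terminal write remains.
    bounty = await store.write("settled");
  }
  return bounty;
}
```

Each step checks the freshly read status, so the same function serves the first attempt, a retry from pending_settlement, and a retry from completed without capturing twice on the last path.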

… 1 MEDIUM

1. [critical] SettlementSweep hard-fails when bounty has reservationId but no creditsService: prevents settling escrowed bounties without actually capturing funds.
2. [high] Remove claimed→completed from state machine: force all settlement through pending_settlement pivot. Update conformance tests and bounty-logic tests to use beginSettlement first.
3. [medium] BountyIndexSweep now calls repairIndex unconditionally for every bounty: cleans both missing current-status entries AND stale old-status markers.

1. Remove SettlementSweep from server startup: local runtime has no CreditsService, so the sweep would hard-fail on escrowed bounties. Only BountyIndexSweep is registered. Settlement sweep will be enabled when a production CreditsService is wired in.
2. Release orphaned claims on bounty transition failure: if claimBounty() fails, re-read the bounty and release the claim only if the bounty is still open (confirming the transition didn't commit). Post-commit failures keep the claim for consistency.
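
The compensation rule in item 2 could be sketched as follows. The `ClaimPorts` interface and the `claimWithCompensation` helper are hypothetical; only the re-read-then-release rule comes from the commit message.

```typescript
// Hypothetical ports standing in for the claim store and bounty store.
interface ClaimPorts {
  createClaim(bountyId: string, agentId: string): Promise<string>; // returns a claim ID
  releaseClaim(claimId: string): Promise<void>;
  claimBounty(bountyId: string, claimId: string): Promise<void>; // open -> claimed
  getStatus(bountyId: string): Promise<string>;
}

async function claimWithCompensation(
  ports: ClaimPorts,
  bountyId: string,
  agentId: string,
): Promise<string> {
  const claimId = await ports.createClaim(bountyId, agentId);
  try {
    await ports.claimBounty(bountyId, claimId);
    return claimId;
  } catch (err) {
    // Re-read before compensating: only a bounty that is still open proves
    // the transition never committed. If it did commit, keep the claim.
    const status = await ports.getStatus(bountyId);
    if (status === "open") {
      await ports.releaseClaim(claimId);
    }
    throw err;
  }
}
```

The re-read matters because a thrown error does not prove the write failed; a response lost after a successful write would otherwise leave a claimed bounty pointing at a released claim.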

1. Add pending_settlement to Zod schemas in core/schemas.ts and mcp/tools/bounties.ts: prevents parsers from rejecting bounties in the new pivot state.
2. Re-enable SettlementSweep in server: it safely recovers non-escrowed bounties (no reservationId). Escrowed bounties log an error and wait for CreditsService. Update doc comment in bounty.ts lifecycle.

SettlementSweep: completed bounties have already captured, so skip the creditsService requirement and just advance to settled. Only pending_settlement bounties need the capture step.

Remaining findings (out of scope for #240):
- Claim renewal/heartbeat path: pre-existing design gap, not introduced by this branch. Tracked separately.
- Nexus MCP sweep wiring: requires architectural changes to MCP server startup. Tracked as follow-up integration work.
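
Taken together with the earlier commits, the sweep's per-bounty decision reduces to a small pure function. This is a sketch: the `SweepPlan` labels and the `planSettlementSweep` helper are made up for illustration, and only the branching rules come from the commit messages.

```typescript
type SweepPlan =
  | "skip"
  | "advance_to_settled"
  | "settle_without_capture"
  | "capture_then_settle"
  | "error_missing_credits";

interface SweepInput {
  status: string;
  reservationId?: string; // present only for escrowed bounties
}

function planSettlementSweep(bounty: SweepInput, hasCreditsService: boolean): SweepPlan {
  // completed: capture already happened, so no CreditsService is needed.
  if (bounty.status === "completed") return "advance_to_settled";
  if (bounty.status !== "pending_settlement") return "skip";
  // Non-escrowed bounties have nothing to capture.
  if (bounty.reservationId === undefined) return "settle_without_capture";
  // Escrowed and not yet captured: never settle without capturing funds.
  return hasCreditsService ? "capture_then_settle" : "error_missing_credits";
}
```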

1. Same-agent claim renewal: claimBountyOperation now allows the current claim holder to extend their lease without reopening the bounty. Different agents are still rejected. Prevents long-running bounties from getting stranded when the claim lease expires.
2. Wire SweepReconciler into both MCP entry points:
   - serve.ts (stdio): starts BountyIndexSweep + SettlementSweep after store setup, stops on shutdown
   - serve-http.ts (HTTP): starts at process level using zone-scoped Nexus bounty store (not session-scoped), stops on shutdown

The reconciler now runs in all three runtimes that can create bounties: HTTP server, stdio MCP, and HTTP MCP.

1. [high] Claim renewal with expired lease: detect if existing claim is expired and create a fresh claim ID instead of reusing the stale one. Rebinds the bounty to the new claim atomically.
2. [high] Remove SettlementSweep from MCP runtimes: no CreditsService available, escrowed bounties would fail every cycle. Only BountyIndexSweep registered. Settlement recovery deferred to #253.
3. [medium] BountyIndexSweep now detection-based: queries status-filtered lists to find actual drift, only calls repairIndex when missing. No more unconditional rewrite of every healthy bounty each cycle.

1. [high] Claim rebind after lease expiry: allow claimed→claimed self-transition so expired claim IDs get rotated to fresh ones. The bounty record is atomically rebound to the new claim.
2. [high] Re-enable SettlementSweep in MCP runtimes: completed bounties (already captured) can settle without CreditsService. Only pending_settlement+reservationId cases log errors.
3. [medium] repairIndex version-aware: re-reads with ETag before deleting stale markers. Skips cleanup if a concurrent transition changed the bounty between read and delete.

1. [high] Claim renewal checks lease validity (not just status): only reuse claimId if both status=active AND leaseExpiresAt > now.
2. [high] Compensation on rotated-claim rebind failure: release the orphaned new claim if bountyStore.claimBounty throws.
3. [medium] Remove sweep reconciler from stdio MCP: per-agent processes must not run zone-wide sweeps (N×load, CAS conflicts). Sweeps run only in HTTP server + HTTP MCP (singleton processes).
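
The renewal rule in item 1 can be written as a small decision function. This is a sketch with assumptions: the `Claim` shape, the epoch-millisecond representation of `leaseExpiresAt`, and the `decideRenewal` helper are hypothetical; the two-part validity check is the part taken from the commit message.

```typescript
interface Claim {
  claimId: string;
  agentId: string;
  status: string; // "active" is the only value this PR names
  leaseExpiresAt: number; // assumed to be epoch milliseconds
}

type RenewalDecision =
  | { kind: "reject" } // a different agent holds the claim
  | { kind: "renew"; claimId: string } // extend the existing lease
  | { kind: "rotate" }; // issue a fresh claim ID and rebind the bounty

function decideRenewal(existing: Claim, agentId: string, nowMs: number): RenewalDecision {
  if (existing.agentId !== agentId) return { kind: "reject" };
  // Both conditions must hold: an "active" claim whose lease has lapsed
  // must not be reused.
  const leaseValid = existing.status === "active" && existing.leaseExpiresAt > nowMs;
  return leaseValid ? { kind: "renew", claimId: existing.claimId } : { kind: "rotate" };
}
```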

1. [high] Rebind compensation re-reads bounty: only releases the new claim if the bounty didn't commit the rebind (post-commit safety).
2. [medium] Serialized sweep cycles: in-flight guard prevents overlapping async cycles from contending with each other.
3. [medium] BountyIndexSweep stale-marker detection: known limitation. listBounties(status) filters stale entries before the sweep sees them. Full fix requires a raw index listing API (store-layer change). repairIndex handles cleanup when triggered by other paths.
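
The in-flight guard from item 2 might be structured like this. It is a minimal sketch, not the actual SweepReconciler: the class name, the `SweepStrategy` shape, and the method names are assumptions, while the 60s default interval and the skip-if-running behavior follow the PR text.

```typescript
interface SweepStrategy {
  name: string;
  sweep(): Promise<void>;
}

class SweepReconcilerSketch {
  private readonly strategies: readonly SweepStrategy[];
  private readonly intervalMs: number;
  private inFlight = false;
  private timer: ReturnType<typeof setInterval> | undefined = undefined;

  constructor(strategies: readonly SweepStrategy[], intervalMs = 60_000) {
    this.strategies = strategies;
    this.intervalMs = intervalMs;
  }

  // Returns false when a previous cycle is still running and this one was skipped.
  async runCycle(): Promise<boolean> {
    if (this.inFlight) return false;
    this.inFlight = true;
    try {
      for (const strategy of this.strategies) {
        try {
          await strategy.sweep();
        } catch (err) {
          // One failing strategy must not block the others.
          console.error(`sweep ${strategy.name} failed`, err);
        }
      }
      return true;
    } finally {
      this.inFlight = false;
    }
  }

  start(): void {
    if (this.timer !== undefined) return;
    this.timer = setInterval(() => void this.runCycle(), this.intervalMs);
  }

  stop(): void {
    if (this.timer !== undefined) clearInterval(this.timer);
    this.timer = undefined;
  }
}
```

Skipping a tick rather than queueing it keeps a slow cycle from building a backlog; the next timer tick simply tries again.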

All three "persistent" findings from the review loop are now fixed using existing Nexus VFS operations; no Nexus changes needed.

1. repairIndex race: check exists() before each stale marker delete, re-read the authoritative document right before deleting to confirm the bounty hasn't transitioned TO that status concurrently.
2. BountyIndexSweep stale-marker detection: new listIndexStatuses() method on NexusBountyStore checks which status index entries actually exist using client.exists(). Sweep now detects both missing current entries AND stale old-status entries in a single pass.
3. Added listIndexStatuses to BountyStore interface (optional) and FailingBountyStore wrapper.
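
The single-pass detection in item 2 amounts to comparing the bounty's authoritative status against the set of statuses that have an index entry. The sketch below assumes that `listIndexStatuses()` yields that set as an array of status strings; its real return type is not shown in this PR, and `detectIndexDrift` is a hypothetical helper.

```typescript
interface IndexDrift {
  missingCurrent: boolean; // no index entry under the bounty's current status
  staleStatuses: string[]; // index entries left behind under old statuses
}

// indexedStatuses: assumed result of listIndexStatuses() for one bounty.
function detectIndexDrift(currentStatus: string, indexedStatuses: readonly string[]): IndexDrift {
  return {
    missingCurrent: !indexedStatuses.includes(currentStatus),
    staleStatuses: indexedStatuses.filter((status) => status !== currentStatus),
  };
}

// A sweep would call repairIndex only when drift is actually present.
function needsRepair(drift: IndexDrift): boolean {
  return drift.missingCurrent || drift.staleStatuses.length > 0;
}
```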

windoliver added 14 commits April 14, 2026 16:52

fix(bounty): exactOptionalPropertyTypes compat for SweepReconciler fi…

127031d

…elds

windoliver merged commit b876ed2 into main Apr 15, 2026
1 check passed

windoliver merged 14 commits into main from fix/bounty-atomicity-phase1
