Background
While reviewing PR #239, codex flagged three correctness bugs in an unlanded bounty.ts compensation refactor that was sitting in the worktree. The refactor adds try/catch compensation for createBountyOperation, claimBountyOperation, and settleBountyOperation, but it assumes each store operation is atomic. It isn't, and the compensation logic can create worse states than it was meant to fix.
These findings were confirmed across three independent codex adversarial-review rounds and are the reason the bounty changes were not landed in #239.
Findings
1. [CRITICAL] settleBounty commits terminal state before payment is captured
`src/core/operations/bounty.ts:319-333`
The reordered flow is:
```typescript
const completed = await deps.bountyStore.completeBounty(...);
const settled = await deps.bountyStore.settleBounty(completed.bountyId);
// Capture payment AFTER state is persisted
if (deps.creditsService && bounty.reservationId && bounty.claimedBy) {
await deps.creditsService.capture(bounty.reservationId, {...});
}
```
The bounty is now `settled` (terminal) before `creditsService.capture()` runs. `capture()` is explicitly allowed to fail for missing / voided / expired reservations and upstream payment errors (see `src/core/in-memory-credits.ts:133-168`). When capture throws, the bounty is already in a terminal state, so the state machine cannot retry settlement, and there is no bounty-payment reconciler in the repository to repair it.
User-visible consequence: a bounty recorded as paid even though the escrow was never captured or transferred to the worker. Silent, unrecoverable.
2. [HIGH] void(reservationId) on any createBounty error can leave persisted bounty without escrow
`src/core/operations/bounty.ts:156-176`
```typescript
try {
const result = await deps.bountyStore.createBounty(bounty);
return ok({...});
} catch (storeErr) {
if (reservationId !== undefined && deps.creditsService !== undefined) {
try { await deps.creditsService.void(reservationId); } catch {}
}
return fromGroveError(storeErr);
}
```
The catch assumes `createBounty()` failed before persistence. That's not true for `NexusBountyStore`: it writes the bounty document first (`nexus-bounty-store.ts:80-102`) and only then writes the status index. If the status-index write throws, the bounty document is already committed, but this handler returns the creator's credits while the open bounty is persisted with the now-empty `reservationId`.
Consequence: an "open" bounty exists in Nexus with no escrow backing it. The creator can re-spend the refunded credits. Later claimants will find the bounty and try to work against it; settlement will hit a voided reservation.
3. [HIGH] release(claim.claimId) on any claimBounty error can strand bounty in claimed with no live claim
`src/core/operations/bounty.ts:258-269`
```typescript
let claimed;
try {
claimed = await deps.bountyStore.claimBounty(input.bountyId, agent, claim.claimId);
} catch (bountyErr) {
try { await deps.claimStore.release(claim.claimId); } catch {}
return fromGroveError(bountyErr);
}
```
Same shape as finding 2. `NexusBountyStore.transitionBounty()` CAS-writes the bounty document first, then writes the status index (`nexus-bounty-store.ts:263-275`). A late failure leaves the bounty already updated to `claimed` in storage. Releasing the claim here leaves `claimId` and `claimedBy` on the bounty pointing at a released claim, which blocks other claimants without giving the system a lease to manage or recover the work.
Consequence: a bounty stuck in `claimed` with no active claim, invisible to claim-expiry sweeps.
Design options
The reviewer's recommendations from rounds 1 and 3:
Do not transition to terminal `settled` until capture succeeds, or introduce a durable `pending_settlement`/outbox flow with retries and reconciliation.
Only void on failures proven to be pre-persistence, or make bounty creation atomic / confirm persistence before compensation. A minimal mitigation is to re-read the bounty by ID after a store error and skip `void()` if it already exists.
Three paths forward, in rough order of complexity:
(a) Reconciler + pending_settlement state (correct, most work)
- Add a `pending_settlement` bounty status between `claimed` and `settled`.
- `settleBountyOperation` transitions `claimed → pending_settlement`, then captures, then transitions `pending_settlement → settled`.
- Add a background reconciler that sweeps `pending_settlement` bounties and retries `capture()` with exponential backoff. Compensating transaction when the reservation is dead.
- Add post-commit confirmation in the create/claim compensation paths (re-read the bounty record and only compensate if the store state is pre-commit).
Rough size: 200-500 lines + tests + another codex review cycle. Probably 1-2 days.
(b) CAS confirmation only, no reconciler
- Keep the existing state machine.
- In each compensation catch, re-read the bounty by id (or check the CAS etag) to distinguish pre-commit from post-commit failures.
- Skip compensation on post-commit failures; surface them with a clear error instead.
- Leaves the settle-before-pay race unfixed unless settle is reordered back to "capture first, then persist".
Rough size: 100-200 lines + tests. Half a day.
(c) Revert the compensation refactor entirely (smallest)
- Drop the try/catch compensation in `createBountyOperation`, `claimBountyOperation`, `settleBountyOperation`.
- Rely on reservation auto-expiry to reclaim dead escrow and on claim lease timeouts to reclaim dead claims.
- Keep the cosmetic changes (`MISSING_BOUNTY_STORE` const, pre-flight status check on claim).
- Accept that failed bounty creates leak "open" state until the reservation deadline and failed claims leak "claimed" state until the lease expires.
Rough size: 50-100 lines. An hour.
Recommendation
Start with (c) to stop the bleed, ship it, then do (a) as a proper saga-log project with its own RFC and test harness. (b) is a middle ground but doesn't fix finding 1 (settle-before-pay), which is the critical one.
References
Current state of the code
The buggy refactor is sitting uncommitted in the `ancient-nibbling-anchor` worktree on files:
- `src/core/operations/bounty.ts` — compensation logic + reordered settlement
- `src/core/operations/bounty.test.ts` — tests for the new compensation paths
Run `git stash` / `git diff` on that branch to see the exact 359-line diff.
Background
While reviewing PR #239, codex flagged three correctness bugs in an unlanded
bounty.tscompensation refactor that was sitting in the worktree. The refactor adds try/catch compensation forcreateBountyOperation,claimBountyOperation, andsettleBountyOperation, but it assumes each store operation is atomic. It isn't, and the compensation logic can create worse states than it was meant to fix.These findings were confirmed across three independent codex adversarial-review rounds and are the reason the bounty changes were not landed in #239.
Findings
1. [CRITICAL]
settleBountycommits terminal state before payment is captured`src/core/operations/bounty.ts:319-333`
The reordered flow is:
```typescript
const completed = await deps.bountyStore.completeBounty(...);
const settled = await deps.bountyStore.settleBounty(completed.bountyId);
// Capture payment AFTER state is persisted
if (deps.creditsService && bounty.reservationId && bounty.claimedBy) {
await deps.creditsService.capture(bounty.reservationId, {...});
}
```
The bounty is now `settled` (terminal) before `creditsService.capture()` runs. `capture()` is explicitly allowed to fail for missing / voided / expired reservations and upstream payment errors (see `src/core/in-memory-credits.ts:133-168`). When capture throws, the bounty is already in a terminal state, so the state machine cannot retry settlement, and there is no bounty-payment reconciler in the repository to repair it.
User-visible consequence: a bounty recorded as paid even though the escrow was never captured or transferred to the worker. Silent, unrecoverable.
2. [HIGH]
void(reservationId)on anycreateBountyerror can leave persisted bounty without escrow`src/core/operations/bounty.ts:156-176`
```typescript
try {
const result = await deps.bountyStore.createBounty(bounty);
return ok({...});
} catch (storeErr) {
if (reservationId !== undefined && deps.creditsService !== undefined) {
try { await deps.creditsService.void(reservationId); } catch {}
}
return fromGroveError(storeErr);
}
```
The catch assumes `createBounty()` failed before persistence. That's not true for `NexusBountyStore`: it writes the bounty document first (`nexus-bounty-store.ts:80-102`) and only then writes the status index. If the status-index write throws, the bounty document is already committed, but this handler returns the creator's credits while the open bounty is persisted with the now-empty `reservationId`.
Consequence: an "open" bounty exists in Nexus with no escrow backing it. The creator can re-spend the refunded credits. Later claimants will find the bounty and try to work against it; settlement will hit a voided reservation.
3. [HIGH]
release(claim.claimId)on anyclaimBountyerror can strand bounty inclaimedwith no live claim`src/core/operations/bounty.ts:258-269`
```typescript
let claimed;
try {
claimed = await deps.bountyStore.claimBounty(input.bountyId, agent, claim.claimId);
} catch (bountyErr) {
try { await deps.claimStore.release(claim.claimId); } catch {}
return fromGroveError(bountyErr);
}
```
Same shape as finding 2. `NexusBountyStore.transitionBounty()` CAS-writes the bounty document first, then writes the status index (`nexus-bounty-store.ts:263-275`). A late failure leaves the bounty already updated to `claimed` in storage. Releasing the claim here leaves `claimId` and `claimedBy` on the bounty pointing at a released claim, which blocks other claimants without giving the system a lease to manage or recover the work.
Consequence: a bounty stuck in `claimed` with no active claim, invisible to claim-expiry sweeps.
Design options
The reviewer's recommendations from rounds 1 and 3:
Three paths forward, in rough order of complexity:
(a) Reconciler + pending_settlement state (correct, most work)
Rough size: 200-500 lines + tests + another codex review cycle. Probably 1-2 days.
(b) CAS confirmation only, no reconciler
Rough size: 100-200 lines + tests. Half a day.
(c) Revert the compensation refactor entirely (smallest)
Rough size: 50-100 lines. An hour.
Recommendation
Start with (c) to stop the bleed, ship it, then do (a) as a proper saga-log project with its own RFC and test harness. (b) is a middle ground but doesn't fix finding 1 (settle-before-pay), which is the critical one.
References
Current state of the code
The buggy refactor is sitting uncommitted in the `ancient-nibbling-anchor` worktree on files:
Run `git stash` / `git diff` on that branch to see the exact 359-line diff.