bounty multi-store atomicity + saga (absorbs #227): compensation bugs, handoff reconciler, settle-before-pay

## Background

While reviewing PR #239, codex flagged three correctness bugs in an **unlanded** `bounty.ts` compensation refactor that was sitting in the worktree. The refactor adds try/catch compensation for `createBountyOperation`, `claimBountyOperation`, and `settleBountyOperation`, but it assumes each store operation is atomic. It isn't, and the compensation logic can create worse states than it was meant to fix.

These findings were confirmed across three independent codex adversarial-review rounds and are the reason the bounty changes were **not** landed in #239.

## Findings

### 1. [CRITICAL] `settleBounty` commits terminal state before payment is captured

\`src/core/operations/bounty.ts:319-333\`

The reordered flow is:

\`\`\`typescript
const completed = await deps.bountyStore.completeBounty(...);
const settled = await deps.bountyStore.settleBounty(completed.bountyId);

// Capture payment AFTER state is persisted
if (deps.creditsService && bounty.reservationId && bounty.claimedBy) {
  await deps.creditsService.capture(bounty.reservationId, {...});
}
\`\`\`

The bounty is now \`settled\` (terminal) before \`creditsService.capture()\` runs. \`capture()\` is explicitly allowed to fail for missing / voided / expired reservations and upstream payment errors (see \`src/core/in-memory-credits.ts:133-168\`). When capture throws, the bounty is already in a terminal state, so the state machine cannot retry settlement, and there is **no bounty-payment reconciler** in the repository to repair it.

**User-visible consequence**: a bounty recorded as paid even though the escrow was never captured or transferred to the worker. Silent, unrecoverable.

### 2. [HIGH] `void(reservationId)` on any `createBounty` error can leave persisted bounty without escrow

\`src/core/operations/bounty.ts:156-176\`

\`\`\`typescript
try {
  const result = await deps.bountyStore.createBounty(bounty);
  return ok({...});
} catch (storeErr) {
  if (reservationId !== undefined && deps.creditsService !== undefined) {
    try { await deps.creditsService.void(reservationId); } catch {}
  }
  return fromGroveError(storeErr);
}
\`\`\`

The catch assumes \`createBounty()\` failed **before** persistence. That's not true for \`NexusBountyStore\`: it writes the bounty document first (\`nexus-bounty-store.ts:80-102\`) and only then writes the status index. If the status-index write throws, the bounty document is already committed, but this handler returns the creator's credits while the open bounty is persisted with the now-empty \`reservationId\`.

**Consequence**: an \"open\" bounty exists in Nexus with no escrow backing it. The creator can re-spend the refunded credits. Later claimants will find the bounty and try to work against it; settlement will hit a voided reservation.

### 3. [HIGH] `release(claim.claimId)` on any `claimBounty` error can strand bounty in `claimed` with no live claim

\`src/core/operations/bounty.ts:258-269\`

\`\`\`typescript
let claimed;
try {
  claimed = await deps.bountyStore.claimBounty(input.bountyId, agent, claim.claimId);
} catch (bountyErr) {
  try { await deps.claimStore.release(claim.claimId); } catch {}
  return fromGroveError(bountyErr);
}
\`\`\`

Same shape as finding 2. \`NexusBountyStore.transitionBounty()\` CAS-writes the bounty document first, then writes the status index (\`nexus-bounty-store.ts:263-275\`). A late failure leaves the bounty already updated to \`claimed\` in storage. Releasing the claim here leaves \`claimId\` and \`claimedBy\` on the bounty pointing at a released claim, which blocks other claimants without giving the system a lease to manage or recover the work.

**Consequence**: a bounty stuck in \`claimed\` with no active claim, invisible to claim-expiry sweeps.

## Design options

The reviewer's recommendations from rounds 1 and 3:

> Do not transition to terminal \`settled\` until capture succeeds, or introduce a durable \`pending_settlement\`/outbox flow with retries and reconciliation.

> Only void on failures proven to be pre-persistence, or make bounty creation atomic / confirm persistence before compensation. A minimal mitigation is to re-read the bounty by ID after a store error and skip \`void()\` if it already exists.

Three paths forward, in rough order of complexity:

### (a) Reconciler + pending_settlement state (correct, most work)

1. Add a \`pending_settlement\` bounty status between \`claimed\` and \`settled\`.
2. \`settleBountyOperation\` transitions \`claimed → pending_settlement\`, then captures, then transitions \`pending_settlement → settled\`.
3. Add a background reconciler that sweeps \`pending_settlement\` bounties and retries \`capture()\` with exponential backoff. Compensating transaction when the reservation is dead.
4. Add post-commit confirmation in the create/claim compensation paths (re-read the bounty record and only compensate if the store state is pre-commit).

Rough size: **200-500 lines + tests + another codex review cycle**. Probably 1-2 days.

### (b) CAS confirmation only, no reconciler

1. Keep the existing state machine.
2. In each compensation catch, re-read the bounty by id (or check the CAS etag) to distinguish pre-commit from post-commit failures.
3. Skip compensation on post-commit failures; surface them with a clear error instead.
4. Leaves the settle-before-pay race unfixed unless settle is reordered back to \"capture first, then persist\".

Rough size: **100-200 lines + tests**. Half a day.

### (c) Revert the compensation refactor entirely (smallest)

1. Drop the try/catch compensation in \`createBountyOperation\`, \`claimBountyOperation\`, \`settleBountyOperation\`.
2. Rely on reservation auto-expiry to reclaim dead escrow and on claim lease timeouts to reclaim dead claims.
3. Keep the cosmetic changes (\`MISSING_BOUNTY_STORE\` const, pre-flight status check on claim).
4. Accept that failed bounty creates leak \"open\" state until the reservation deadline and failed claims leak \"claimed\" state until the lease expires.

Rough size: **50-100 lines**. An hour.

## Recommendation

Start with (c) to stop the bleed, ship it, then do (a) as a proper saga-log project with its own RFC and test harness. (b) is a middle ground but doesn't fix finding 1 (settle-before-pay), which is the critical one.

## References

- Codex review round 1 of PR #239 adversarial-review: findings [critical], [high], [high] on \`src/core/operations/bounty.ts\`.
- Codex review round 3: same three findings re-flagged after the commit excluded bounty.ts.
- PR #239: https://github.com/windoliver/grove/pull/239 — the PR that found these; bounty.ts changes intentionally withheld there.

## Current state of the code

The buggy refactor is sitting uncommitted in the \`ancient-nibbling-anchor\` worktree on files:

- \`src/core/operations/bounty.ts\` — compensation logic + reordered settlement
- \`src/core/operations/bounty.test.ts\` — tests for the new compensation paths

Run \`git stash\` / \`git diff\` on that branch to see the exact 359-line diff.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bounty multi-store atomicity + saga (absorbs #227): compensation bugs, handoff reconciler, settle-before-pay #240

Background

Findings

1. [CRITICAL] `settleBounty` commits terminal state before payment is captured

2. [HIGH] `void(reservationId)` on any `createBounty` error can leave persisted bounty without escrow

3. [HIGH] `release(claim.claimId)` on any `claimBounty` error can strand bounty in `claimed` with no live claim

Design options

(a) Reconciler + pending_settlement state (correct, most work)

(b) CAS confirmation only, no reconciler

(c) Revert the compensation refactor entirely (smallest)

Recommendation

References

Current state of the code

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

bounty multi-store atomicity + saga (absorbs #227): compensation bugs, handoff reconciler, settle-before-pay #240

Description

Background

Findings

1. [CRITICAL] settleBounty commits terminal state before payment is captured

2. [HIGH] void(reservationId) on any createBounty error can leave persisted bounty without escrow

3. [HIGH] release(claim.claimId) on any claimBounty error can strand bounty in claimed with no live claim

Design options

(a) Reconciler + pending_settlement state (correct, most work)

(b) CAS confirmation only, no reconciler

(c) Revert the compensation refactor entirely (smallest)

Recommendation

References

Current state of the code

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

1. [CRITICAL] `settleBounty` commits terminal state before payment is captured

2. [HIGH] `void(reservationId)` on any `createBounty` error can leave persisted bounty without escrow

3. [HIGH] `release(claim.claimId)` on any `claimBounty` error can strand bounty in `claimed` with no live claim