
fix(autopilot): cancel the between-cycle sleep on SIGTERM/SIGINT (#204) #208

Open
sharziki wants to merge 1 commit into garrytan:master from sharziki:fix/autopilot-cancelable-sleep

@sharziki
Contributor

Summary

Fixes #204: under systemd, `gbrain autopilot` gets SIGKILL'd before its drain path runs, because the between-cycle `setTimeout` cannot be canceled.

Root cause

The autopilot loop waited for the next cycle with:

```ts
await new Promise(r => setTimeout(r, interval * 1000));
```

The SIGTERM/SIGINT handler flips `stopping = true`, but the `while (!stopping)` guard only re-evaluates after the current sleep resolves. With adaptive intervals scaling up to 600s on a healthy brain, the loop can stay asleep for most of that interval after the signal arrives, so systemd's default `TimeoutStopSec=90` loses the race: SIGKILL preempts the shutdown path and the lockfile at `~/.gbrain/autopilot.lock` is left stale. The next invocation then has to either wait out the 10-minute staleness check or be cleaned up by hand.
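To make the race concrete, here is a scaled-down reproduction (milliseconds instead of seconds). It is a sketch, not gbrain's code: `measureExitLatency` is a hypothetical name, and the timer that flips `stopping` stands in for the SIGTERM handler. Even though the flag is set 10ms in, the loop can only notice it once the full 200ms sleep resolves.

```ts
// Hypothetical reproduction of the non-cancelable sleep, scaled down
// from seconds to milliseconds. Not gbrain's actual loop.
async function measureExitLatency(intervalMs: number, signalAfterMs: number): Promise<number> {
  let stopping = false;

  // Stand-in for the SIGTERM/SIGINT handler: it only flips the flag.
  setTimeout(() => {
    stopping = true;
  }, signalAfterMs);

  const start = Date.now();
  while (!stopping) {
    // runCycle() would go here.
    // Plain setTimeout wait: nothing can resolve this promise early,
    // so the flag is only re-checked after the full interval.
    await new Promise<void>((r) => setTimeout(r, intervalMs));
  }
  return Date.now() - start;
}

measureExitLatency(200, 10).then((ms) => {
  console.log(`loop exited after ~${ms}ms although the flag flipped at 10ms`);
});
```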

Fix

  • Extract a `sleepCancelable(ms, signal)` helper: it resolves after `ms`, or rejects with an `AbortError` as soon as `signal` aborts.
  • Add an `AbortController` (`cycleAbort`) for the cycle loop in `runAutopilot`.
  • Abort `cycleAbort` from the `shutdown` handler so the loop wakes up immediately.
  • Catch only the abort-path rejection; let any other unexpected rejection propagate so we don't hide real bugs.
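The steps above can be sketched as follows, assuming a runtime with global `AbortController` and `DOMException` (Node 17+ or Bun). Only the name `sleepCancelable(ms, signal)` and the roles of `cycleAbort`, `stopping`, and `shutdown` come from this PR; the helper body, `isAbortError`, and the `createLoop` wrapper are illustrative assumptions, not the actual diff.

```ts
/** Resolve after `ms`, or reject with an AbortError as soon as `signal` aborts. */
export function sleepCancelable(ms: number, signal: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const abortError = () => new DOMException("Sleep aborted", "AbortError");
    // Already aborted: reject right away without scheduling a timer.
    if (signal.aborted) {
      reject(abortError());
      return;
    }
    const onAbort = () => {
      clearTimeout(timer);
      reject(abortError());
    };
    const timer = setTimeout(() => {
      // Detach so a long-lived controller doesn't collect one listener per cycle.
      signal.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    signal.addEventListener("abort", onAbort, { once: true });
  });
}

// Hypothetical check: true only for the abort-path rejection.
function isAbortError(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { name?: unknown }).name === "AbortError";
}

// Hypothetical wrapper around the loop shape described in the PR.
export function createLoop(runCycle: () => Promise<void>, intervalMs: number) {
  let stopping = false;
  const cycleAbort = new AbortController();

  // What the SIGTERM/SIGINT handler would call.
  const shutdown = () => {
    stopping = true;
    cycleAbort.abort(); // wakes a pending sleepCancelable immediately
  };

  const run = async () => {
    while (!stopping) {
      await runCycle();
      try {
        await sleepCancelable(intervalMs, cycleAbort.signal);
      } catch (err) {
        // Swallow only the abort path; anything else is a real bug.
        if (!isAbortError(err)) throw err;
      }
    }
  };

  return { run, shutdown };
}
```

In a real entry point, `shutdown` would be registered with `process.on("SIGTERM", shutdown)` and `process.on("SIGINT", shutdown)`, so the loop exits within one event-loop turn instead of after up to 600s.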

No behavior change in the worker-drain path: the 35s `Promise.race` stays as is on purpose, since it bounds how long we wait for the Minions worker to drain.
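For contrast, a bounded drain wait like the one this PR leaves alone can be sketched as a `Promise.race` between the drain and a timer. `drainWithTimeout` and its return values are hypothetical; only the 35s bound and the use of `Promise.race` come from the text above.

```ts
// Hypothetical bounded wait: settle when `drain` settles or after
// `timeoutMs`, whichever comes first.
export async function drainWithTimeout(
  drain: Promise<unknown>,
  timeoutMs = 35_000,
): Promise<"drained" | "timed-out"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timed-out">((resolve) => {
    timer = setTimeout(() => resolve("timed-out"), timeoutMs);
  });
  try {
    // A rejected drain propagates to the caller unchanged.
    return await Promise.race([drain.then(() => "drained" as const), timeout]);
  } finally {
    // Clear the timer so a fast drain doesn't keep the process alive for 35s.
    clearTimeout(timer);
  }
}
```

Unlike the between-cycle sleep, this wait is already bounded, so it cannot exceed systemd's stop timeout on its own.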

Test plan

  • New tests in `test/autopilot-sleep-cancelable.test.ts` covering:
    • resolves normally when the signal never aborts
    • rejects immediately when the signal is already aborted
    • rejects with AbortError when the signal aborts mid-sleep
  • Existing tests still pass:
    • `bun test test/autopilot-sleep-cancelable.test.ts test/autopilot-install.test.ts test/autopilot-resolve-cli.test.ts` → 10/10 pass.
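The three cases in the test plan can be sketched without a test framework, so the block runs on its own. This is not the contents of `test/autopilot-sleep-cancelable.test.ts`: it restates a hypothetical `sleepCancelable` body and reports plain results instead of using `bun:test` assertions. `settle` and `runSleepCases` are illustrative names.

```ts
// Hypothetical restatement of the helper so this block is self-contained.
function sleepCancelable(ms: number, signal: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const fail = () => reject(new DOMException("Sleep aborted", "AbortError"));
    if (signal.aborted) return fail();
    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    const onAbort = () => {
      clearTimeout(timer);
      fail();
    };
    signal.addEventListener("abort", onAbort, { once: true });
  });
}

// Record how a sleep settled and roughly how long that took.
async function settle(p: Promise<void>): Promise<{ outcome: string; ms: number }> {
  const start = Date.now();
  try {
    await p;
    return { outcome: "resolved", ms: Date.now() - start };
  } catch (err) {
    const name = (err as { name?: string }).name ?? "unknown";
    return { outcome: `rejected:${name}`, ms: Date.now() - start };
  }
}

export async function runSleepCases() {
  // 1. Signal never aborts: resolves after roughly `ms`.
  const never = await settle(sleepCancelable(30, new AbortController().signal));

  // 2. Signal already aborted: rejects without waiting.
  const pre = new AbortController();
  pre.abort();
  const preAborted = await settle(sleepCancelable(10_000, pre.signal));

  // 3. Signal aborts mid-sleep: rejects long before `ms` elapses.
  const mid = new AbortController();
  setTimeout(() => mid.abort(), 20);
  const midSleep = await settle(sleepCancelable(10_000, mid.signal));

  return { never, preAborted, midSleep };
}
```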

🤖 Generated with Claude Code

The autopilot loop wrapped its between-cycle wait in
`await new Promise(r => setTimeout(r, interval * 1000))`, which is not
cancelable. When SIGTERM/SIGINT fired, the shutdown handler flipped
`stopping = true`, but the `while (!stopping)` guard only re-evaluates
after the current sleep resolves. With adaptive intervals scaling up to
600s on a healthy brain, that means systemd's default TimeoutStopSec=90
loses the race, SIGKILL preempts the drain path, and the autopilot
lockfile at `~/.gbrain/autopilot.lock` is left stale. The next
invocation has to either wait out the 10-minute staleness check or be
cleaned up by hand.

Extract an exported `sleepCancelable(ms, signal)` helper that resolves
after `ms` or rejects with an AbortError as soon as `signal` aborts.
Wire an `AbortController` into the cycle loop, abort it from the
`shutdown` handler, and swallow only the abort-path rejection so the
loop exits on its next `stopping` check.

Added focused tests covering the never-aborted, pre-aborted, and
mid-sleep abort paths.

Fixes garrytan#204


Development

Successfully merging this pull request may close these issues.

autopilot: graceful shutdown waits full cycle interval due to non-cancelable setTimeout

1 participant