fix: file-based kill signal IPC and resumable threads#4
Merged
Conversation
…mable Rework the kill command to use file-based signal IPC instead of directly archiving threads. The running process now polls for a signal file via Promise.race, enabling cross-process kill detection even during slow turn/start requests. Key changes: - Kill signal polling starts before turn/start with two Promise.race stages covering both the request and completion phases - PID file tracking (~/.codex-collab/pids/) enables stale "running" detection in `jobs` when the owning process dies without cleanup - `kill` no longer archives threads — it just interrupts, making threads resumable. `delete` handles permanent cleanup and now also stops running threads before archiving. - Status uses "interrupted" instead of the removed "killed" terminal state, with guard against killing non-running threads - SIGTERM handler added alongside existing SIGINT for graceful shutdown - Thread IDs validated at registration for Windows filename safety - Comprehensive error handling: EPERM-aware process checks, try-catch in setInterval polling, warning logs for all catch blocks - 20 tests covering kill signal races, error propagation, and thread status updates
close() on Windows was returning immediately after taskkill without awaiting proc.exited or readLoop, leaving dangling promises that kept the event loop alive. This caused background tasks to stay "running" long after their output had completed. Now awaits process exit with a 3s timeout on Windows (matching the Unix path). Removes the 1s afterEach delay workaround from protocol tests since close() properly cleans up on all platforms.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5a20ee3c8e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
1. isProcessAlive: return true (assume alive) when PID file is missing instead of false. Prevents cmdJobs from marking live threads as interrupted when PID file write failed or predates the PID tracking feature. 2. Stale signal cleanup: use timestamp comparison instead of unconditional unlink. Only removes signal files older than the current process, preserving fresh kill requests that arrive between process start and poll start.
1. Kill signal poll: keep retrying on transient errors instead of clearing the interval and leaving a never-settling promise that silently disables kill monitoring for the rest of the turn. 2. isProcessAlive: log warning when PID file contains invalid content so corrupted files leave a diagnostic trail. 3. Add boundary test: fresh signal file (current mtime) is preserved at turn start and triggers interrupted status. 4. Fix stale signal test: backdate relative to process.uptime() instead of fixed 60s offset to avoid flakiness when process has been running longer than 60s (CI, watch mode).
When both kill mechanisms fail (signal file write + server interrupt), keep the thread status as "running" so the user can retry. Previously the status was unconditionally set to "interrupted", which blocked subsequent kill attempts via the "already interrupted" guard.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
killwrites a signal file to~/.codex-collab/kill-signals/<threadId>, andexecuteTurnpolls for it viaPromise.race— covering both theturn/startrequest and completion phases. This replaces the previous approach of directly archiving threads on kill.~/.codex-collab/pids/, enablingjobsto detect stale "running" threads when the owning process dies without cleanup (e.g. SIGKILL, crash).killjust interrupts the thread (status becomes "interrupted"), making it resumable via--resume.deletenow handles permanent cleanup and also stops running threads before archiving.close()now awaits process exit on Windows (matching Unix behavior), preventing dangling promises from keeping the event loop alive.Changes
src/turns.tscreateKillSignalAwaiter+ twoPromise.racestages;KillSignalErrorhandling; stale signal cleanupsrc/cli.tscmdKillrewritten (signal-based, no archive);cmdDeletestops running threads;cmdJobsdetects stale processessrc/threads.tssrc/config.tskillSignalsDirandpidsDirconfig getterssrc/types.tssrc/protocol.tsclose()awaits process exit with 3s timeoutsrc/protocol.test.tssrc/turns.test.tssrc/cli.test.tsSKILL.mdTest plan
bun test)