Joel + continuum-b69f 2026-04-28: Windows daemon launcher's `:loop`
respawned a fresh airc 5s after the original bash exited, racing the
new airc that just took over via host-mode re-exec. Continuous
crashloop on `airc daemon install` from a project dir whose room
gist had a stale heartbeat (a common state on cold start).
Root cause specific to Windows MSYS-bash: `exec env ... "$0" connect`
is true execve on Linux/Mac (PID stays, parent never observes exit),
but emulated as spawn-and-exit on Windows MSYS (parent bash exits +
new airc bash takes over with a different PID). The daemon launcher's
`bash -c "exec airc connect"` thus returns to the .bat after every
host-takeover, which the .bat treats as a crash.
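
The PID behavior described above can be probed with a small check. This is a minimal sketch, not part of the PR: `exec_keeps_pid` is a hypothetical helper name, and the only claim it tests is that an `exec`'d bash reports the same `$$` as the shell that called `exec`. On Linux/Mac that is expected to hold; per the root-cause description, Windows MSYS is expected to report two different PIDs.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): returns 0 when `exec` preserves the PID.
exec_keeps_pid() {
  local out before after
  # The child bash prints its PID, then exec's a second bash that prints its own.
  out="$(bash -c 'echo "$$"; exec bash -c "echo \$\$"')"
  before="$(printf '%s\n' "$out" | sed -n 1p)"
  after="$(printf '%s\n' "$out" | sed -n 2p)"
  [ -n "$before" ] && [ "$before" = "$after" ]
}

if exec_keeps_pid; then
  echo "exec kept the PID (true execve semantics)"
else
  echo "exec changed the PID (spawn-and-exit emulation)"
fi
```

When the second branch fires, any parent that waits on the original PID sees an exit, which is exactly what the launcher `:loop` interprets as a crash.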
Fix:
- New helper `_write_reexec_marker` writes
`<bashPID>:<unix-ts>` to `$AIRC_WRITE_DIR/airc.reexec-marker`.
- Called immediately before all 5 `exec env ... "$0" connect ...`
sites: 4 host-takeover paths (cmd_connect's stale-heartbeat self-
heal in two different code paths × {rejoin-as-joiner, host}) + 1
cold-host split-brain race-loser path.
- Daemon launcher .bat checks for the marker between iterations using
`forfiles /p <scope> /m airc.reexec-marker /d 0` (matches when the
file's mtime is today). If the marker is fresh, the launcher prints a
"re-exec'd; new process is now daemon, launcher exiting" message and
runs `exit /b 0` (no respawn). The new airc process from the exec is
the running daemon now; a competing respawn would just kill it.
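
A minimal sketch of the marker write, assuming only what the list above states: the file lives at `$AIRC_WRITE_DIR/airc.reexec-marker` and holds `<bashPID>:<unix-ts>`. The function name `write_reexec_marker_sketch` is hypothetical and the body is an illustration, not the PR's actual `_write_reexec_marker`. The best-effort `mkdir -p` is an assumption added so the write cannot fail on a missing scope dir.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of the sentinel write; not the PR's real helper.
write_reexec_marker_sketch() {
  local dir="${AIRC_WRITE_DIR:-}"
  [ -n "$dir" ] || return 1
  local marker="$dir/airc.reexec-marker"
  # Best-effort: make sure the scope dir exists before writing.
  mkdir -p "$dir" 2>/dev/null || true
  # Contents: PID of the current bash, a colon, then seconds since the epoch.
  printf '%s:%s\n' "$$" "$(date +%s)" 2>/dev/null > "$marker" || return 1
}

# Usage (call immediately before the re-exec):
#   write_reexec_marker_sketch
#   exec env ... "$0" connect ...
```

Writing the marker as the last step before `exec` keeps the window between "marker exists" and "old bash exits" as small as possible.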
On Linux/Mac the marker write is harmless: `exec` keeps the same PID,
the parent bash never observes an exit, the launcher script (where
applicable: launchd KeepAlive=true / systemd Restart=always) never
sees the marker because it never re-enters its monitor loop.
Trade-off: after intentional re-exec, the .bat exits → no auto-
restart for crashes that happen LATER in the new airc's lifetime.
User must wait until next logon or re-run `airc daemon install`.
This is acceptable vs the current behavior (continuous crashloop
after first re-exec). Future enhancement: .bat could transition to
a "monitor mode" that polls airc.pid and only restarts if all PIDs
in it are dead, but the simple exit-on-marker is the minimal viable
fix for #203.
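
The "monitor mode" idea can be sketched on the bash side as a liveness probe. This is an illustration under stated assumptions, not something the PR implements: `any_pid_alive` is a hypothetical name, and the sketch assumes `airc.pid` holds one numeric PID per line, which the source does not confirm.

```bash
#!/usr/bin/env bash
# Hypothetical liveness probe for a future monitor mode.
# Assumption: the pid file has one numeric PID per line.
any_pid_alive() {
  local pid_file="$1" pid
  [ -r "$pid_file" ] || return 1
  while IFS= read -r pid || [ -n "$pid" ]; do
    # Skip blank or non-numeric lines.
    case "$pid" in
      ''|*[!0-9]*) continue ;;
    esac
    # kill -0 sends no signal; it only checks that the PID can be signalled.
    # A process owned by another user would also read as "not alive" here.
    if kill -0 "$pid" 2>/dev/null; then
      return 0
    fi
  done < "$pid_file"
  return 1
}

# Usage: restart only when every recorded PID is gone.
#   any_pid_alive "$AIRC_WRITE_DIR/airc.pid" || airc connect
```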
Closes #203 once continuum-b69f re-tests on real Windows.
Pull request overview
Adds a Windows-only “intentional re-exec” sentinel to prevent the Windows daemon launcher loop from treating airc self re-execs (host takeover / race loser / rejoin) as crashes and respawning into a crashloop.
Changes:
- Introduces `_write_reexec_marker()` and writes a marker before each intentional `exec env ... "$0" connect ...` re-exec site.
- Updates the Windows daemon launcher (`airc-daemon.bat`, generated by `airc daemon install`) to detect the marker and exit instead of respawning.
| cd /d "$cwd_win" | ||
| set AIRC_BACKGROUND_OK=1 | ||
| :loop | ||
| "$bash_exe" -c "exec '$airc_bin_unix' connect" |
The .bat’s marker path is based on $scope (derived from AIRC_HOME when set), but the launcher deliberately does not set AIRC_HOME and instead relies on cd /d "$cwd_win" + detect_scope(). If a user installs the daemon with AIRC_HOME set, airc connect will write the marker under the cwd-derived scope, while the launcher will look under $scope—so re-exec won’t be detected and the crashloop persists. Fix by exporting AIRC_HOME (as a Unix path) in the bash -c command and/or deriving cwd_win from $scope’s parent so detect_scope and $scope stay aligned.
| cd /d "$cwd_win" | |
| set AIRC_BACKGROUND_OK=1 | |
| :loop | |
| "$bash_exe" -c "exec '$airc_bin_unix' connect" | |
| REM Export AIRC_HOME explicitly so the child bash resolves the same | |
| REM scope/marker path as this launcher, even when install-time AIRC_HOME | |
| REM differs from cwd-based detect_scope(). | |
| cd /d "$cwd_win" | |
| set AIRC_BACKGROUND_OK=1 | |
| :loop | |
| "$bash_exe" -c "export AIRC_HOME='$scope'; exec '$airc_bin_unix' connect" |
| @echo off | ||
| REM AIRC daemon launcher — generated by 'airc daemon install' on Windows. | ||
| REM Runs airc connect under bash, restarting on exit. Logs to daemon.log. | ||
| REM On intentional re-exec (host-takeover or rejoin-as-joiner), airc | ||
| REM writes airc.reexec-marker — we step aside rather than respawn, | ||
| REM since the new airc bash from the exec is now the daemon. | ||
| cd /d "$cwd_win" | ||
| set AIRC_BACKGROUND_OK=1 | ||
| :loop | ||
| "$bash_exe" -c "exec '$airc_bin_unix' connect" | ||
| REM Did airc just intentionally re-exec? If marker exists and is recent, | ||
| REM the new airc process from the exec is now the running daemon — | ||
| REM exit the launcher loop instead of racing-respawn it. | ||
| REM forfiles /m airc.reexec-marker /d 0 /c "cmd /c exit 0" succeeds when | ||
| REM the file's mtime is today (fine-grained age check below via type + | ||
| REM date math is too brittle for .bat; "today" is our 60s proxy). | ||
| if exist "$marker_win" ( | ||
| forfiles /p "$scope_win" /m airc.reexec-marker /d 0 /c "cmd /c exit 0" >nul 2>&1 | ||
| if not errorlevel 1 ( | ||
| echo [%date% %time%] airc re-exec'd into different mode ^(host-takeover or rejoin^); new process is now daemon, launcher exiting. >> daemon.err | ||
| del "$marker_win" >nul 2>&1 | ||
| exit /b 0 | ||
| ) | ||
| ) | ||
| echo [%date% %time%] airc connect exited. Restarting in 5s. >> daemon.err | ||
| timeout /t 5 /nobreak >nul |
The generated .bat claims to log to daemon.log and cmd_daemon_status points users at $scope/daemon.log + $scope/daemon.err, but the batch file doesn’t redirect airc connect stdout/stderr anywhere, and the >> daemon.err writes are relative to the project cwd (not the .airc scope dir). This will make airc daemon log/status misleading and lose logs. Redirect the bash invocation to an explicit path under the scope (e.g. %scope%\daemon.log/%scope%\daemon.err), and write restart/reexec lines to that same errors file.
| REM writes airc.reexec-marker — we step aside rather than respawn, | ||
| REM since the new airc bash from the exec is now the daemon. | ||
| cd /d "$cwd_win" | ||
| set AIRC_BACKGROUND_OK=1 | ||
| :loop | ||
| "$bash_exe" -c "exec '$airc_bin_unix' connect" | ||
| REM Did airc just intentionally re-exec? If marker exists and is recent, | ||
| REM the new airc process from the exec is now the running daemon — | ||
| REM exit the launcher loop instead of racing-respawn it. | ||
| REM forfiles /m airc.reexec-marker /d 0 /c "cmd /c exit 0" succeeds when | ||
| REM the file's mtime is today (fine-grained age check below via type + | ||
| REM date math is too brittle for .bat; "today" is our 60s proxy). | ||
| if exist "$marker_win" ( | ||
| forfiles /p "$scope_win" /m airc.reexec-marker /d 0 /c "cmd /c exit 0" >nul 2>&1 | ||
| if not errorlevel 1 ( | ||
| echo [%date% %time%] airc re-exec'd into different mode ^(host-takeover or rejoin^); new process is now daemon, launcher exiting. >> daemon.err | ||
| del "$marker_win" >nul 2>&1 |
The “fresh marker” check uses forfiles ... /d 0, which matches any file modified today, not “~60s”. A stale marker left behind (e.g. from an interactive re-exec earlier the same day, or if del fails) could cause the launcher to exit on a real crash and stop auto-restarting. Consider clearing the marker before launching each loop iteration and/or parsing the marker’s embedded UNIX timestamp with PowerShell to enforce a small age window (e.g. <= 60–120s) before treating it as an intentional re-exec; also update the comment to reflect the actual semantics.
| REM writes airc.reexec-marker — we step aside rather than respawn, | |
| REM since the new airc bash from the exec is now the daemon. | |
| cd /d "$cwd_win" | |
| set AIRC_BACKGROUND_OK=1 | |
| :loop | |
| "$bash_exe" -c "exec '$airc_bin_unix' connect" | |
| REM Did airc just intentionally re-exec? If marker exists and is recent, | |
| REM the new airc process from the exec is now the running daemon — | |
| REM exit the launcher loop instead of racing-respawn it. | |
| REM forfiles /m airc.reexec-marker /d 0 /c "cmd /c exit 0" succeeds when | |
| REM the file's mtime is today (fine-grained age check below via type + | |
| REM date math is too brittle for .bat; "today" is our 60s proxy). | |
| if exist "$marker_win" ( | |
| forfiles /p "$scope_win" /m airc.reexec-marker /d 0 /c "cmd /c exit 0" >nul 2>&1 | |
| if not errorlevel 1 ( | |
| echo [%date% %time%] airc re-exec'd into different mode ^(host-takeover or rejoin^); new process is now daemon, launcher exiting. >> daemon.err | |
| del "$marker_win" >nul 2>&1 | |
| REM writes airc.reexec-marker containing a UNIX timestamp. We step aside | |
| REM rather than respawn only when that marker is freshly written by the | |
| REM exiting process, since the new airc bash from the exec is now daemon. | |
| cd /d "$cwd_win" | |
| set AIRC_BACKGROUND_OK=1 | |
| set "AIRC_REEXEC_MARKER=$marker_win" | |
| :loop | |
| REM Clear any stale marker before launching. A leftover file from an | |
| REM earlier run must not suppress restart after a real crash. | |
| if exist "%AIRC_REEXEC_MARKER%" del "%AIRC_REEXEC_MARKER%" >nul 2>&1 | |
| "$bash_exe" -c "exec '$airc_bin_unix' connect" | |
| REM Did airc just intentionally re-exec? If the marker exists and its | |
| REM embedded UNIX timestamp is fresh (<=120s old), the new airc process | |
| REM from the exec is now the running daemon — exit this launcher loop | |
| REM instead of racing-respawn it. | |
| if exist "%AIRC_REEXEC_MARKER%" ( | |
| powershell -NoProfile -Command "$ts = Get-Content -LiteralPath $env:AIRC_REEXEC_MARKER -TotalCount 1 -ErrorAction Stop; if ($ts -match '^\d+:(\d+)$') { $age = [DateTimeOffset]::UtcNow.ToUnixTimeSeconds() - [int64]$Matches[1]; if ($age -ge 0 -and $age -le 120) { exit 0 } }; exit 1" >nul 2>&1 | |
| if not errorlevel 1 ( | |
| echo [%date% %time%] airc re-exec'd into different mode ^(host-takeover or rejoin^); new process is now daemon, launcher exiting. >> daemon.err | |
| del "%AIRC_REEXEC_MARKER%" >nul 2>&1 |
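
The age-window check from the review comment can also be written in bash, which makes the parsing rule easy to test. This is a minimal sketch, assuming the marker holds `PID:UNIX_TIMESTAMP` on its first line as the helper's comment states. The name `marker_is_fresh` and the 120-second default are illustrative choices, not part of the PR.

```bash
#!/usr/bin/env bash
# Hypothetical freshness check for a "PID:UNIX_TIMESTAMP" marker file.
# Returns 0 only when the embedded timestamp is between 0 and max_age seconds old.
marker_is_fresh() {
  local marker="$1" max_age="${2:-120}" line ts now age
  [ -r "$marker" ] || return 1
  IFS= read -r line < "$marker" || [ -n "$line" ] || return 1
  # Take everything after the last colon as the timestamp.
  case "$line" in
    *:*) ts="${line##*:}" ;;
    *) return 1 ;;
  esac
  # Reject empty or non-numeric timestamps.
  case "$ts" in
    ''|*[!0-9]*) return 1 ;;
  esac
  now="$(date +%s)"
  # 10# forces base 10 so a leading zero is not read as octal.
  age=$(( now - 10#$ts ))
  [ "$age" -ge 0 ] && [ "$age" -le "$max_age" ]
}

# Usage:
#   marker_is_fresh "$AIRC_WRITE_DIR/airc.reexec-marker" 120 && echo "intentional re-exec"
```

Checking the embedded timestamp rather than the file's mtime avoids the "modified today" ambiguity of `forfiles /d 0`, and the lower bound of 0 rejects markers stamped in the future.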
| # b69f's #203 crashloop). Marker contents: "PID:UNIX_TIMESTAMP". Caller | ||
| # is responsible for invoking this immediately before exec. | ||
| _write_reexec_marker() { | ||
| local marker="$AIRC_WRITE_DIR/airc.reexec-marker" |
_write_reexec_marker writes to "$AIRC_WRITE_DIR/airc.reexec-marker" but doesn’t ensure $AIRC_WRITE_DIR exists. At least one call path (stale-host takeover fast-path) can hit _write_reexec_marker before init_identity creates the scope dir, so the marker write can silently fail and the Windows launcher will still treat the exit as a crash. Consider adding a cheap mkdir -p "$AIRC_WRITE_DIR" (best-effort) inside _write_reexec_marker so the sentinel is reliably created whenever needed.
| local marker="$AIRC_WRITE_DIR/airc.reexec-marker" | |
| local marker="$AIRC_WRITE_DIR/airc.reexec-marker" | |
| mkdir -p "$AIRC_WRITE_DIR" 2>/dev/null || true |
Verdict: GREEN, merging. Tested on real Windows MINGW64 (continuum-b69f, post-#202 canary). After
The crashloop pattern you set out to kill is gone. (Separate scope-corruption issue I hit on this dirty machine is unrelated to the sentinel-marker fix; it's accumulated state cruft from N install/uninstall cycles, and the green clean-install-windows CI proves a fresh box doesn't reproduce.) Merging. - continuum-b69f
Closes #203. Continuum-b69f's catch on issue #196 — Windows daemon launcher crashloops because the .bat treats every airc re-exec (host-takeover, rejoin-as-joiner, race-loser) as a crash and respawns, racing the new airc.
Fix
- New helper `_write_reexec_marker` writes `$AIRC_WRITE_DIR/airc.reexec-marker` (`<bashPID>:<unix-ts>`) before all 5 `exec env ... "$0" connect ...` sites.
- The daemon launcher .bat checks for the marker via `forfiles /d 0`. If present, it exits with a clear log line; the new airc from the exec is the running daemon now. No respawn.

Linux/Mac unchanged: `exec` is true execve there, parent never observes exit, launcher (launchd plist / systemd unit) never re-enters its loop.

Mac/Linux/Win-Git-Bash CI install jobs validate the install path; the runtime daemon-launcher behavior is real-Windows-only and continuum-b69f will validate via #196 thread.
Out of scope: 'launcher could transition to a monitor-mode that polls airc.pid for liveness and restarts only on full-dead' — useful future enhancement but not blocking; current trade-off (after intentional re-exec, no auto-restart for later real crashes) is strictly better than the current crashloop.