release: canary → main (31 commits) — Python truth-layer + cross-platform install + 13 bug fixes #191

Open
joelteply wants to merge 40 commits into main from canary

@joelteply
Contributor

What's in this bundle

31 commits since main, accumulated through canary cross-validation by the AI peer team (vhsm-d1f4, ideem-local-4bef, continuum-b69f, authenticator-fd63). 13 issues closed in the most recent session alone (#179, #132, #91, #163, #161, #145, #94, #97, #98, #99, #184, #142, plus the issue that filed #188).

Architecture: Python truth-layer (#152 Phase 0+1, PRs #166-#175)

`airc` (bash) is the user surface; business logic moves to `lib/airc_core/*` Python modules called via argparse CLIs:

  • `airc_core.config` — config.json CRUD (`get`, `get_name`, `set_name`, `set_host_block`)
  • `airc_core.datetime` — iso_to_epoch (BSD/GNU/python fallback unified)
  • `airc_core.handshake` — pair handshake (joiner `send`, host `accept_one`, response `get_field`)
  • `airc_core.monitor_formatter` — 250-line message stream formatter

Plus bash decomposition (`lib/airc_bash/*`) — `platform_adapters.sh`, `cmd_doctor.sh` extracted from the monolith. `airc` itself is now under 5000 lines (was ~5500).

All Python CLIs use argparse `--flags` for paths (per #174), not env vars — so MSYS path-translation on Git Bash is per-arg-predictable. `airc_core` carries the canonical types; bash callers are thin dispatchers.

Cross-machine substrate fixes (PRs #160, #164, #176-#178)

Cross-Mac/Windows airc messaging now verified end-to-end:

Windows install path fixes (PR #187, closes #94, #97, #98, #99)

A Windows user can now do a clean install end-to-end:

Stability fixes

CI infrastructure (PRs #186, #187)

`.github/workflows/ci.yml` — five jobs on every PR + push:

  • clean-install-linux: ubuntu install.sh + airc doctor + smoke
  • clean-install-macos: macos install.sh + airc doctor + smoke
  • clean-install-windows: windows install.ps1 (pwsh) + airc doctor
  • clean-install-windows-ps5: install.ps1 under Windows PowerShell 5.1 (the default Windows shell; guards against #91, where bootstrap-airc.ps1 fails under PowerShell 5.1 because airc.ps1 requires 7+)
  • integration-suite: full `test/integration.sh` on push to canary/main

`AIRC_SKIP_PREREQS` is NOT used — the CI tests the real install path on stock runners. `CI=true` auto-detection in install.sh skips sshd setup (osascript admin prompt hangs in CI) + tailscale install (slow, optional, no tailnet).

Smaller fixes

Risk profile

  • Python is now a hard prereq (`airc_core` modules invoked at startup). Already documented; doctor + install.sh + install.ps1 all install it.
  • The bash file got smaller (5500 → ~4900 lines via airc_core extraction); behavior should be identical.
  • All 31 commits passed canary cross-validation by the AI peer team running them daily. Several were caught + fixed mid-stream (e.g. continuum-b69f's MSYS path-mangling diagnosis closed three commits later).

Test plan

🤖 Generated with Claude Code

joelteply and others added 30 commits April 27, 2026 12:32
…iteration step 1) (#151)

refactor(adapters): iso_to_epoch dedupes BSD/GNU date split (3 callsites)

Pre-fix: the BSD-vs-GNU 'date' fork had its own `-j -u -f ... ||
date -u -d ...` fallback chain at three callsites (heartbeat parse
in cmd_connect, _format_relative_time, _is_stale). Each chain had
slightly different error handling — heartbeat returned empty on
parse-fail and skipped the staleness check; _format_relative_time
echoed the raw ts; _is_stale returned 1. Three places, three slight
variations of "the same idea." Future fixes (e.g. WSL date drift,
Cygwin coreutils gaps) had to land at every site.

Post-fix: single iso_to_epoch helper near the platform adapters block.
Tries BSD → GNU → python3 datetime fallback. All three callsites
route through it. Each callsite kept its OWN error handling (their
semantics differ, that's fine — the parse layer is what was duplicated).

Adds python3 fallback that didn't exist anywhere before — useful on
minimal MSYS/Cygwin where neither date flavor parses. Unit-tested in
scenario_platform_adapters with a known timestamp + empty + garbage
inputs.
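
The python3 leg of the fallback could look roughly like the sketch below. It is an illustration only, not the code shipped in `airc_core.datetime`: the function name mirrors the helper, but the accepted formats, the UTC default for naive timestamps, and the `None`-on-failure return are assumptions.

```python
from datetime import datetime, timezone
from typing import Optional


def iso_to_epoch(ts: str) -> Optional[int]:
    """Convert an ISO-8601 timestamp to Unix epoch seconds.

    Returns None for empty or unparseable input so that each caller
    can keep its own error handling (hypothetical convention).
    """
    ts = (ts or "").strip()
    if not ts:
        return None
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"
    try:
        parsed = datetime.fromisoformat(ts)
    except ValueError:
        return None
    # Assumption: timestamps without an offset are UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())
```

The three input classes exercised here (known timestamp, empty, garbage) match the ones the commit says the unit test covers.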

Joel's directive 2026-04-27: "look for ways to keep these consistent,
permanently." This is one pattern; the deeper bash↔PowerShell drift
question is a separate architectural conversation (Python truth-layer
candidate). Filing the architectural piece separately so the immediate
adapter dedupe can ship without blocking on the bigger discussion.

Test posture:
- platform_adapters: 11/11 (was 8/8; +3 for iso_to_epoch)
- list / rooms / ls: 4/4 (downstream consumer via _format_relative_time)
- part_persists, part_keeps_sidecar: 8/8 + 6/6 (heartbeat path,
  unchanged behavior)
…and -v (#153)

Bug found by continuum-b69f via cross-Mac/Windows substrate-bypass
gist on 2026-04-27. Symptoms on Windows Git Bash: airc connect failed
with "Can't reach 100.91.51.87:7547. Is the host running 'airc connect'?"
even though Test-NetConnection succeeded on the port and a manual
python socket connect to the same address completed the handshake.

## Root cause

Modern Windows ships %LOCALAPPDATA%\Microsoft\WindowsApps\python3.exe —
a Store-installer shim. The file exists, satisfies `command -v python3`,
but invocation exits 49 with stderr "Python was not found; run without
arguments to install from the Microsoft Store..." It is NOT a real
interpreter.

airc top-level (lines 17-31 pre-fix) gated python3 detection on
`command -v` alone. The Store stub fooled the gate, so the python ->
python3 shim NEVER installed. Every later `python3 -c "..."` inside
the script — including the pair handshake at line 2495 — silently
hit the Store stub, exited 49, and bash captured _pair_ok=0. The
script then printed the misleading "Can't reach" message and discarded
the captured stderr (the SECOND bug — see below).

## Fix

1. **airc top-level**: probe with `python3 --version >/dev/null 2>&1`,
   not bare `command -v`. Store stub fails fast → fallback to real
   `python` (also strict-probed) → if neither works, ERROR with a
   Windows-specific hint pointing at App execution aliases.

2. **die "Can't reach"**: print the captured handshake `$response`
   (stderr+stdout from 2>&1) before the die. Per the global "never
   swallow errors" rule — evidence is for the debugger, not the
   trash. Pre-fix, the actual Store-stub error was invisible to
   anyone trying to diagnose.

3. **_doctor_probe**: same strict --version probe. Distinguishes
   [BROKEN] (on PATH but stub) from [MISSING] (absent) so the fix
   hint matches the actual condition. Pre-fix `airc doctor` reported
   "[ok] python3" against the stub.

4. **install.sh prereq scan**: same strict probe in the installer's
   missing-prereq loop. Pre-fix, install.sh printed "All required
   prereqs present" against a stub-only Windows install, then airc
   immediately silent-fail-cascaded on first run.
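
The difference between an existence-only probe and the strict probe can be sketched in Python as follows. This is not the bash in `airc`; `probe_interpreter` and its three return labels are hypothetical names chosen to mirror the `[ok]` / `[BROKEN]` / `[MISSING]` distinction described above.

```python
import shutil
import subprocess


def probe_interpreter(cmd: str, timeout: float = 5.0) -> str:
    """Classify a command as 'ok', 'broken', or 'missing'.

    'missing': nothing executable by that name is found.
    'broken':  something is found but `--version` fails, which is how
               a Store-style stub behaves.
    'ok':      `--version` exits 0.
    """
    if shutil.which(cmd) is None:
        return "missing"
    try:
        result = subprocess.run(
            [cmd, "--version"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return "broken"
    return "ok" if result.returncode == 0 else "broken"
```

An existence-only check corresponds to stopping after `shutil.which`, which is exactly the step a stub passes.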

## Why airc didn't catch this earlier

Windows + Microsoft Store python3 alias is the default since ~Windows
10 1903. The stub is invisible to existence-only probes. Anyone who
installs Python from python.org but doesn't disable the App execution
aliases (the default state) hits this. Joel hit it after rebooting
his Windows install today; continuum-b69f isolated it within ~5 min
on the substrate-bypass gist.

## Test posture

Manual: simulated Store stub locally with `exit 49` script on PATH:
- Stub-only:       ERROR with Windows-specific hint ✓
- Stub + real py:  fallback shim activates, airc runs ✓
Mac integration: identity 19/19, whois 5/5, quit 9/9, away 5/5,
                 list 4/4, part_persists running.

## Out of scope

The deeper bash↔PowerShell drift problem (#152) remains. This PR
fixes ONE symptom of that drift surfacing in production. Per Joel
2026-04-27: "make it work first then find patterns" — shipping the
work-now fix; architectural unification is its own conversation.
…ral gists (#154)

fix(sidecar): inherit --no-gist flag from primary so test fixtures stop leaking #general gists

Bug found by continuum-b69f via cross-Mac/Windows substrate-bypass
gist 2026-04-27. After the python3 detection fix landed on Windows
(PR #153), continuum's airc connect resolved a #general gist that
pointed at port 7556 — a Mac-side TEST FIXTURE corpse. Pre-fix:
spawn_general_sidecar_if_wanted at airc:1159 spawned the sidecar
with `--room general` only, ignoring the parent cmd_connect's
`--no-gist` flag. Test scenarios (scenario_part_persists,
scenario_general_sidecar_default, scenario_part_keeps_sidecar)
spawn the primary with --no-gist --no-discovery to stay isolated,
but the sidecar then went and PUBLISHED a real `airc room: general`
gist on the live joelteply gh namespace. cleanup_all's `kill -9`
bypasses the on-exit gist-delete trap, so the gist orphans forever.

Real users discovering #general via auto-scope hit the orphan first
(usually most-recent), try TCP to a port whose process exited 30
minutes ago, get RST, end up confused.

## Fix

If `use_gist=0` (set by --no-gist on the primary), pass --no-gist
to the sidecar spawn too. The flag inherits via the new
`_sidecar_args` array. AIRC_NO_DISCOVERY=1 already inherits via
subshell environment; only the flag needed explicit forwarding.
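
As a minimal illustration of the forwarding rule (written in Python rather than the bash `_sidecar_args` array, with a hypothetical function name):

```python
from typing import List


def build_sidecar_args(use_gist: bool) -> List[str]:
    """Build the sidecar's argument list from the primary's gist setting."""
    args = ["--room", "general"]
    if not use_gist:
        # The primary ran with --no-gist, so the sidecar must not publish either.
        args.append("--no-gist")
    return args
```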

## Why integration tests didn't catch this

The leakage happens on the live gh account. Integration tests run
as Joel on his own gh account, so the leaked gists pollute his
own substrate — invisible to test assertions, very visible to
real users on the same gh account. Cross-account QA caught it
(continuum-b69f's Windows tab discovered the orphan that Mac
tests had created an hour earlier).

## Aftermath

Already manually deleted 6 orphan gists post-cleanup (alpha #general
+ 5x cakr-test-*). With this fix, future test runs stop creating
new ones. The trap-bypassed-by-kill-9 issue is a separate bug
(test fixtures should kill politely).

## Test posture

- part_keeps_sidecar: 6/6
- part_persists: 8/8
- general_sidecar_default: 12/12
…s limit will kill people) (#155)

fix(gist): git-clone fallback + `|| true` guards so rate-limit doesn't kill resolution

Bug found by continuum-b69f mid-cross-machine bring-up 2026-04-27:
gh's gist sub-bucket is throttled at ~60 reads/hr, and a busy session
exhausts it. Every subsequent `gh api gists/<id>` and `gh gist view`
then returns HTTP 403, airc's gist-resolution chain fails silently,
and discovery hangs at "Resolving gist...". Joel: "this limit will kill
people."

## Two bugs in one

### 1. set -e + pipefail aborts script on rate-limit
The existing chain:
```bash
raw_content=$(gh api "gists/$gist_id" 2>/dev/null \
              | jq -r '.files | to_entries[0].value.content // empty' 2>/dev/null)
```
With `set -euo pipefail` at airc:9, when `gh api` returns 403:
- pipefail propagates the non-zero from gh up the pipeline
- the `$(...)` capture inherits the non-zero
- set -e aborts the script before reaching the next fallback

Net: rate-limit hit = entire script dies with exit 5, no diagnostic,
no fallback attempted. Fix: each path wrapped with `|| true` so a
non-zero exit becomes empty `$raw_content` and the `[ -z ]` gate
flows through to the next fallback.

### 2. All existing fallbacks use the same throttled REST bucket
Even with the abort fixed, paths A (gh api + jq), B (gh gist view --raw),
and C (curl + jq) all hit the gist sub-bucket, which is the exact thing
that's exhausted. New fallback: git clone the gist's git remote.
Git transport is on a separate quota, so it keeps working when REST is
throttled. Adds ~1s on the slow path, unblocks discovery completely.

## New chain (insertion-ordered fallthrough)

1. gh api + jq          (REST, fast — primary path)
2. gh gist view --raw   (REST, fallback)
3. **git clone gist remote** (NEW — bypasses REST sub-bucket)
4. curl + jq            (REST, anonymous last resort)

If you have git, you survive rate-limit. The git-clone path was
verified live: while gh api returned 403 in <0.3s, git clone of the
same gist returned the JSON envelope cleanly in ~0.3s.
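
The fallthrough logic of the chain can be sketched as below. This is a Python model of the control flow, not the bash in `airc`: `resolve_gist` and the stand-in resolvers are hypothetical, and the returned envelope is a placeholder.

```python
from typing import Callable, Iterable, Optional, Tuple

Resolver = Callable[[str], str]


def resolve_gist(
    gist_id: str, resolvers: Iterable[Tuple[str, Resolver]]
) -> Tuple[Optional[str], str]:
    """Try each resolver in order; the first non-empty result wins."""
    for name, resolver in resolvers:
        try:
            content = resolver(gist_id)
        except Exception:
            # Plays the role of `|| true`: a failed path becomes empty
            # content and control flows on to the next fallback.
            content = ""
        if content:
            return name, content
    return None, ""


# Stand-in resolvers for illustration only.
def throttled_rest(gist_id: str) -> str:
    raise RuntimeError("HTTP 403: rate limit exceeded")


def git_clone(gist_id: str) -> str:
    return '{"name": "general"}'  # placeholder envelope
```

With `throttled_rest` listed first and `git_clone` third, the call returns the git-clone result instead of dying on the first 403.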

## Test posture (Mac, regression check)

- part_persists: 8/8
- list: 4/4
- general_sidecar_default: 12/12

The actual rate-limit-recovery path was verified by `bash -x` trace
under live throttle: `+ raw_content='{` shows git-clone populating
raw_content after both gh paths returned empty.

## Out of scope (filed separately)

airc.ps1 has the same gist-resolution chain pattern (REST-only).
Same fix applies — Windows iteration step 2 in the canary backlog.
…thon (PR #153 follow-up) (#156)

feat(doctor,install): probe sshd readiness so hosting works on Windows + scope ssh-stub probe to python only

Joel's directive 2026-04-27: "Both need to host so just part of doctor
and/or install" — Windows users need sshd to host airc rooms, but
Windows ships OpenSSH client only (server is opt-in capability since
Win10 1809). Pre-fix: install printed "All required prereqs present"
against a Windows install with no sshd; airc doctor probed for ssh
client only. First cross-machine pair silently failed at the
ssh-tail step.

## Changes

### `airc doctor` — new `_doctor_probe_sshd` per-platform

- **macOS**: launchctl + `systemsetup -getremotelogin` for the Remote
  Login state. Fix hint: System Settings -> Sharing -> Remote Login.
- **Linux/WSL**: `systemctl is-active` on `ssh` (Debian/Ubuntu unit
  name) and `sshd` (RHEL/Fedora). Fix hints for both pkgmgrs.
- **Windows-bash**: `powershell.exe -Command "(Get-Service sshd
  -ErrorAction SilentlyContinue).Status"` distinguishes:
    Running → ok
    Stopped/StopPending/StartPending/Paused → BROKEN with start hint
    empty → MISSING with Add-WindowsCapability hint
- **Other**: info-level skip; doesn't penalize.
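
The Windows branch's mapping from service status to doctor verdict can be written as a small table. The sketch below is in Python for illustration; the function name is hypothetical, and treating unlisted status strings as BROKEN is an assumption the commit does not state.

```python
# Status strings as listed in the Windows-bash branch above.
BROKEN_STATES = {"Stopped", "StopPending", "StartPending", "Paused"}


def classify_sshd_status(status: str) -> str:
    """Map the text printed by the Get-Service probe to a doctor verdict."""
    status = status.strip()
    if not status:
        # Get-Service printed nothing: the sshd service is not installed.
        return "MISSING"
    if status == "Running":
        return "ok"
    if status in BROKEN_STATES:
        return "BROKEN"
    # Assumption: any other state also means sshd is not usable.
    return "BROKEN"
```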

### `install.sh` — same probe at install time

Same per-platform branches; warn-only (no auto-install since
elevation needed on Windows). User runs the printed PowerShell
commands once, re-runs installer, sshd is up.

### `_doctor_probe` — scope strict-probe to python only (BUG REGRESSED FROM PR #153)

The PR #153 strict-probe applied `--version` to ALL binaries. macOS BSD
ssh-keygen exits 1 on `--version` ("illegal option"), so doctor false-
positived [BROKEN] on every Mac. The new sshd probe surfaced this
regression on its first run (clean Mac doctor output revealed the
stale [BROKEN] ssh-keygen line).

Fix: only python and python3 have shadow-aliases on Windows
(Microsoft Store stubs). Other binaries are uniquely shipped by the
user's package manager — bare `command -v` is correct + portable.

## Why this matters

"Both need to host" — the airc design assumes every peer is a
first-class host candidate. Pre-fix Windows users discovered they
COULDN'T host until they hit it the hard way (peers can't connect,
no diagnostic). Post-fix, install + doctor surface it immediately
with the exact admin-PowerShell commands.

## Test posture (Mac regression)

- part_persists: 8/8
- list: 4/4
- general_sidecar_default: 12/12
- platform_adapters: 11/11
- airc doctor live: 7/7 prereqs ok, 1 sshd MISSING (this Mac has
  Remote Login off — correctly flagged with the macOS-specific fix).

## Out of scope

`airc.ps1` should also gain an equivalent probe + install.ps1 should
auto-install + start sshd when run elevated. Queued for Windows
iteration step 3.
…needs to be in the install" gap) (#157)

feat(install): auto-install + start sshd during install (close architectural gap)

Joel's directive 2026-04-27 (via continuum-b69f relay through coord
gist):

> "if we can prompt the user, we do NOT have them do annoying setup
> shit we automate into install, which gets what it needs done, no
> later interaction and definitely not MORE after first install. and
> detect via doctor if missing. and tell them how to remedy."

Translation:
1. install.{sh,ps1} does end-to-end setup with elevation prompts (ONE
   elevation moment during first install). No separate post-install
   steps for the user to remember.
2. airc doctor is drift detection — catches when something flipped off
   after install. Already done in PR #156.
3. Remedy commands are AI-runnable — doctor's output is a contract
   with the user's AI. Already done in PR #156.

Missing piece (this PR): install.{sh,ps1} should actually RUN the
missing prereq commands during install, not just probe + report.

## Changes

### install.sh — `_ensure_sshd_running`

Per-platform, idempotent (no-op if already running):

- **macOS**: probes Remote Login state (launchctl/systemsetup); if off,
  runs `sudo systemsetup -setremotelogin on` with one sudo prompt.
- **Linux**: probes systemctl (Debian's ssh and RHEL's sshd unit
  names); if missing, installs openssh-server via the platform's
  package manager + enables-and-starts the right unit.
- **Windows-bash**: probes via `powershell.exe Get-Service sshd`;
  if missing or stopped, self-elevates via
  `powershell.exe Start-Process -Verb RunAs` with all three commands
  inline (Add-WindowsCapability + Start-Service + Set-Service
  Automatic) → ONE UAC prompt for the user.

`AIRC_SKIP_SSHD=1` short-circuits for headless CI / config-managed
environments.

### install.ps1 — `Install-OpenSSHServer`

Mirrors the bash logic for the native Windows installer. Probes
Get-Service sshd, then Get-WindowsCapability for state. Three commands:
Add-WindowsCapability, Start-Service, Set-Service Automatic. Catches
admin-required errors and prints the manual fallback (same shape as
existing Install-OpenSSHClient).

Hooked into the install flow right after Install-OpenSSHClient.

## Idempotency

Both install.sh and install.ps1 short-circuit if sshd is already
Running. Re-running install.sh on a working box doesn't re-prompt
for sudo or UAC. Same for install.ps1.

## Test posture (Mac regression)

- part_persists: 8/8
- list: 4/4
- general_sidecar_default: 12/12
- platform_adapters: 11/11

## Out of scope

End-to-end Mac↔Windows substrate test once both sides have sshd up
(parallel work; not blocked on this PR).
…g (PR #156/#157 live-test followups) (#158)

fix(sshd-probe): macOS detection without sudo + osascript admin dialog when non-interactive

Two issues found while running PR #157 live on Mac 2026-04-27:

## Bug 1: launchctl list (user scope) doesn't show system services

Pre-fix probe:
```bash
launchctl list 2>/dev/null | grep -q "com\.openssh\.sshd"
```
Bare `launchctl list` is user-scope. Returns user-launched LaunchAgents
only — never system-level launchd jobs like com.openssh.sshd. The
fallback `systemsetup -getremotelogin` requires sudo to read state.

Net: doctor reported `[MISSING] sshd` even when Remote Login was
fully enabled and active sshd-session processes were forking.

Fix: `launchctl print system` (no sudo needed) lists system services
including com.openssh.sshd when Remote Login is on. Anchor regex on
service-id boundary so we don't false-positive on per-connection
session subkeys (com.openssh.sshd.<UUID>) which exist transiently
even when Remote Login is just toggling.
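
One way to express the boundary anchoring is shown below using Python's `re`. The pattern is a sketch, not the regex in `airc`, and the sample lines in the usage note are illustrative rather than captured `launchctl print system` output.

```python
import re

# Match the bare service id, but not longer ids that merely start with it
# (for example per-connection keys of the form com.openssh.sshd.<UUID>).
SSHD_SERVICE = re.compile(r"(?<![\w.-])com\.openssh\.sshd(?![\w.-])")


def remote_login_enabled(launchctl_output: str) -> bool:
    """Return True if any line names the com.openssh.sshd service itself."""
    return any(SSHD_SERVICE.search(line) for line in launchctl_output.splitlines())
```

A line ending in `com.openssh.sshd` matches, while a line containing only `com.openssh.sshd.1A2B3C4D` does not, because the trailing `.` fails the negative lookahead.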

## Bug 2: install.sh sudo path fails in non-interactive contexts

When install.sh runs from a Monitor-spawned shell or curl|bash pipe,
no TTY is attached. `sudo` then says "a terminal is required to read
the password; either use the -S option to read from standard input or
configure an askpass helper." Same problem as Joel hit running this
from his Claude Code Bash tool.

Fix: detect TTY presence (`[ -t 0 ] && [ -t 1 ]`); if interactive,
use sudo. If not, fall through to osascript with the native macOS
admin GUI dialog (with a branded prompt explaining what airc is
doing — Joel 2026-04-27 relay through continuum-b69f).
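
The same decision, expressed in Python for illustration (the installer does this in bash; the function names and return labels here are hypothetical):

```python
import sys


def choose_elevation(stdin_is_tty: bool, stdout_is_tty: bool) -> str:
    """Pick the elevation mechanism, mirroring `[ -t 0 ] && [ -t 1 ]`."""
    if stdin_is_tty and stdout_is_tty:
        return "sudo"       # a terminal is attached, so sudo can prompt
    return "osascript"      # no TTY: fall back to the GUI admin dialog


def detect_elevation() -> str:
    """Apply the rule to the current process's stdin and stdout."""
    return choose_elevation(sys.stdin.isatty(), sys.stdout.isatty())
```

Under a `curl | bash` style pipe, stdin is not a terminal, so the rule selects the dialog path.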

## Live verification

Pre-fix doctor on this Mac (Remote Login enabled live via osascript):
```
[MISSING] sshd -- needed when you HOST a room
```

Post-fix:
```
[ok] sshd (Remote Login enabled)
```

## Same probe in install.sh

The Darwin branch of `_ensure_sshd_running` now:
- detects via launchctl print system (matching doctor)
- splits sudo (TTY) vs osascript (non-interactive) for the elevation
- both paths print airc-branded explanation in the admin prompt
…stall) (#159)

fix(doctor): tailscale probe uses resolve_tailscale_bin (catches macOS GUI install)

Bare `command -v tailscale` false-negatives on every macOS install
that came from the App Store or downloaded .dmg — Tailscale.app's
binary lives at /Applications/Tailscale.app/Contents/MacOS/Tailscale,
not on PATH. Caught live 2026-04-27 when airc doctor reported
"tailscale not installed" on this Mac while airc was actively
publishing a Tailscale IP (100.91.51.87) in the room gist envelope.

resolve_tailscale_bin() already exists (called by host_address_set,
tailscale_login_check_or_prompt, etc.) — handles the GUI bundle
path AND windows tailscale.exe AND Linux PATH. Doctor probe just
needs to use it instead of `command -v`.

Live verify on this Mac:
- pre-fix: `[info] tailscale (optional) -- not installed`
- post-fix: `[ok] tailscale (optional) -- daemon up`
…gnosis — sshd bind EPERM) (#160)

fix(install,doctor): Windows HNS port-22 reservation + firewall rule (continuum-b69f diagnosis)

Bug found by continuum-b69f mid-Windows-bringup 2026-04-27:
`Start-Service sshd` failed with "Cannot bind any address" / permission
denied even with admin. Root cause: Windows HNS (Host Network Service —
backs Hyper-V, WSL2, Docker Desktop) dynamically reserves port ranges
at boot. The reservations rotate per-boot and are NOT visible in
`netsh int ipv4 show excludedportrange` (which only shows static admin
reservations). When port 22 randomly falls inside an HNS-held range,
sshd's bind() returns EPERM at OS level, regardless of admin status.

Sources:
- https://keasigmadelta.com/blog/how-to-solve-cannot-bind-to-port-due-to-permission-denied-on-windows/
- docker/for-win#3171
- https://gist.github.com/strayge/481a77d31a94e133a76662877b1a90ca

## Persistent fix (this PR)

Two-step persistent workaround applied during admin-elevated sshd
install. Both ops idempotent — re-run of install on a healthy box
doesn't re-prompt or duplicate state.

1. `reg add HKLM\SYSTEM\CurrentControlSet\Services\hns\State
    /v EnableExcludedPortRange /d 0 /f`
   Disables HNS auto-exclusion. Survives reboots.
2. `netsh int ipv4 add excludedportrange protocol=tcp startport=22
    numberofports=1`
   Explicitly reserves port 22 in the static excluded-port-range so
   HNS can't grab it on subsequent boots.

Plus a New-NetFirewallRule for the OpenSSH-Server-In-TCP rule (the
capability install usually creates it but it can be missing/disabled
on some systems — idempotent check before creating).

## Files changed

- `install.ps1`: `Set-HnsPortFreedomFor22` helper + wired into
  `Install-OpenSSHServer`. Native Windows installer path.
- `install.sh`: Windows-bash branch's `_ensure_sshd_running` now
  emits a single elevated PowerShell payload that runs ALL the steps
  (capability install + HNS workaround + firewall rule + start +
  persist) so Joel/users click UAC ONCE for the whole sshd setup.
- `airc doctor`: `[MISSING] sshd` Windows hint now includes the
  reg+netsh lines and explains why (HNS quirk). User can run all five
  commands as a contiguous block to remediate manually.

## Why this matters

Pre-fix, even after the user ran the Add-WindowsCapability +
Start-Service incantation from PR #156's hint or PR #157's auto-install,
they could STILL hit the bind-EPERM if HNS happened to claim port 22
on their boot. Random failure, no diagnostic, looks like a permission
bug. Continuum-b69f's diagnosis turns this from an unsolvable random
failure into a one-time install action.

## Test posture (Mac regression)

Mac side unchanged behavior; HNS branch only fires on MINGW/MSYS/CYGWIN.

- part_persists: 8/8
- list: 4/4
- general_sidecar_default: 12/12
- platform_adapters: 11/11

## Out of scope

Cross-machine substrate end-to-end test once continuum's Windows host
binds port 22 successfully. Parallel work; not blocked on this PR.
…velopes without it (continuum's diagnosis) (#162)

fix(parser,prereq): jq is required, not optional — fallback parser corrupts gist envelopes without it

Bug found by continuum-b69f Win→Mac e2e 2026-04-27 (forensics in
cross-Mac/Windows coord gist):

continuum's airc connect from Windows Git Bash succeeded with
"Connected to '\"invite\":\"authenticator-fd63'" — JSON envelope syntax
leaked into the displayed peer name. Worse:
- room_name file never written to disk
- subsequent airc msg stored locally with from:"unknown"
- broadcast never landed in mac host's messages.jsonl

Two bugs from one root cause: **jq missing on Windows Git Bash.**

## Root cause

cmd_connect's gist resolver has two paths:
1. JSON envelope parse via jq — sets `resolved` (invite string) AND
   `resolved_room_name` from `.name` field.
2. Legacy raw-string fallback — bare grep for the first `@.*@` line.

When jq is absent on PATH (the default state on Git Bash), path 1
short-circuits silently. Path 2 grabs the whole quoted JSON line
including the `"invite":"` key prefix. The downstream @-split
(which extracts name@user@host:port) then captures the JSON-key
fragment as the peer name.

Worse: `resolved_room_name` is ONLY set inside path 1's room-case
branch. Path 2 leaves it empty. Hence the `if [ -n "$resolved_room_name" ]; then echo ... > room_name`
write at line 2495 never fires. Joiner connects "successfully" but
doesn't know what room they're in. Subsequent msg sends queue/ship
without room context; host filters them out.

## Fix (three layers)

### Layer 1: jq is now a required prereq (install.sh + install.ps1 + airc doctor)

- install.sh: added `jq` to the prereq install loop. pkgname_for
  maps `jq` → `jqlang.jq` on winget, bare `jq` on
  brew/apt/dnf/pacman/apk.
- install.ps1: new `Install-IfMissing -Name 'jq' -WingetId 'jqlang.jq'`
  line.
- airc doctor: new probe `_doctor_probe "jq" "Gist envelope parser
  (rooms, addresses)"` flags missing jq with the same install hint
  shape as other prereqs.

### Layer 2: legacy fallback now strips JSON-key prefix

The grep-based fallback can still be reached on minimal environments
that genuinely don't have jq (busybox+nothing, weird CI). Pre-fix it
captured `"invite":"authenticator-fd63@...` verbatim. Post-fix:
`sed -E 's/^[^a-zA-Z]+//'` strips leading non-letter characters before
the @-split runs. JSON quotes, key syntax, leading whitespace all
stripped uniformly.

### Layer 3: legacy fallback now extracts room name

When jq is missing, the fallback also walks the raw_content for
`"name": "..."` and captures the value into `resolved_room_name`.
Same JSON envelope shape as the jq path; sed-only so it works
without any JSON parser. Empty for legacy gists (no envelope) —
matches pre-existing behavior on those.
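
The Layer 3 idea, pulling `"name": "..."` out of the envelope without a JSON parser, can be sketched with a regular expression. This Python version only illustrates the approach (the fix itself is sed-only); the function name and sample values are hypothetical, and escaped quotes inside the name are not handled.

```python
import re

# Matches "name": "value" with optional whitespace around the colon.
NAME_FIELD = re.compile(r'"name"\s*:\s*"([^"]*)"')


def extract_room_name(raw_content: str) -> str:
    """Return the envelope's room name, or '' for legacy gists without one."""
    match = NAME_FIELD.search(raw_content)
    return match.group(1) if match else ""
```

A legacy gist that holds only a bare `name@user@host:port` invite string yields an empty result, matching the pre-existing behavior described above.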

## Why three layers

Layer 1 (jq required) is the canonical fix — every install going
forward has jq, the JSON path always works. Layers 2+3 are defense
in depth: any environment that escapes layer 1 (older airc installs,
manual installs, distros where jq install fails) won't silently
corrupt — fallback now produces a correct peer name AND the right
room_name file.

## Test posture

Mac doctor with PR live: all probes [ok] including new jq.

```
[ok] git
[ok] gh
[ok] gh authenticated
[ok] openssl
[ok] ssh
[ok] ssh-keygen
[ok] python3
[ok] jq
[ok] sshd (Remote Login enabled)
[ok] tailscale (optional) -- daemon up
```

Mac regression:
- part_persists: 8/8
- list: 4/4
- general_sidecar_default: 12/12
- platform_adapters: 11/11

## Out of scope

continuum-b69f's UTF-8 → Latin-1 double-decode on `→` is a separate
encoding bug in the bash → python3 → jq pipeline. File for
follow-up; this PR is JSON-key-leak + jq-as-prereq.
…root cause) (#164)

fix(python): AIRC_PYTHON env var replaces broken export -f shim (THE root cause continuum found)

continuum-b69f's traced send 2026-04-27 found THE bug behind every
silent-broadcast-failure on Windows Git Bash. Long-form analysis in
the cross-Mac/Windows coord gist; tldr below.

## Bug

PR #153 added a bash function shim:

```bash
if ! python3 --version >/dev/null 2>&1; then
  if command -v python >/dev/null 2>&1; then
    python3() { command python "$@"; }
    export -f python3 2>/dev/null || true
  fi
fi
```

`export -f python3` is supposed to propagate the function into
subshells. On Git Bash MINGW, `export -f` succeeds silently but the
function does NOT reliably inherit into `$(...)` command-substitution
subshells. Result: every callsite that captures `$(python3 -c "...")`
output (45+ in airc) bypassed the shim, hit the Microsoft Store
stub, exited ~49 with empty stdout. The `|| echo ""` fallbacks on
those sites then silently set config values to empty strings.

Cascade:
- `get_name` → `from:"unknown"` in stored messages
- `get_config_val host_target ""` → empty → cmd_send takes HOST path
  (no `[ -n "$host_target" ]`), mirrors locally only, NEVER SSH-pushes
- `get_config_val host_airc_home ""` → empty → would-be wrong path
  anyway (but moot since SSH was skipped)

Net: continuum's Windows airc msg returned exit 0, mirrored locally,
broadcast NEVER reached the mac host's messages.jsonl. cmd_send's
"queue or die" failure paths never fired because cmd_send thought
it WAS the host. Every Win→Mac broadcast invisible-failed.

## Fix (continuum's prescription)

Replace the function-shim with a bash variable. Bash variables
propagate to subshells unconditionally — no function-export quirks.

```bash
if python3 --version >/dev/null 2>&1; then
  AIRC_PYTHON=python3
elif command -v python >/dev/null 2>&1 && python --version >/dev/null 2>&1; then
  AIRC_PYTHON=python
else
  echo "ERROR: airc requires a working python3..." >&2
  exit 1
fi
export AIRC_PYTHON
```

Then sed across airc: every `python3 -c "..."` callsite (45 of them)
becomes `"$AIRC_PYTHON" -c "..."`. The two `command -v python3`
guards (which became unreliable under the Store-stub case) become
`[ -n "${AIRC_PYTHON:-}" ]` — set if and only if a working python
resolved at startup.

## Why this matters beyond Win→Mac

The same `export -f` leak silently corrupted every config read on
Windows Git Bash. Every `airc nick` rendered nicks blank; every
`airc whois` walked an empty peer file path; every `cmd_send` was
mirroring-locally-only. Three full days of "Windows works" reports
were actually "Windows mostly works for read-only commands; sends
silent-fail." This fix unblocks the whole Windows code path.

## Test posture (Mac regression — function-shim never fired here)

- identity: 19/19
- whois: 5/5
- part_persists: 8/8
- list: 4/4
- general_sidecar_default: 12/12
- platform_adapters: 11/11

## Out of scope

continuum's secondary observations:
1. `relay_ssh` should fail loudly when host_target is empty rather
   than silent no-op. Defense in depth — this PR fixes the upstream
   cause; failing-loudly downstream is an additional safety net.
2. `|| echo ""` patterns on get_config_val / get_name silently mask
   ANY exec failure (not just Store-stub). Worth reviewing each
   callsite; out of scope for this PR which fixes the immediate
   blocker.

Both filed as separate issues for follow-up.
…_* config write (continuum's retest) (#165)

fix(airc): two PR #164 followups — sed missed line 1372 + harden host_* config write

continuum-b69f's PR #164 retest 2026-04-27 found two remaining bugs:

## Bug A: sed missed `python3 -u -c '` at line 1372

PR #164's sed pattern was `python3 -c` — didn't match the `-u` flag
sandwiched between python3 and -c at line 1372 (monitor_formatter
unbuffered launch). On Windows Git Bash with the Microsoft Store
stub, this site silent-failed too: monitor_formatter crashed at
launch, the inbound stream went dark, joiner couldn't see anything
the host wrote. One-line fix: `python3 -u -c '` →
`"$AIRC_PYTHON" -u -c '`.
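
A rewrite pattern that tolerates interpreter flags between `python3` and `-c` would have caught this site. The sketch below uses Python's `re` to show the difference; it is not the sed expression from PR #164, and `NARROW`, `BROAD`, and `rewrite` are hypothetical names.

```python
import re

# Only matches when "-c" immediately follows "python3".
NARROW = re.compile(r"\bpython3(?= -c\b)")
# Also matches when short flags such as "-u" sit between the two.
BROAD = re.compile(r"\bpython3(?=(?:\s+-[A-Za-z]+)*\s+-c\b)")

REPLACEMENT = '"$AIRC_PYTHON"'


def rewrite(line: str, pattern: "re.Pattern[str]" = BROAD) -> str:
    """Replace python3 with the AIRC_PYTHON variable at -c callsites."""
    return pattern.sub(REPLACEMENT, line)
```

Lines such as `command -v python3` are left alone by both patterns, since no `-c` follows the interpreter name.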

## Bug B: host_* config write silently no-op'd if ANY bash subst broke

continuum's joiner config showed `name`, `host`, `host_target`,
`created` but NOT `host_airc_home`, `host_name`, `host_port`,
`host_ssh_pub`, `host_identity` — all five fields written together
by the heredoc at line 2768.

Pre-fix:
```bash
HOST_IDENTITY="$host_identity_json" "$AIRC_PYTHON" -c "
import json, os
c = json.load(open('$CONFIG'))
c['host_airc_home'] = '$host_airc_home'
c['host_name']      = '$peer_name'
c['host_port']      = ${peer_port:-7547}
c['host_ssh_pub']   = '''$host_ssh_pub'''
...
" 2>/dev/null || true
```

Five bash substitutions into python source. If ANY substitution
breaks python parsing (newline in host_ssh_pub, special char in
host_airc_home, empty/non-numeric peer_port, etc.) the whole heredoc
crashes at parse time. `2>/dev/null || true` swallows the SyntaxError
and zero fields land. Five silently-empty config fields downstream:
- host_airc_home empty → cmd_send computes wrong remote path
- host_name empty → "Connected to ''" banner
- host_port wrong → SSH targets wrong port (or 7547 fallback)
- host_ssh_pub empty → host's SSH key not in authorized_keys
- host_identity empty → airc whois <host> shows (unset)

Post-fix: pass everything as env vars; python reads from os.environ.
Bash never touches the python source. Also emit stderr to a warn line
(not /dev/null) so the future debugger can see it. Also catch
ValueError on int(host_port) so a non-numeric value falls back to
7547 instead of dying.
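
A minimal sketch of the post-fix shape described above. The environment variable names other than `HOST_IDENTITY` and the function name `host_fields_from_env` are assumptions made for illustration; the names actually used in airc may differ.

```python
import json
import os

DEFAULT_PORT = 7547  # fallback port named in the commit message


def host_fields_from_env(env=None):
    """Build the host_* config fields from environment variables.

    Variable names are illustrative assumptions. Values arrive as data,
    so a newline or quote in a value cannot break Python parsing.
    """
    env = os.environ if env is None else env
    try:
        port = int(env.get("HOST_PORT", ""))
    except ValueError:
        # Empty or non-numeric port: fall back instead of dying.
        port = DEFAULT_PORT
    # Assumption: the identity arrives as a JSON-encoded object.
    try:
        identity = json.loads(env.get("HOST_IDENTITY", "") or "{}")
    except ValueError:
        identity = {}
    return {
        "host_airc_home": env.get("HOST_AIRC_HOME", ""),
        "host_name": env.get("HOST_NAME", ""),
        "host_port": port,
        "host_ssh_pub": env.get("HOST_SSH_PUB", ""),
        "host_identity": identity,
    }
```

Because the Python source is fixed, a trailing newline in the public key is preserved as a value rather than turning into a parse error.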

## Pattern lesson

bash → python heredoc with bash variable substitution into the
python SOURCE is fragile. Any unusual byte in the variable can
break python parsing. Same shape as the resolver heredoc that
broke pre-PR #155 with set -e + pipefail.

Repeat-offender pattern. Consider a sweep: every `"$AIRC_PYTHON" -c
"..."` heredoc that contains `$bash_var` substitutions — convert to
env-var pass + os.environ. Out of scope for this PR (would touch
~30 sites); file as a separate canary follow-up.
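
The failure mode can be reproduced without bash at all. This sketch (helper names are hypothetical) builds Python source by string interpolation, the way the shell did, and checks whether the result still parses.

```python
def interpolated_source(host_airc_home, peer_port):
    """Mimic bash substituting values INTO Python source text."""
    return (
        "c = {}\n"
        "c['host_airc_home'] = '" + host_airc_home + "'\n"
        "c['host_port'] = " + peer_port + "\n"
    )


def parses(source):
    """Return True if the text is syntactically valid Python."""
    try:
        compile(source, "<inline>", "exec")
        return True
    except SyntaxError:
        return False
```

An ordinary path and port parse fine; a path containing an apostrophe or an empty port does not, and with stderr discarded the caller would never see why.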

## Test posture

Mac regression (5 scenarios, all green):
- identity 19/19
- whois 5/5
- part_persists 8/8
- list 4/4
- general_sidecar_default 12/12

End-to-end Win→Mac broadcast verification still pending continuum's
retest after pulling this fix.
…Phase 0) (#166)

Joel 2026-04-27: "3000 lines of code dear god" → "yes" (start the
architectural pivot to airc_core).

Today's session shipped 17 PRs, ~half fighting bash → python heredoc
fragility (silent SyntaxErrors, function-export leaks, missed sed
patterns, swallowed stderr). The pattern is the problem: bash
substituting variables INTO python source code is a per-site silent
fail. PR #164 fixed the export -f leak via AIRC_PYTHON; PR #165
hardened ONE heredoc with env-var pass; ~30 more heredocs remain.

This PR pivots: business logic moves to a Python truth-layer
package (airc_core/), bash + ps1 become thin shells that invoke
the Python via -m. Same input → same output → same testable code,
no more bash-into-python escaping.

## Phase 0: foundation

- `lib/airc_core/__init__.py` — package marker. v0.1.0.
- airc bash resolves the lib dir at startup (4 candidates, first hit
  wins; canonicalizes to absolute via cd+pwd so PYTHONPATH stays
  valid across cwd changes). Sets PYTHONPATH unconditionally.
- New debug command `airc debug-pythonpath` echoes the resolved
  path + tests `import airc_core` end-to-end.
- install.sh changes: none needed — the existing clone-everything
  shape already pulls lib/ along.

## Phase 0a: first function migrated

- `lib/airc_core/datetime.py` exposes `iso_to_epoch()` with a CLI
  entry: `python -m airc_core.datetime iso_to_epoch <ts>`.
- Bash `iso_to_epoch` shrinks from 22 lines (3-fallback adapter
  chain) to 4 lines (single Python module call).
- Test harness in scenario_platform_adapters updated to set
  AIRC_PYTHON + PYTHONPATH for the extracted-adapter shell so
  the test sees the Python module.
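
For illustration, a module of this shape could look roughly like the following. This is a sketch, not the contents of `lib/airc_core/datetime.py`; in particular, treating naive timestamps as UTC and the `main(argv)` dispatcher are assumptions.

```python
import sys
from datetime import datetime, timezone


def iso_to_epoch(ts):
    """Convert an ISO-8601 timestamp to integer epoch seconds."""
    # Older Pythons' fromisoformat() rejects a trailing "Z", so rewrite
    # it as an explicit UTC offset first.
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:
        # Assumption: timestamps without an offset are UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def main(argv):
    """CLI dispatcher: <prog> iso_to_epoch <ts>."""
    if len(argv) == 3 and argv[1] == "iso_to_epoch":
        print(iso_to_epoch(argv[2]))
        return 0
    print("usage: iso_to_epoch <ts>", file=sys.stderr)
    return 2
```

A real module would call `sys.exit(main(sys.argv))` under an `if __name__ == "__main__":` guard so that `python -m` works.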

## Why iso_to_epoch as the first migration

- Pure logic, no I/O — easiest to verify identical behavior.
- Already adapter-fied in PR #151 (clean callsite contract).
- Three callsites downstream — proves the pattern works for both
  the function definition AND its consumers.
- Smallest possible blast radius if the pattern flubs.

## Test posture

- platform_adapters: 11/11 (was 11/11; iso_to_epoch trio still green
  through the migrated code path)
- part_persists: 8/8 (downstream consumer via heartbeat parse)
- list: 4/4 (downstream consumer via _format_relative_time)
- general_sidecar_default: 12/12 (sidecar spawn touches the path)

## Pattern for follow-up phases

Phase 0a establishes the shape. For each subsequent migration:

1. Identify a heredoc-heavy function in airc bash.
2. Re-implement the logic in airc_core/<module>.py with a CLI entry.
3. Bash function becomes a 1-line `"$AIRC_PYTHON" -m airc_core.<module> <subcommand> "$@"` call.
4. Run integration tests; verify identical bash-side behavior.
5. Same module is callable from airc.ps1 (Phase 2 — drift between
   bash and ps1 ports goes away mechanically).

Priority order for Phase 1 (high-fragility first):
- pair handshake JSON build/parse (~80 lines, env-var pass already
  partially done in #165)
- gist envelope build (host's response payload)
- gist envelope resolve (joiner's parse — the JSON-key-leak class)
- monitor_formatter (the long-running -u -c heredoc; missed by sed
  in #164, fixed in #165)
- host_address_set (network enumeration)
- config CRUD (45+ callsites; biggest dedupe but most plumbing)

## Out of scope for this PR

- No Phase 1 migrations land here. Joel reviews the SHAPE first.
- airc.ps1 still uses its own duplicate logic; that's Phase 2.
- The 30+ remaining heredocs in airc bash still exist; they'll
  migrate one at a time per the Phase 1 priority order.
…1) (#167)

feat(airc_core): migrate config CRUD (get_name, get_config_val) to airc_core.config (#152 Phase 1)

Continuing the Python truth-layer migration started in PR #166.
Phase 1: convert high-risk bash heredocs to airc_core modules
incrementally.

## What

- New `lib/airc_core/config.py` with `get(config_path, key, default)`
  + `get_name(config_path)` + CLI entry point.
- Bash `get_name` and `get_config_val` shrink from inline python
  heredocs (with bash-variable substitution INTO the python source)
  to one-line `"$AIRC_PYTHON" -m airc_core.config get <key> <default>`
  calls.

## Why

45+ callsites across airc bash use these two helpers. Pre-migration
each was an inline `"$AIRC_PYTHON" -c "import json; ...$1...$2..."`
heredoc — bash $1 / $2 substituted INTO the python source. If the
key or default contained quotes, special chars, etc., python parsing
broke silently and the value fell back via `2>/dev/null || echo $2`.
Continuum-b69f 2026-04-27 traced one symptom (host_target reading
empty even when config.json had it) to this class.

Now: CONFIG env var holds the file path; key + default come from
argv. Python source is fixed bytes; bash never touches it.
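
A rough sketch of the read helpers, following the `get(config_path, key, default)` and `get_name(config_path)` signatures named above. The error handling shown (returning the default on a missing or unparsable file) is inferred from the unit-test list below and may not match the real module exactly.

```python
import json


def get(config_path, key, default=""):
    """Return config[key] as a string, or default on any read problem."""
    try:
        with open(config_path, encoding="utf-8") as fh:
            config = json.load(fh)
    except (OSError, ValueError):
        # Missing or unparsable file: the caller gets the default.
        return default
    if not isinstance(config, dict):
        return default
    value = config.get(key, default)
    return default if value is None else str(value)


def get_name(config_path):
    """Convenience wrapper for the name field."""
    return get(config_path, "name", "")
```

The key and default are ordinary function arguments here, so quotes or special characters in either are just data.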

## Test posture

- identity: 19/19 (heaviest config-read scenario — name, identity
  fields, integrations all read via the migrated path)
- whois: 5/5
- part_persists: 8/8
- list: 4/4
- general_sidecar_default: 12/12
- platform_adapters: 11/11

Direct unit-test of the CLI:
- valid config → returns name correctly
- missing config → returns default
- get_name on valid config → name
- both subcommands respond as expected

## Next migrations

Per the Phase 1 priority queue (high-fragility first): pair
handshake JSON build/parse → gist envelope build → gist envelope
resolve → monitor_formatter → host_address_set. Each lands as a
separate PR; integration tests verify identical bash-side behavior.
…Phase 1) (#168)

Four field-extract sites for the host's handshake response (ssh_pub,
airc_home, identity, reminder) were inline `python3 -c "import sys,
json; print(json.load(sys.stdin).get('FIELD',''))"` heredocs. Same
class as get_config_val pre-PR #167 — bash variable substitution
into python source is a per-callsite silent-fail vector if the
embedded value drifts.

Now: response JSON via stdin; field name + default via argv.
Python source is fixed bytes.

## CLI shape

```
echo "$response" | "$AIRC_PYTHON" -m airc_core.handshake get_field <name> [default]
```

Handles dict / list values via json.dumps so callers can re-parse
(needed for the identity field, which is a nested object).
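
The extraction logic can be sketched as a pure function over the response text. This is an illustration of the described behavior, not the module's actual code; a CLI wrapper would pass `sys.stdin.read()` as the first argument.

```python
import json


def get_field(response_text, field, default=""):
    """Extract one field from a JSON handshake response."""
    try:
        response = json.loads(response_text)
    except ValueError:
        return default  # empty or garbage input
    if not isinstance(response, dict) or field not in response:
        return default
    value = response[field]
    if value is None:
        return default
    if isinstance(value, (dict, list)):
        # Nested values are re-encoded so the caller can parse them again.
        return json.dumps(value)
    return str(value)
```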

## Test posture

- identity: 19/19
- whois: 5/5
- part_persists: 8/8
- list: 4/4
- general_sidecar_default: 12/12
- kick: 12/12

Plus direct CLI unit tests (valid response, missing field with
default, nested object round-trip, empty stdin → default, garbage
input → default).

## What's left in handshake-related code

- Host's response BUILDER (line 3236, builds the JSON payload the
  joiner reads). Bash-substitutes name + airc_home + identity into
  python source. Same class. Migrate next.
- Joiner's payload BUILDER (line 2580, sends payload TO host).
  Same pattern; same class.

Both are smaller migrations following the same shape.
… heredocs (#152 Phase 1 cleanup) (#169)

feat(airc_core): collapse _whois_in_scope + resolve_name + cmd_rename heredocs into get_config_val[_in] (#152 Phase 1)

Cleanup pass following PR #167/#168. Eight more inline `python -c`
heredocs collapsed into one-line calls now that airc_core.config
handles the read pattern.

## Sites migrated

1. **resolve_name** (line 1228) — was duplicating the get_config_val
   logic inline. Now calls get_config_val.
2. **cmd_rename** (line 3369) — same.
3. **_whois_in_scope** (six sites) — host_name, host_identity,
   host_target (×2), host_airc_home, peer-file's identity, peer-file's
   host. All collapsed to get_config_val_in or airc_core.handshake
   get_field.

## New: get_config_val_in

Like get_config_val but reads from an arbitrary config.json path.
Used by _whois_in_scope's cross-scope walk (#134) which inspects
sibling scope state without changing $CONFIG. Same module, same CLI;
just different env var per call.

## airc_core.config: dict round-trip

Extended `get` to JSON-encode dict/list values (matches
handshake.get_field shape). Lets _whois_in_scope read host_identity
+ peer identity blobs as JSON-encoded strings that callers can
re-parse.

## Test posture

- whois: 5/5
- whois_cross_scope: 6/6  ← hottest path through _whois_in_scope
- identity: 19/19
- kick: 12/12
- part_persists: 8/8
- list: 4/4
- general_sidecar_default: 12/12

## Code reduction

~70 lines of inline python heredoc → ~10 lines of bash function
calls. Each removed heredoc was a separate silent-fail vector
(bash-substituted env var into python source code).

## Phase 1 progress

- ✓ iso_to_epoch (Phase 0a)
- ✓ config CRUD core (PR #167)
- ✓ handshake response parse (PR #168)
- ✓ _whois_in_scope + resolve_name + cmd_rename cleanup (this PR)
- next: handshake/gist envelope BUILD sites, identity show/set, monitor_formatter
… Phase 1) (#170)

The pair-handshake send was an inline `python -c` heredoc with FIVE
bash-variable substitutions into the python source — name, host,
ssh_pub, sign_pub, airc_home — plus the connect target as `('$peer_host_only', $peer_port)`.
Any unusual character in any field could silently break python parsing.
Specifically host_ssh_pub may contain a trailing newline (depending on
how ssh-keygen wrote the .pub file); host_target may contain
characters that need quoting; identity is a JSON-encoded blob of
arbitrary user-set text. Each was a per-callsite silent-fail.

## Migration

`airc_core.handshake.send(host, port)` reads all six fields from env:
MY_NAME, MY_HOST, MY_SSH_PUB, MY_SIGN_PUB, MY_AIRC_HOME, MY_IDENTITY.
Builds the JSON payload, opens TCP socket, sends, reads response,
returns it as a string. Exceptions surface to stderr (matches the
never-swallow-errors rule); bash captures stderr via `2>&1`.
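
A simplified sketch of a `send(host, port)` with this contract. The payload key names and the wire framing (send one JSON object, half-close, read until EOF) are assumptions made for the example; the real protocol may frame messages differently.

```python
import json
import os
import socket


def send(host, port, timeout=10.0, env=None):
    """Send the joiner's handshake payload and return the raw response."""
    env = os.environ if env is None else env
    payload = {
        # Key names are illustrative assumptions.
        "name": env.get("MY_NAME", ""),
        "host": env.get("MY_HOST", ""),
        "ssh_pub": env.get("MY_SSH_PUB", ""),
        "sign_pub": env.get("MY_SIGN_PUB", ""),
        "airc_home": env.get("MY_AIRC_HOME", ""),
        "identity": json.loads(env.get("MY_IDENTITY") or "{}"),
    }
    with socket.create_connection((host, int(port)), timeout=timeout) as sock:
        sock.sendall(json.dumps(payload).encode("utf-8"))
        # Assumption: closing the write side marks the end of the request.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8")
```

Nothing is caught here, so connection errors or a malformed identity propagate to the caller instead of being swallowed.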

Bash callsite shrinks from 23 lines (inline python heredoc) to 8
lines (env-var pass + module call):

    response=$(MY_NAME="$my_name" \
               MY_HOST="$(whoami)@$(get_host)" \
               MY_SSH_PUB="$my_ssh_pub" \
               MY_SIGN_PUB="$my_sign_pub" \
               MY_AIRC_HOME="$AIRC_WRITE_DIR" \
               MY_IDENTITY="$my_identity_json" \
               "$AIRC_PYTHON" -m airc_core.handshake send "$peer_host_only" "$peer_port" 2>&1) || _pair_ok=0

## Test posture

Pair-handshake exercising scenarios all green:
- tabs: 19/19 (two-tab pair on localhost — exercises send + receive)
- identity: 19/19 (exchange identity at handshake)
- whois: 5/5 (read identity from response)
- kick: 12/12 (multi-peer pairing)
- part_persists: 8/8 (sidecar + primary spawning)

## Phase 1 progress

- ✓ iso_to_epoch (Phase 0a, PR #166)
- ✓ config CRUD core (PR #167)
- ✓ handshake response parse (PR #168)
- ✓ _whois_in_scope cleanup (PR #169)
- ✓ joiner handshake send (this PR)
- next: host's response builder (line ~3236), self-heal/discovery
        heredocs, monitor_formatter
…-line migration, biggest single heredoc) (#171)

feat(airc_core): monitor_formatter → airc_core.monitor_formatter (#152 Phase 1, biggest single migration)

The biggest single heredoc in airc bash. ~250 lines of Python embedded
in a `"$AIRC_PYTHON" -u -c '...'` block, complete with apostrophe-
escape gymnastics like `caller'\''s` and `Joel'\''s` because bash
single-quoting required them. Migrated to a proper Python module.

## Impact

airc bash file: **5897 → 5647 lines** (−250 lines, ~4.2% reduction
of the entire script).

The migrated function had:
- Inactivity watchdog (cross-platform: SIGALRM on POSIX,
  threading.Timer on Windows)
- [rename] handler with chain-repair via stable host id
- Ping/pong control message handling with auto-pong subprocess.Popen
- Own-send filtering with mid-session rename support
- Inbound mirror-to-local-log for joiners (avoids feedback loop on
  hosts)
- Belt-and-suspenders error handling per line so one bad message
  doesn't kill the formatter

All preserved verbatim — same logic, same stdin/stdout contract.
The CLI shape:

    PEERS_DIR=<peers-dir> "$AIRC_PYTHON" -u -m airc_core.monitor_formatter <my_name>

Bash function shrinks to 4 lines (was 268).
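
As an illustration of two of the listed behaviors (own-send filtering and per-line error handling), here is a heavily reduced sketch. The message keys follow the `messages.jsonl` shape used by airc (`from`, `ts`, `msg`); the output format and function names are assumptions, and the watchdog, rename and ping/pong logic are left out.

```python
import json
import sys


def format_line(raw, my_name):
    """Render one JSON message line; return None for our own sends."""
    msg = json.loads(raw)
    if msg.get("from") == my_name:
        return None  # own-send filtering
    return "[{}] {}: {}".format(
        msg.get("ts", ""), msg.get("from", "?"), msg.get("msg", "")
    )


def run(lines, my_name, out=sys.stdout, err=sys.stderr):
    """Format a stream of lines; a bad line is reported and skipped."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            rendered = format_line(raw, my_name)
            if rendered is not None:
                print(rendered, file=out, flush=True)
        except Exception as exc:
            # One malformed message must not kill the formatter.
            print("[airc:formatter] skipped one line: {}".format(exc),
                  file=err, flush=True)
```

With the logic in a regular module, `format_line` can be unit-tested directly instead of through the whole bash stack.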

## Why a real .py file matters here

The bash heredoc had:
- `'\''` shell-escape sequences scattered through comments (caller's
  → caller'\''s) — readable Python source now restores natural
  apostrophes.
- No editor syntax highlighting for python (it was inside a bash
  string).
- No way to unit-test individual functions (_rename_files,
  _find_peer_by_host) without invoking the whole bash + airc stack.

Now the module is a regular Python file: lints, syntax-highlights,
unit-testable, importable from other airc_core modules if needed.

## Test posture

84 assertions pass across 8 scenarios touching monitor_formatter
(every scenario that pairs + sends/receives):

- tabs: 19/19 (two-tab message exchange)
- identity: 19/19 (identity round-trip + rename)
- whois: 5/5 (host_identity propagation)
- part_persists: 8/8 (sidecar + primary monitor active)
- list: 4/4
- general_sidecar_default: 12/12
- kick: 12/12 (multi-peer monitor traffic)
- events: 5/5 (system-event formatting)

## Phase 1 progress

- ✓ iso_to_epoch (Phase 0a, PR #166)
- ✓ config CRUD core (PR #167)
- ✓ handshake response parse (PR #168)
- ✓ _whois_in_scope cleanup (PR #169)
- ✓ joiner handshake send (PR #170)
- ✓ monitor_formatter (this PR — biggest single migration)
- next: host's pair-handshake handler heredocs, smaller cleanups
…#152 Phase 1) (#172)

feat(airc_core): host pair-handshake accept_one → airc_core.handshake.accept_one (#152 Phase 1)

Symmetric counterpart of PR #170 (joiner send) — the HOST'S accept-
and-respond heredoc, biggest remaining bash-into-python heredoc with
substituted variables. 127 lines of Python with EIGHT bash variable
substitutions plus one command substitution migrated to a clean Python module.

## Substitutions previously inline

- $host_port — the listen port (numeric, but bare-substituted)
- $PEERS_DIR — joiner's peer file path
- $(timestamp) — bash command-substitution INTO python (highest risk)
- $IDENTITY_DIR — host's ssh_key.pub source
- $CONFIG — host's identity load path
- $name — host's identity name
- $reminder_interval — numeric reminder interval
- $AIRC_WRITE_DIR — host's airc_home (sent in response)
- $MESSAGES — system-event log path

Each was a per-callsite silent-fail vector. Continuum traced the
write-side variant (#165) earlier today.

## Migration

`airc_core.handshake.accept_one()` reads all from env vars (HOST_PORT,
PEERS_DIR, IDENTITY_DIR, CONFIG, HOST_NAME, REMINDER_INTERVAL,
AIRC_WRITE_DIR, MESSAGES). Bash callsite shrinks from 127 lines
(heredoc body) to a 9-line env-var-pass + module call.

Same logic preserved verbatim — accept-with-timeout, parent-death
detection (`os.getppid() == 1`), authorize joiner SSH key, write peer
record (with stable-host stale cleanup), build response, write
peer-joined system event. The outer `while true; do ... done &`
bash loop unchanged.

## Impact

- airc bash: 5647 → 5529 (-118 lines)
- Cumulative today (Phase 1): ~370 lines moved out of bash to
  testable Python modules.

## Test posture (Mac, 89 assertions / 9 scenarios)

- tabs: 19/19 (two-tab pair on localhost — exercises full accept
  loop end-to-end)
- scope: 5/5 (multi-cwd pairing across scopes)
- identity: 19/19 (identity exchange at handshake)
- whois: 5/5
- kick: 12/12 (multi-peer, multiple accepts)
- part_persists: 8/8
- list: 4/4
- general_sidecar_default: 12/12
- events: 5/5 (peer-joined system event emission)

## Phase 1 progress

- ✓ iso_to_epoch, config CRUD, handshake parse, _whois cleanup,
  joiner send, monitor_formatter (PRs #166-#171)
- ✓ host accept_one (this PR)
- next: smaller cleanups (lan_ip resolver, identity/peer config writes,
  remaining gist-envelope bash heredocs)
…ibility (#152) (#173)

Joel 2026-04-27: "think my bigger issue is 5000 line files... like
straightforward programming... senior would have hit pause at 500."
Lesson saved (memory: flag file size proactively, threshold ~500 not
~5000). Starting Phase 3 — split airc bash into multiple files so
each is normal-software-shaped, not a giant monolith.

## What

`lib/airc_bash/platform_adapters.sh` (~158 lines, the existing
"Platform adapters" marked block from airc) is now its own file.
The airc top-level sources it via the lib-dir resolver:

    if [ -n "${_airc_lib_dir:-}" ] && [ -f "$_airc_lib_dir/airc_bash/platform_adapters.sh" ]; then
      source "$_airc_lib_dir/airc_bash/platform_adapters.sh"
    fi

Test harness updated — `scenario_platform_adapters` no longer needs
to awk-extract the section; it sources the real file directly.

## Why platform_adapters first

- Already a self-contained marked region.
- Already has integration test coverage.
- Smallest blast radius if the source-from-file pattern flubs.
- Same shape Phase 0a (iso_to_epoch) used to prove airc_core.

## Impact

- airc bash: 5529 → 5371 lines (-158 lines, ~3% of file)
- Cumulative bash-side reduction today (Phase 1 + Phase 3 step 1):
  ~530 lines moved to dedicated files.

## Next

Same pattern scales:
- lib/airc_bash/cmd_connect.sh (the biggest cmd_*, ~1000-1500 lines)
- lib/airc_bash/cmd_send.sh
- lib/airc_bash/cmd_doctor.sh
- lib/airc_bash/cmd_part.sh + cmd_teardown.sh
- lib/airc_bash/helpers.sh (die, validate_peer_name, get_*)

After Phase 3, no single file should exceed ~600 lines.

## Test posture

- platform_adapters: 11/11 (sourced from real file, all assertions
  via the same `_adapter_call` shim now pointing at lib/airc_bash/)
- tabs / identity / whois / part_persists / list / general_sidecar_default:
  all green (the airc-startup sourcing path works for the real run)
…tinuum's MSYS catch) (#174)

fix(airc_core): use argparse --flags for all paths, not env vars (continuum's MSYS catch + Joel's correct-fix mandate)

Joel 2026-04-27: "they arent stupid, --params are far fucking better"
+ "NEVER DO THE QUICK FIX ALWAYS THE BEST" + "you are an ai. the
correct fix is five minutes the quick 1 from my perspective the same."

The right fix for continuum-b69f's MSYS path translation bug isn't
MSYS_NO_PATHCONV per-callsite (the small fix I was about to ship —
that framing alone was the violation). It's giving every airc_core
module a proper argparse CLI so paths arrive as `--airc-home /path`
flags. argparse-flag args are per-arg-predictable across MSYS path
translation, AND the modules present as normal Python CLIs instead
of bash-shaped env-var contraptions.

## Changes

### `airc_core.handshake`

Refactored to argparse:
- `get_field <field> [default]` — unchanged stdin shape
- `send <host> <port> --my-name X --my-host Y --my-ssh-pub Z
   --my-sign-pub W --my-airc-home /path --my-identity-json '{}'`
- `accept_one --host-port N --peers-dir /path --identity-dir /path
   --config /path/config.json --host-name X --reminder-interval N
   --airc-home /path --messages /path`

### `airc_core.config`

Refactored to argparse:
- `get --config /path KEY [DEFAULT]`
- `get_name --config /path`

### `airc_core.monitor_formatter`

Refactored to argparse:
- `--peers-dir /path --my-name NAME`

### Bash callsites

All env-var-pass patterns replaced with --flags. Cleaner, more readable,
no MSYS path-mangling risk on Git Bash.
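
The `airc_core.config` CLI shape listed above could be wired up along these lines. `build_parser` is a hypothetical helper name; only the subcommands and flags come from the list above.

```python
import argparse


def build_parser():
    """Parser for: get --config PATH KEY [DEFAULT] | get_name --config PATH."""
    parser = argparse.ArgumentParser(prog="airc_core.config")
    sub = parser.add_subparsers(dest="command", required=True)

    get = sub.add_parser("get", help="read one key from config.json")
    get.add_argument("--config", required=True, help="path to config.json")
    get.add_argument("key")
    get.add_argument("default", nargs="?", default="")

    get_name = sub.add_parser("get_name", help="read the name field")
    get_name.add_argument("--config", required=True, help="path to config.json")
    return parser
```

For example, `build_parser().parse_args(["get", "--config", "/tmp/c.json", "host_target", "none"])` yields a namespace with `config`, `key` and `default` populated, and `--help` comes for free.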

## Why this matters beyond MSYS

Joel 2026-04-27: "you are an ai. the correct fix is five minutes the
quick 1 from my perspective the same." Memory saved
(feedback_no_quick_fixes.md): the quick-fix reflex is borrowed from
human time pressure that AIs don't actually have. Quick fixes are
how the 5500-line bash file got built. Always pick the architectural
right answer.

## Test posture

101 assertions across 10 scenarios green:
- tabs 19, identity 19, whois 5, part_persists 8, list 4,
  general_sidecar_default 12, kick 12, events 5, platform_adapters 11,
  whois_cross_scope 6

Plus `--help` output for each module is now standard argparse format.
…#175)

feat(airc-bash): extract cmd_doctor + _doctor_* helpers (Phase 3 — airc under 5000)

435 lines (cmd_doctor, _doctor_detect_pkgmgr, _doctor_install_cmd_for,
_doctor_probe, _doctor_probe_gh_auth, _doctor_probe_sshd,
_doctor_probe_tailscale, _doctor_connect_preflight, _doctor_run_tests)
extracted to lib/airc_bash/cmd_doctor.sh, sourced from airc top-level
via the lib-dir resolver.

## Impact

- airc bash: 5386 → 4952 lines. **Below 5000 for the first time today.**
- New file: 435 lines, self-contained.

## Why doctor was a clean candidate

- All `_doctor_*` helpers used by cmd_doctor only — no exterior consumers.
- The probes use `detect_platform` / `get_config_val` from airc top-level
  (resolver sources platform_adapters before this file, and config CRUD
  helpers are still in airc).
- Already organized as a marked logical section.

## Live verify

`airc doctor` on this Mac: all probes [ok]. git, gh, gh authenticated,
openssl, ssh, ssh-keygen, python3, jq, sshd, tailscale — all green via
the sourced file.

## Test posture (67 assertions / 6 scenarios)

- tabs 19, identity 19, whois 5, part_persists 8, list 4,
  general_sidecar_default 12

## Remaining biggest sections in airc

- cmd_connect (~1500 lines) — still in airc, biggest remaining slice
- cmd_send (~300 lines)
- cmd_part / cmd_teardown (~250 combined)
- gist envelope build (~200)

Continued split brings each below the ~500 threshold Joel called out.
…(continuum's #174 follow-up) (#176)

fix(airc_core): config set_host_block subcommand — last env-var-pass site converted to argparse (continuum's PR #174 follow-up)

continuum-b69f's #174 retest 2026-04-27 found that PR #174 missed the
host_* config WRITE site (the post-handshake "store host details"
block). It still used env vars, so MSYS path-translated $host_airc_home
on Git Bash before python read it from os.environ. Same silent-fail
class as the rest of #174.

## Fix (matches PR #174 pattern verbatim)

New subcommand: `airc_core.config set_host_block`.

```bash
"$AIRC_PYTHON" -m airc_core.config set_host_block \
    --config "$CONFIG" \
    --host-airc-home "$host_airc_home" \
    --host-name "$peer_name" \
    --host-port "${peer_port:-7547}" \
    --host-ssh-pub "$host_ssh_pub" \
    --host-identity-json "$host_identity_json"
```

Bash callsite is one airc_core invocation; no env-var pass; no python
heredoc with bash substitutions; no `2>/dev/null` swallowing errors.
The CLI errors are surfaced via stderr per the never-swallow-errors
rule.

## Why this matters per Joel's "always the right fix"

PR #174 was the right approach for the SEND/PARSE/ACCEPT sites. PR
#165 (env-var hardening) was a defensive partial fix at the WRITE
site. Today we close the loop — same architecture across all
config-mutating sites.

## Test posture

95 assertions / 9 scenarios green:
- tabs 19, identity 19, whois 5, part_persists 8, list 4,
  general_sidecar_default 12, kick 12, events 5, platform_adapters 11

Unit test:
- set_host_block writes valid JSON with all fields preserved
  uncorrupted (path / SSH pubkey / identity dict round-trip)
…nd-to-end (#177)

fix(msys): export MSYS2_ARG_CONV_EXCL at airc startup — last layer of cross-machine fix (continuum's catch + verify)

continuum-b69f's diagnosis 2026-04-27: even with PR #174 + #176's
argparse `--flags`, MSYS Git Bash on Windows translates argv VALUES
that look like Unix-rooted paths when bash invokes a Windows-native
binary. So `--host-airc-home /Users/joelteply/.airc` arrived at
python.exe as `--host-airc-home C:/Program Files/Git/Users/joelteply/.airc`,
the joiner cached the corrupted path, the SSH command later sent
it to a real Unix host that had no such file. Silent broadcast
failure.

## Fix

```bash
export MSYS2_ARG_CONV_EXCL="${MSYS2_ARG_CONV_EXCL:-/Users/;/home/;/root/}"
```

Set once at airc startup, exported, every airc_core invocation
inherits the same translation policy. Targeted prefix list covers
macOS / Linux / root home prefixes without breaking `/tmp/` or `/c/`
paths (which DO need translation for `--config "$CONFIG"` where
$CONFIG is on the local Windows filesystem). Honors a user override
via the `${...:-...}` default-fallback.

## End-to-end verification

continuum-b69f shipped a test broadcast from Windows after their
local patch:

> WORKING TEST: Windows-Mac airc msg via continuum-msyspatch with
> targeted MSYS exclude. should land!

Verified live in MY host's messages.jsonl on Mac:

```
{"from":"continuum-msyspatch","to":"all","ts":"2026-04-27T23:30:13Z",
 "msg":"WORKING TEST: Windows-Mac airc msg via continuum-msyspatch
 with targeted MSYS exclude. should land!","sig":"..."}
```

**Cross-machine Mac↔Windows airc end-to-end working.** This was the
last bug in the chain that started at PR #153 (Microsoft Store
python3 stub).

## Test posture (Mac, where the env var is a no-op)

- tabs 19/19, identity 19/19, whois 5/5, part_persists 8/8,
  list 4/4, general_sidecar_default 12/12, kick 12/12, events 5/5,
  platform_adapters 11/11, whois_cross_scope 6/6

## Today's full chain of cross-machine fixes

#153 → #154 → #155 → #156 → #157 → #158 → #159 → #160 → #162 → #164 → #165 → #166 → ... → #176 → this. 27+ PRs to ship a working
cross-machine airc on Windows. Every step revealed a new layer.
…lent-drop catch) (#178)

fix(encoding): export PYTHONIOENCODING=utf-8 at airc startup (continuum's encoding-drop catch)

continuum-b69f traced 2026-04-27: many cross-machine messages were
getting SILENTLY DROPPED on Windows with:

    [airc:formatter] skipped one line: 'charmap' codec can't encode
    character '→' in position 37: character maps to <undefined>

Windows Python defaults to the local code page (cp1252 on US/EU
installs) for stdout. Unicode chars such as → and ✓ have no cp1252
mapping, so `print(...)` raises UnicodeEncodeError.
The formatter's per-line try/except catches it and skips, but from
the user's view the message is just missing from the stream.
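
The encoding failure is easy to confirm in isolation. This small check (the helper name is hypothetical) asks whether a string can be encoded under a given codec, which is the step `print` performs when stdout uses that codec.

```python
def can_encode(text, encoding):
    """Return True if every character in text has a mapping in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


ARROW = "\u2192"       # rightwards arrow
CHECK_MARK = "\u2713"  # check mark
```

Both characters fail under `cp1252` and succeed under `utf-8`, which is why forcing utf-8 stdio removes the dropped lines.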

## Fix

```bash
export PYTHONIOENCODING="${PYTHONIOENCODING:-utf-8}"
```

Set once at airc startup. Every Python subprocess airc spawns
inherits utf-8 stdio. Honors user override via the default-fallback.
Same shape as MSYS2_ARG_CONV_EXCL (#177) — environment-level fix
that benefits every airc_core invocation without per-callsite changes.

## Why this is the right shape (per Joel's "always the best")

Per-module sys.stdout reconfiguration is also possible, but:
- Requires editing every airc_core module
- Easy to miss a future module
- Doesn't help bash-side code that might also print Unicode

Setting PYTHONIOENCODING once at airc startup is the architectural
answer — Python is told globally to use utf-8 for stdio, and every
subprocess gets the right behavior automatically.

## Test posture

10 scenarios / 102 assertions green on Mac (env var is no-op on Mac
where Python defaults to utf-8 already, but the export is harmless).
Live python3 print of `→ ✓ ⚠ — em-dash` succeeds with the env var set.

## Follow-up

Closes the silent-drop class continuum filed earlier today as #163
(UTF-8 → Latin-1 double-decode). The PYTHONIOENCODING fix is more
general — it covers the OUTPUT side (Windows console encoding) AND
the INPUT side (Python reading stdin will also use utf-8). #163 can
be closed.
…lag (#179) (#183)

Three entangled fixes for the multi-scope rename bug filed by vhsm-d1f4
+ ideem-local-4bef on 2026-04-28:

1. cmd_rename now writes the new name to ALL scopes' config.json
   (primary + sidecars), not just the current scope. Reorder so config
   writes happen BEFORE the broadcast: cmd_send may die() (exit 1) when
   the scope's monitor is down, so a broadcast failure can't prevent
   propagation if propagation runs first.

2. cmd_send takes a new --internal flag for informational broadcasts
   ([rename], etc). When the monitor is down, --internal callers append
   to the local log and return 0 instead of die()ing. The monitor-down
   die is appropriate UX for explicit `airc send` (surfaces "you're
   broadcasting to nobody"), but wrong for [rename] — receivers heal
   via monitor_formatter's host-fallback on next traffic regardless.

3. cmd_rename's recursion guard moves from AIRC_RENAME_NO_PROPAGATE env
   var to a --no-propagate flag. Plus a new airc_core.config set_name
   subcommand replaces the inline-Python heredoc that was quoting-
   fragile. All params are now --flag form, consistent with the rest
   of the airc CLI surface (per README convention).

Test fixture verifies: primary→sidecars, sidecar→primary, three-scope
fan-out, --no-propagate guard, --help/missing-name UX. Integration
suite passes — same 3 pre-existing flakes as canary, no regressions
(180→181 passing).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… (#185)

When the airc parent bash dies (terminal close, kill, Monitor tool
teardown), the accept-loop subshell reparents to init but stays alive,
re-spawning fresh python listeners every iteration. Each listener's
own getppid() points at the orphaned bash subshell — never at init —
so the existing `getppid()==1` socket-timeout check never fires.

Result: orphan listeners hold the host port, accept incoming pair
handshakes, write peer records, and stuff joiner SSH keys into
authorized_keys — pointing at a dead host with no relay behind it.
This is the cause of the integration suite's "port still held after
teardown" + "alpha still listening" flakes.

Two-layer fix:

1. Bash accept loop: `while kill -0 PARENT` instead of `while true`.
   Captures airc bash's PID at startup; loop exits the moment that
   PID disappears, no fresh python is spawned past that point.

2. Python listener: --watch-pid flag wires the same airc bash PID
   into a daemon thread that polls os.kill(pid, 0) every second.
   When the parent dies, os._exit(0) breaks out of any in-flight
   accept()/recv() — covers the in-handshake case the bash check
   misses while a python is mid-iteration.

Both layers watch the SAME PID (airc bash), not their immediate
parent, because the immediate parent (accept-loop subshell) outlives
airc bash by one iteration in the orphan scenario.
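
A sketch of the Python-side watcher under stated assumptions: the function names are hypothetical, and the probe is POSIX-only (on native Windows, `os.kill` with signal 0 is not a harmless existence check). The `on_death` parameter exists so the example can be exercised without exiting the interpreter.

```python
import os
import threading
import time


def pid_alive(pid):
    """POSIX liveness probe: signal 0 checks existence without signalling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def start_parent_watch(pid, interval=1.0, on_death=None):
    """Poll pid from a daemon thread; call on_death once it disappears."""
    if on_death is None:
        # os._exit skips cleanup, so a blocked accept()/recv() in the
        # main thread cannot keep the process alive.
        def on_death():
            os._exit(0)

    def watch():
        while pid_alive(pid):
            time.sleep(interval)
        on_death()

    thread = threading.Thread(target=watch, daemon=True)
    thread.start()
    return thread
```

The watched PID is passed in explicitly rather than read from `os.getppid()`, matching the point above that the immediate parent is the wrong process to watch.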

Verified:
- Orphan repro: SIGKILL airc bash → python exits via parent-watch
  within 1s, port freed (was: ghost listener + held port forever).
- airc teardown still works (watch-pid is opt-in via --watch-pid 0).
- Integration suite: 183 passing (vs 180 baseline on canary). Two
  long-standing flakes resolved: "port 7549 still held after teardown"
  + "alpha still listening after teardown". One remaining flake
  ("beta did NOT successfully pair") is unrelated — different scenario.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci: clean-install matrix (linux + macos + windows + windows-ps5)

Joel asked 2026-04-28: "guarantee clean mac and windows installs work,
and as much of this as possible is fixed... CI after fixing what we deem
important for release."

Four concurrent jobs on every PR + every push to canary/main:

- clean-install-linux:    ubuntu install.sh + airc doctor + smoke (host
                          stays up, teardown clean).
- clean-install-macos:    macos install.sh + same smoke.
- clean-install-windows:  windows install.ps1 (pwsh) + airc doctor.
- clean-install-windows-ps5: install.ps1 under Windows PowerShell 5.1 —
                          the default that ships with Windows. Catches
                          regressions like #91 (bootstrap fails under
                          5.1 because airc.ps1 has #Requires -Version
                          7.0).

Plus, on push to canary/main only (not PRs — rate limits + flaky network):

- integration-suite:      full test/integration.sh on ubuntu. The heavy
                          gate; serves as the canary→main green signal.

Concurrency group cancels superseded runs on the same ref. PR jobs run
on every push to the PR branch.

Until fixes for the open Windows install issues land (#91, #94, #95, #96,
#97, #98, #99, #152), the windows jobs treat `airc doctor` failures as
non-fatal. The install + bin-discovery itself still validates,
and we'll tighten to hard-fail once those are resolved.

Open issues this CI surfaces:
- #91 — bootstrap PS 5.1 (clean-install-windows-ps5)
- #94 — Tailscale winget package ID typo (install)
- #96 — install.ps1 doesn't install OpenSSH Server
- #98 — install.ps1 leaves DefaultShell unconfigured
- #152 — airc.ps1 ~20 commits behind canary

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: smoke uses airc.pid not pgrep argv; doctor exit clears LASTEXITCODE

pgrep -f 'airc connect ...' didn't match the actual argv 'bash /path/to/airc
connect ...' on the runners. Switch to checking airc.pid which is
canonical (and what airc teardown itself reads).

For Windows: PS try/catch doesn't trap native exit codes — airc doctor
exited 1 because gh wasn't authed and tailscale wasn't installed (both
expected in CI), but the catch never fired. Run airc doctor directly,
log the LASTEXITCODE if non-zero, then explicitly exit 0 so the step
treats it as informational (the install + bin-discovery is what we're
gating on right now).

* ci: macOS smoke uses airc.pid too (was left on old pgrep code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ultShell + Get-RemoteHome (#94, #98, #99) (#187)

* ci: real install path — drop AIRC_SKIP_PREREQS, hard-fail on doctor errors

The skip-prereqs variant only validated the wiring (clone + symlink +
PATH), not that install.{sh,ps1} can actually install everything missing
on a stock runner. As Joel put it 2026-04-28: "need to get all installs
working e2e or whats the point of a repo?"

Changes:
- linux + macos: drop AIRC_SKIP_PREREQS, drop sudo apt-get prereq
  preinstall; install.sh must handle it.
- windows pwsh + windows PS 5.1: drop AIRC_SKIP_PREREQS; install.ps1
  must handle the winget bootstrap.
- airc doctor: hard-fail on non-zero exit. Was non-fatal during the
  initial wiring-test phase; now that real install is exercised, doctor
  must report environment-clean for the job to pass.

This will surface the real Windows install issues (#91, #94, #96, #98,
#99, #152) as CI failures so we can fix them with confidence. May also
surface Linux/macOS prereq gaps that the skip-prereqs variant masked.

* fix(install): Tailscale winget id case (#94); doctor exits 0 (informational)

#94: install.ps1 uses 'tailscale.tailscale' (lowercase). winget --exact
is case-sensitive, returns "No package found", install loop swallows
the error as non-fatal, and the post-install probe reports "install
completed but probe still fails." Result: every Windows install lacks
Tailscale, even though the install log claims otherwise. Also fixed
the same lowercase id in airc.ps1's user-facing fix-hint messages
(lines 328, 1293, 1414, 1420).

Doctor: airc.ps1's Invoke-Doctor leaks $LASTEXITCODE from external
probes (`& gh auth status` etc), so the script's natural-end exit
picks up whatever the last external returned — typically 1 on a
fresh / CI install where gh isn't authed. Bash doctor (cmd_doctor.sh)
just sets a counter and prints a summary, no exit, which is the
documented contract for the default `airc doctor` (informational,
like `git status`). The hard-fail gate is `airc doctor --connect`
(#80), which is the documented preflight before connecting. Match
the contract: explicitly set $LASTEXITCODE = 0 at the end of the
default doctor.

Bonus: .gitignore now excludes __pycache__/ + *.pyc — they leaked
through earlier when running airc_core CLIs locally during testing.

* fix(install.ps1): explicit exit 0 — `tailscale status` leaked LASTEXITCODE=1

Same pattern as the airc.ps1 doctor leak: external probes (notably
`tailscale status` when the user hasn't logged in yet — a normal
post-install state) leave $LASTEXITCODE non-zero, and PowerShell's
script natural-end exit picks it up. Every clean install on a fresh
runner / VM exited 1 even though the install fully succeeded.

Explicit `exit 0` after the final guidance banner.

* ci: re-trigger after macOS job hung overnight

* fix(windows): DefaultShell=bash (#98) + Get-RemoteHome forward-slash (#99)

Two tightly-coupled fixes that together make Windows airc HOSTS actually
work end-to-end. Without these, every Windows-hosted room failed the
moment a peer tried to send a message.

#98 — install.ps1: Set-OpenSSHDefaultShellBash
  Windows OpenSSH defaults DefaultShell to cmd.exe. cmd.exe lacks
  `cat`, POSIX redirects, and the rest of the shell vocabulary that
  airc remote commands rely on (`cat >> $rhome/messages.jsonl && echo
  __APPENDED__`, etc.). Without this fix, every airc msg from a peer
  to a Windows host silently fails — the cmd.exe error goes to ssh
  stderr (which `airc send` looks at, but only for specific patterns),
  the message gets [QUEUED] forever, the user sees nothing.

  Locate Git for Windows bash.exe, write to HKLM:\SOFTWARE\OpenSSH\
  DefaultShell. Idempotent — only writes when the registry value
  differs. Falls through with a loud warning if bash.exe can't be
  found (Git for Windows is already a hard prereq, so this should
  never fire in the install.ps1 flow).

#99 — airc.ps1: Get-RemoteHome forward-slash conversion
  The host_airc_home config value is captured as a Windows path with
  backslashes ('C:\Users\Administrator\Documents\Cambrian\.airc').
  When interpolated into an SSH remote command and the remote shell
  is bash (which #98 ensures), bash interprets the backslashes as
  escape characters and strips them — producing garbage like
  'C:UsersAdministratorDocumentsCambrian.airc'. The redirect target
  becomes a non-existent relative path and `cat >>` silently fails.

  Forward-slash form ('C:/Users/.../.airc') is interpreted correctly
  by bash as an absolute path; Windows kernel32 accepts forward
  slashes everywhere it accepts backslashes, so the on-disk write
  on the host succeeds.
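
The backslash stripping described for #99 can be reproduced without a Windows host. The sketch below uses Python's `shlex` in its default POSIX mode as a stand-in for bash's handling of unquoted backslashes; `to_forward_slashes` is a hypothetical helper name, not the body of Get-RemoteHome.

```python
import shlex


def to_forward_slashes(windows_path: str) -> str:
    """Rewrite a Windows path so a POSIX shell reads it unchanged."""
    return windows_path.replace("\\", "/")


raw = r"C:\Users\Administrator\Documents\Cambrian\.airc"

# shlex in POSIX mode follows the same unquoted-backslash rule as bash:
# the backslash is dropped and the character after it is kept literally.
mangled = shlex.split(raw)[0]
safe = shlex.split(to_forward_slashes(raw))[0]

print(mangled)  # C:UsersAdministratorDocumentsCambrian.airc
print(safe)     # C:/Users/Administrator/Documents/Cambrian/.airc
```

Single-quoting the path on the remote side would also preserve the backslashes; the forward-slash form has the advantage of surviving further interpolation.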

Closes #98, #99. Together with #94 (Tailscale typo, already in this
PR) the install.ps1 → airc.ps1 path is now end-to-end functional on
a clean Windows install.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(install.ps1): ASCII-ify em-dashes — PS 5.1 reads UTF-8-without-BOM as cp1252

PS 5.1's parser barfed on em-dashes (U+2014 = 0xE2 0x80 0x94 in UTF-8
which Windows-1252 misreads) inside double-quoted strings in the new
Set-OpenSSHDefaultShellBash function. Pre-existing em-dashes in comments
have been there a while and passed because comment parsing is more
tolerant; new ones in expandable strings broke the parse.

Replaced all em-dashes in install.ps1 with ASCII '--'. install.ps1 is
the bootstrap script — must work from default Windows PowerShell 5.1
where the user lands by default, and that means staying ASCII-clean.

(airc.ps1 is fine — it's #Requires -Version 7.0 so PS 5.1 won't parse
it; pwsh handles UTF-8 without BOM correctly.)
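
The mis-decoding can be shown in a few lines of Python. The third UTF-8 byte of the em-dash, 0x94, is a right double quotation mark in Windows-1252, which PowerShell is understood to accept as a string delimiter; that is a plausible reason the breakage showed up only inside double-quoted strings. `non_ascii_positions` is a hypothetical lint helper for keeping a bootstrap script ASCII-clean, not something in the repo.

```python
EM_DASH = "\u2014"

utf8_bytes = EM_DASH.encode("utf-8")     # b'\xe2\x80\x94'
misread = utf8_bytes.decode("cp1252")    # what a cp1252 reader sees


def non_ascii_positions(text: str):
    """Return (line, column, char) for every non-ASCII character in text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, ch))
    return hits


print([hex(b) for b in utf8_bytes])        # ['0xe2', '0x80', '0x94']
print([f"U+{ord(c):04X}" for c in misread])  # ['U+00E2', 'U+20AC', 'U+201D']
```

Running `non_ascii_positions` over install.ps1 in CI would be one way to catch a reintroduced em-dash before PS 5.1 does.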

* fix(install.sh): auto-skip sshd setup when CI=true (macOS hangs forever)

macOS install.sh _ensure_sshd_running falls through to osascript
'do shell script with administrator privileges' when no TTY is
attached (CI runners). osascript opens a GUI admin prompt waiting
for password / Touch ID — there's nobody home in CI, so it hangs
forever and the runner job silently consumes its full 6-hour timeout.

Auto-detect CI=true (GitHub Actions, GitLab, Travis, CircleCI, Jenkins,
etc. all set it) and skip the sshd setup block when present. Same
effect as AIRC_SKIP_SSHD=1 but no manual env-var wiring per workflow.

The hang manifested in PR #187's macOS job — install.sh was visibly
stuck in 'Stage install.sh + run' for 5+ minutes with no progress
while the linux + windows jobs completed in under a minute.

* ci(install.sh): also skip Tailscale install when CI=true (it's optional)

brew install --cask tailscale on macos-latest runners is multi-minute
(download + GUI app install). Tailscale is documented as optional
(LAN mesh works without it) and there's no tailnet behind the CI
runner. Same CI=true gate as the sshd skip.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#189)

Two bugs Joel reported in #184 (high severity, violates CLAUDE.md
"never swallow errors"):

PART 1 (joiner-side, well-understood): the escalation banner before
exit-99 was stderr-only. Monitor-style stdout-only consumers (Claude
Code Monitor tool, integration tests, simple `airc join | tee log`)
got a silent disconnect with zero diagnostic on their primary surface.

Fix: print escalation to BOTH stdout (single-line, parseable) and
stderr (multi-line, banner-style, log-friendly). The stdout line uses
the standard `airc:` prefix consumers already filter on.

Daemon-aware: detect whether `airc daemon install` has been run; tell
the user explicitly whether the upcoming exit-99 will trigger self-heal
(daemon present → launchd/systemd respawn) or just kill the relay
(no daemon → user must `airc join` again, hint to install daemon for
auto-recovery). New helper `_daemon_installed` checks for the launchd
plist or systemd user unit on disk — sibling to the existing
cmd_daemon_status logic.

PART 2 (host-side, unconfirmed): Joel observed the host monitor
silently exit despite the loop being `while true; ... || true; sleep
1; done`. Root cause unidentified (re-exec subprocess plumbing? signal
trap leak?). Add a loud diagnostic AFTER the while-true so any future
fall-through leaves evidence:

  echo "airc: host monitor loop exited unexpectedly — restart with: airc join"

Diagnostic, not a fix — but it satisfies "never swallow errors" while
the root cause is being hunted. Closes the joiner-side half of #184;
host-side stays open for further diagnosis.
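
The PART 1 behavior (one parseable line on stdout, a banner on stderr, and a daemon-aware outcome) could look roughly like this. The real change lives in bash; this Python sketch only illustrates the shape, and the function name `escalate` and the exact message wording are assumptions.

```python
import sys


def escalate(reason: str, daemon_installed: bool, out=sys.stdout, err=sys.stderr):
    """Report an escalation on both streams before the caller exits 99."""
    if daemon_installed:
        outcome = "daemon will respawn the relay"
    else:
        outcome = ("relay stops; run 'airc join' again "
                   "(or 'airc daemon install' for auto-recovery)")

    # stdout: a single line with the `airc:` prefix consumers filter on.
    print(f"airc: escalation: {reason}; {outcome}", file=out, flush=True)

    # stderr: a multi-line banner that stands out in logs.
    rule = "=" * 60
    banner = [rule, "airc escalation", f"  reason:  {reason}",
              f"  outcome: {outcome}", rule]
    print("\n".join(banner), file=err, flush=True)
```

Keeping the stdout form to one line means a consumer such as `airc join | tee log` sees the diagnosis even when stderr is discarded.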

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#142) (#190)

vhsm-d1f4 QA 2026-04-27: airc list shows several days of stale 1:1
invites cluttering the active-rooms count. Before this fix, #82 added a
"(stale)" annotation, but the entries still printed by default. For an
active user with several rooms across several days of test runs, the
stale count dominated the output.

New behavior matches the issue's preferred resolution (#142 option 3,
matching the existing peer-prune pattern):

- Default: skip stale items. Header shows count of active +
  parenthesized hint that stale ones are hidden + how to see them.
- --all / --include-stale: show all (the pre-#142 behavior).
- --prune: delete stale gists from gh, idempotent (skips fresh).

Header is also more informative: was "$count open on your gh account",
now "$fresh active on your gh account ($stale stale hidden — see
'airc list --all')" when there are stale entries to surface the
hidden state.

--prune is the symmetric verb to airc peers --prune (already exists),
matches the issue's option 3 preference.
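
The default-hide behavior amounts to a partition plus a header that admits what was hidden. A minimal sketch, assuming each invite carries an `updated_at` timestamp and that "stale" means older than 24 hours (both are assumptions; the commit does not state the threshold or the record shape):

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(hours=24)  # assumed threshold, not from the PR


def split_by_staleness(invites, now):
    """Partition invites into (fresh, stale) by age of their last update."""
    fresh, stale = [], []
    for invite in invites:
        bucket = stale if now - invite["updated_at"] > STALE_AFTER else fresh
        bucket.append(invite)
    return fresh, stale


def list_header(fresh_count: int, stale_count: int) -> str:
    """Build the default header; mention hidden entries only when present."""
    header = f"{fresh_count} active on your gh account"
    if stale_count:
        header += f" ({stale_count} stale hidden - see 'airc list --all')"
    return header


def visible(invites, now, include_stale=False):
    """Return the entries to print: all with --all, else only fresh ones."""
    fresh, stale = split_by_staleness(invites, now)
    return fresh + stale if include_stale else fresh
```

A `--prune` pass would iterate over the `stale` half only, which is what makes it idempotent with respect to fresh entries.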
Copilot AI review requested due to automatic review settings April 28, 2026 03:13

Copilot AI left a comment

Pull request overview

This release bundle promotes canary → main by introducing a Python “truth-layer” (lib/airc_core/*), refactoring bash into sourced modules (lib/airc_bash/*), and hardening cross-platform install/CI—especially for Windows end-to-end setup and cross-machine messaging reliability.

Changes:

  • Added airc_core Python CLIs (config CRUD, handshake, datetime, monitor formatter) and wired airc to invoke them via AIRC_PYTHON + PYTHONPATH.
  • Extracted bash adapters/doctor into lib/airc_bash/* and updated integration tests accordingly.
  • Added/updated install paths (bash + PowerShell), plus a multi-OS clean-install CI matrix and integration-suite gating.

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| `airc` | Switches to AIRC_PYTHON, sets MSYS/encoding env, sources extracted bash libs, and routes core logic through airc_core CLIs. |
| `lib/airc_core/config.py` | New argparse-driven config.json read/write helpers (get, get_name, set_name, set_host_block). |
| `lib/airc_core/datetime.py` | New Python ISO→epoch conversion used by bash adapters. |
| `lib/airc_core/handshake.py` | New joiner/host handshake implementation + field extraction CLI. |
| `lib/airc_core/monitor_formatter.py` | New Python monitor formatter module replacing embedded heredoc. |
| `lib/airc_core/__init__.py` | Introduces airc_core package metadata. |
| `lib/airc_bash/platform_adapters.sh` | Extracted cross-platform adapters; iso_to_epoch now delegates to Python. |
| `lib/airc_bash/cmd_doctor.sh` | Extracted airc doctor with expanded probes (jq/sshd/tailscale path resolution). |
| `test/integration.sh` | Updates platform adapter sourcing and adds deterministic iso_to_epoch assertions. |
| `install.sh` | Adds jq prereq + sshd enablement and CI-specific skips; updates prereq probing. |
| `install.ps1` | Adds jq, OpenSSH Server install/start, DefaultShell→bash, and explicit successful exit. |
| `airc.ps1` | Fixes winget Tailscale ID casing, remote-home path slash normalization, and $LASTEXITCODE leak in doctor. |
| `.github/workflows/ci.yml` | Adds clean-install matrix (linux/macos/windows/powershell5) + push-only integration suite. |
| `.gitignore` | Ignores Python bytecode artifacts. |


Comment thread install.ps1
} catch { }
if ($needRegWrite) {
Write-Host ' Disabling HNS auto-exclusion (HKLM\...\hns\State EnableExcludedPortRange = 0) ...'
& reg add 'HKLM\SYSTEM\CurrentControlSet\Services\hns\State' /v 'EnableExcludedPortRange' /d 0 /f 2>$null | Out-Null

Copilot AI Apr 28, 2026

Set-HnsPortFreedomFor22 uses reg add ... /d 0 without specifying the value type. If EnableExcludedPortRange is missing, reg add defaults to creating a REG_SZ, which may not be honored where a DWORD is expected. Prefer setting it via PowerShell with an explicit DWORD type (e.g., New-ItemProperty/Set-ItemProperty with -Type DWord) or add /t REG_DWORD to the reg.exe call.

Suggested change
- & reg add 'HKLM\SYSTEM\CurrentControlSet\Services\hns\State' /v 'EnableExcludedPortRange' /d 0 /f 2>$null | Out-Null
+ & reg add 'HKLM\SYSTEM\CurrentControlSet\Services\hns\State' /v 'EnableExcludedPortRange' /t REG_DWORD /d 0 /f 2>$null | Out-Null

Copilot uses AI. Check for mistakes.
Comment thread airc
Comment on lines +90 to +91
for _candidate in \
"${AIRC_DIR:-}/lib" \

Copilot AI Apr 28, 2026

In _airc_resolve_lib_dir(), the first candidate is "${AIRC_DIR:-}/lib". When AIRC_DIR is unset, that expands to "/lib", which is a real system directory and could (in rare cases) cause you to pick up an unrelated /lib/airc_core if present, or at least do an unnecessary filesystem probe. Consider only adding the AIRC_DIR-derived candidate when AIRC_DIR is non-empty (or default it to the intended install dir like $HOME/.airc-src).

Suggested change
- for _candidate in \
-   "${AIRC_DIR:-}/lib" \
+ if [ -n "${AIRC_DIR:-}" ]; then
+   _candidate="$AIRC_DIR/lib"
+   if [ -d "$_candidate/airc_core" ]; then
+     # Canonicalize to absolute path so PYTHONPATH stays valid even
+     # if cwd changes mid-script (heredocs that cd elsewhere). cd +
+     # pwd is the portable canonicalize idiom — `realpath` and
+     # `readlink -f` are not available everywhere (BSD readlink
+     # lacks -f, busybox lacks realpath).
+     _abs=$(cd "$_candidate" 2>/dev/null && pwd) || _abs="$_candidate"
+     printf '%s' "$_abs"
+     return 0
+   fi
+ fi
+ for _candidate in \

Comment thread .github/workflows/ci.yml
Comment on lines +3 to +7
# Three jobs on every PR + every push to canary/main:
#
# clean-install-linux ubuntu install.sh + airc doctor + smoke test
# clean-install-macos macos install.sh + airc doctor + smoke test
# clean-install-windows windows install.ps1 + airc doctor (PS-side)

Copilot AI Apr 28, 2026

Workflow header comment says “Three jobs on every PR + every push”, but the workflow now defines four clean-install jobs (linux/macos/windows/windows-ps5) plus integration-suite on push. Update the comment block to match the actual job set so future edits don’t drift based on incorrect docs.

Suggested change
- # Three jobs on every PR + every push to canary/main:
- #
- # clean-install-linux ubuntu install.sh + airc doctor + smoke test
- # clean-install-macos macos install.sh + airc doctor + smoke test
- # clean-install-windows windows install.ps1 + airc doctor (PS-side)
+ # Four clean-install jobs on every PR + every push to canary/main:
+ #
+ # clean-install-linux ubuntu install.sh + airc doctor + smoke test
+ # clean-install-macos macos install.sh + airc doctor + smoke test
+ # clean-install-windows windows install.ps1 + airc doctor (PS-side)
+ # clean-install-windows-ps5 windows install.ps1 + airc doctor (PS 5.1)

Comment thread install.sh Outdated
Comment on lines +335 to +342
# Strict probe: presence on PATH AND a successful --version invocation.
# The bare `command -v` form is fooled by Windows's Microsoft Store
# python3.exe alias (continuum-b69f, 2026-04-27) — the file exists,
# satisfies command -v, but exits 49 with a Store-redirect message
# when actually run. Pre-fix: install printed "All required prereqs
# present" and airc later silent-fail-cascaded at every python3 -c
# invocation. Strict probe catches this at install time.
if ! command -v "$cmd" >/dev/null 2>&1 || ! "$cmd" --version >/dev/null 2>&1; then

Copilot AI Apr 28, 2026

ensure_prereqs() is using a strict --version probe for all prereqs. Several of these tools (notably ssh-keygen on macOS/BSD) do not support --version and will exit non-zero even when correctly installed, which will make install.sh incorrectly treat them as missing and try to reinstall/fail. Consider limiting the strict probe to python/python3 (the Windows Store stub case) and using command -v (or a tool-appropriate version flag) for the rest.

Suggested change
- # Strict probe: presence on PATH AND a successful --version invocation.
- # The bare `command -v` form is fooled by Windows's Microsoft Store
- # python3.exe alias (continuum-b69f, 2026-04-27) — the file exists,
- # satisfies command -v, but exits 49 with a Store-redirect message
- # when actually run. Pre-fix: install printed "All required prereqs
- # present" and airc later silent-fail-cascaded at every python3 -c
- # invocation. Strict probe catches this at install time.
- if ! command -v "$cmd" >/dev/null 2>&1 || ! "$cmd" --version >/dev/null 2>&1; then
+ # Only python3 gets a strict execution probe. The bare `command -v`
+ # form is fooled by Windows's Microsoft Store python3.exe alias
+ # (continuum-b69f, 2026-04-27) — the file exists, satisfies
+ # command -v, but exits 49 with a Store-redirect message when
+ # actually run. Other tools are checked for presence on PATH only,
+ # because some valid platform variants (notably ssh-keygen on
+ # macOS/BSD) do not support `--version`.
+ local cmd_missing=0
+ case "$cmd" in
+   python3)
+     if ! command -v "$cmd" >/dev/null 2>&1 || ! "$cmd" --version >/dev/null 2>&1; then
+       cmd_missing=1
+     fi
+     ;;
+   *)
+     if ! command -v "$cmd" >/dev/null 2>&1; then
+       cmd_missing=1
+     fi
+     ;;
+ esac
+ if [ "$cmd_missing" -eq 1 ]; then

Comment on lines +60 to +84
# Cross-platform watchdog. POSIX (mac/linux/WSL) gets signal.SIGALRM
# which is cheaper (single-thread, kernel-armed). Windows Python has
# no SIGALRM so we fall back to threading.Timer — same exit semantics,
# slight overhead from the timer thread. Either way the fmt_exit=2
# contract is preserved.
try:
    signal.signal(signal.SIGALRM, _watchdog_exit)
    signal.alarm(WATCHDOG_SEC)

    def _arm_watchdog():
        signal.alarm(WATCHDOG_SEC)
except (AttributeError, ValueError):
    import threading

    _wd_timer_holder = [None]

    def _arm_watchdog():
        if _wd_timer_holder[0] is not None:
            _wd_timer_holder[0].cancel()
        t = threading.Timer(WATCHDOG_SEC, _watchdog_exit)
        t.daemon = True
        t.start()
        _wd_timer_holder[0] = t

    _arm_watchdog()

Copilot AI Apr 28, 2026

monitor_formatter arms the watchdog (SIGALRM handler / Timer thread) at import time. This means any import of airc_core.monitor_formatter in another module or in tests will mutate global process signal handlers / spawn a timer thread, which is a surprising side effect and can break embedding. Consider moving watchdog setup into run() (and teardown on exit) so importing the module is side-effect-free.
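
One way to act on this comment is a factory that installs nothing until it is called from run(). The sketch below is an assumption about how that could look, not the module's actual code: `make_watchdog`, `on_timeout`, and `prefer_signal` are hypothetical names, and the signal branch is taken only on the main thread of a platform that has SIGALRM.

```python
import signal
import threading


def make_watchdog(seconds, on_timeout, prefer_signal=True):
    """Return a re-arm callable; importing this module arms nothing."""
    on_main = threading.current_thread() is threading.main_thread()
    if prefer_signal and hasattr(signal, "SIGALRM") and on_main:
        # POSIX path: the handler is installed here, at call time.
        signal.signal(signal.SIGALRM, lambda signum, frame: on_timeout())

        def arm():
            signal.alarm(int(seconds))  # signal.alarm takes whole seconds

        return arm

    holder = [None]  # current Timer, so each re-arm can cancel the previous one

    def arm():
        if holder[0] is not None:
            holder[0].cancel()
        timer = threading.Timer(seconds, on_timeout)
        timer.daemon = True
        timer.start()
        holder[0] = timer

    return arm
```

run() would then do `arm = make_watchdog(WATCHDOG_SEC, _watchdog_exit)` followed by `arm()` on each message, so tests can import the module without touching process-wide signal handlers.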

Comment on lines +98 to +112
try:
    os.rename(old_json, new_json)
    d = json.load(open(new_json))
    d["name"] = new
    json.dump(d, open(new_json, "w"), indent=2)
except Exception:
    pass
old_pub = os.path.join(peers_dir, f"{old}.pub")
new_pub = os.path.join(peers_dir, f"{new}.pub")
if os.path.isfile(old_pub):
    try:
        os.rename(old_pub, new_pub)
    except Exception:
        pass
return True

Copilot AI Apr 28, 2026

_rename_files() returns True even if the rename/update fails (exceptions are swallowed and the function unconditionally returns True once the old file exists). That can cause _handle_rename() to print “nick old → new” and suppress further processing even though the peer files were not actually renamed/updated. Return False when os.rename/JSON update fails, and avoid swallowing exceptions that leave the peer directory in a partially-updated state.

Suggested change
- try:
-     os.rename(old_json, new_json)
-     d = json.load(open(new_json))
-     d["name"] = new
-     json.dump(d, open(new_json, "w"), indent=2)
- except Exception:
-     pass
- old_pub = os.path.join(peers_dir, f"{old}.pub")
- new_pub = os.path.join(peers_dir, f"{new}.pub")
- if os.path.isfile(old_pub):
-     try:
-         os.rename(old_pub, new_pub)
-     except Exception:
-         pass
- return True
+ old_pub = os.path.join(peers_dir, f"{old}.pub")
+ new_pub = os.path.join(peers_dir, f"{new}.pub")
+ tmp_json = f"{new_json}.tmp"
+ original = None
+ json_renamed = False
+ pub_renamed = False
+ try:
+     with open(old_json) as f:
+         original = json.load(f)
+     os.rename(old_json, new_json)
+     json_renamed = True
+     updated = dict(original)
+     updated["name"] = new
+     with open(tmp_json, "w") as f:
+         json.dump(updated, f, indent=2)
+     os.replace(tmp_json, new_json)
+     if os.path.isfile(old_pub):
+         os.rename(old_pub, new_pub)
+         pub_renamed = True
+     return True
+ except Exception:
+     if os.path.exists(tmp_json):
+         os.unlink(tmp_json)
+     try:
+         if pub_renamed and os.path.isfile(new_pub):
+             os.rename(new_pub, old_pub)
+         if json_renamed and os.path.isfile(new_json):
+             os.rename(new_json, old_json)
+         if original is not None and os.path.isfile(old_json):
+             with open(tmp_json, "w") as f:
+                 json.dump(original, f, indent=2)
+             os.replace(tmp_json, old_json)
+     except Exception:
+         if os.path.exists(tmp_json):
+             os.unlink(tmp_json)
+         raise
+     return False

Comment on lines +171 to +181
# Authorize joiner's SSH key.
ssh_dir = os.path.expanduser("~/.ssh")
os.makedirs(ssh_dir, mode=0o700, exist_ok=True)
ak = os.path.join(ssh_dir, "authorized_keys")
ssh_key = joiner.get("ssh_pub", "")
if ssh_key:
    existing = open(ak).read() if os.path.exists(ak) else ""
    if ssh_key not in existing:
        with open(ak, "a") as f:
            f.write(ssh_key.strip() + "\n")
    os.chmod(ak, 0o600)

Copilot AI Apr 28, 2026

accept_one writes the joiner-provided ssh_pub directly into ~/.ssh/authorized_keys after only a .strip(). If ssh_pub contains embedded newlines, this will append multiple lines/options and can unintentionally grant additional access beyond a single key. Validate that the value is exactly one line and matches an expected authorized_keys public-key format before appending (and consider rejecting/closing the connection on parse errors).
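
A validation along the lines this comment asks for might look like the sketch below. It is an illustration only: `is_single_pubkey` and the `ALLOWED_KEY_TYPES` set are assumptions, and the check relies on the standard SSH public-key encoding in which the base64 blob begins with a length-prefixed copy of the key-type string.

```python
import base64
import binascii
import struct

# Assumed allow-list; adjust to whatever key types the host should accept.
ALLOWED_KEY_TYPES = {
    "ssh-ed25519",
    "ssh-rsa",
    "ecdsa-sha2-nistp256",
    "ecdsa-sha2-nistp384",
    "ecdsa-sha2-nistp521",
}


def is_single_pubkey(value: str) -> bool:
    """True only for one line of the form '<key-type> <base64-blob> [comment]'."""
    line = value.strip()
    if not line or "\n" in line or "\r" in line:
        return False  # embedded newlines would append extra entries
    parts = line.split(" ")
    if len(parts) < 2:
        return False
    key_type, blob_b64 = parts[0], parts[1]
    if key_type not in ALLOWED_KEY_TYPES:
        return False  # also rejects lines that start with options
    try:
        blob = base64.b64decode(blob_b64, validate=True)
    except (binascii.Error, ValueError):
        return False
    if len(blob) < 4:
        return False
    (name_len,) = struct.unpack(">I", blob[:4])
    # The blob's embedded type name must match the declared key type.
    return blob[4:4 + name_len] == key_type.encode("ascii")
```

Because the first field must be a known key type, a value such as `command="..." ssh-ed25519 ...` is rejected along with multi-line input.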

… DefaultShell, AIRC_CHANNEL) (#192)

* fix(install.sh): honor AIRC_CHANNEL on fresh install (vhsm's catch)

Caught by vhsm-d1f4 2026-04-28 during the #191 release-gate fresh-
install verification: `AIRC_CHANNEL=canary curl|bash` silently landed
on main, requiring a follow-up `airc canary && airc update` dance.

The fresh-install branch (line 495 pre-fix) was `git clone` without
specifying a branch, so it defaulted to the repo's default (main)
regardless of the env var. The update-existing branch already honored
$SAVED_CHANNEL via $CLONE_DIR/.channel; only the cold-start path was
broken.

Fix:
1. $CHANNEL_TARGET = ${AIRC_CHANNEL:-main}, validated against the
   known list (main, canary); unknown values fall back to main with
   a warning rather than failing later with an obscure git error.
2. `git clone --branch $CHANNEL_TARGET` lands directly on the
   requested branch.
3. Write $CLONE_DIR/.channel after clone so future `airc update`
   stays on the same channel (matches what `airc canary` / `airc
   main` would write).

Verified locally: AIRC_CHANNEL=canary lands on canary HEAD; default
lands on main; bogus value falls back to main with the warning.
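
The channel-resolution rule in steps 1 and 2 is small enough to sketch. install.sh does this in bash; the Python below is only an illustration of the logic, and `resolve_channel`, `clone_argv`, and the warning text are hypothetical.

```python
KNOWN_CHANNELS = ("main", "canary")


def resolve_channel(env):
    """Return (channel, warning); warning is None when the value is valid."""
    # Mirrors ${AIRC_CHANNEL:-main}: unset and empty both mean "main".
    requested = env.get("AIRC_CHANNEL") or "main"
    if requested in KNOWN_CHANNELS:
        return requested, None
    return "main", f"unknown AIRC_CHANNEL '{requested}', falling back to main"


def clone_argv(repo_url, clone_dir, channel):
    """Build the git command that lands directly on the requested branch."""
    return ["git", "clone", "--branch", channel, repo_url, clone_dir]
```

Validating before the clone keeps the failure mode a readable warning instead of a git "remote branch not found" error.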

* fix(install.sh): make Windows-from-bash work end-to-end (no PowerShell ask)

Joel 2026-04-28: "is anyone running claude or codex from inside
powershell?" — basically nobody. Real users are in Git Bash via
Claude Code / Codex on Windows, and we were forcing them to switch
shells just to install. Bad onboarding.

install.sh on MSYS already covered most of the Windows setup (winget
prereqs, OpenSSH.Server capability, HNS port-22, firewall, sshd
start). Two gaps closed here:

1. **DefaultShell registry write** (#98). The elevated PowerShell
   payload now also writes HKLM:\SOFTWARE\OpenSSH\DefaultShell to
   Git for Windows bash.exe. Without this, every Windows airc HOST
   silently fails inbound `airc msg` because OpenSSH's default shell
   is cmd.exe, which lacks `cat`, POSIX redirects, and the rest of
   the vocabulary airc remote commands assume. Bash candidates +
   PATH lookup + idempotent registry write.

2. **Tailscale via winget** (#94). install_tailscale's case statement
   now has an MSYS branch using `winget install --id Tailscale.Tailscale`
   (proper case — winget --exact is case-sensitive). Previously
   install.sh on Git Bash skipped Tailscale entirely.

Result: a Windows user pasting

    AIRC_CHANNEL=canary bash -c "$(curl -fsSL https://raw.githubusercontent.com/CambrianTech/airc/canary/install.sh)"

into their Git Bash terminal gets the FULL Windows host setup in one
shot — winget prereqs + Tailscale + sshd + DefaultShell — without
ever opening a PowerShell window. One UAC prompt for the elevated
sshd payload, that's it.

install.ps1 stays for the edge case where someone wants airc.ps1
(PowerShell-native) — that path still installs pwsh + wires
airc.cmd / airc.ps1 to %USERPROFILE%\AppData\Local\Programs\airc,
which bash install.sh deliberately does not (Git Bash users use the
bash airc via ~/.local/bin).

* docs(README): bash install.sh is canonical for everyone (incl. Windows Git Bash)

Joel 2026-04-28: "is anyone running claude or codex from inside
powershell?" Basically nobody. Real users on Windows are in Git Bash
via Claude Code / Codex / Cursor / opencode / Windsurf / openclaw.
Pointing them at install.ps1 and 'open PowerShell' was bad onboarding,
and onboarding is something we have to get perfect or we get no users.

Demote install.ps1 to a side note for the rare native-PowerShell user
who specifically wants airc.ps1. Lead with bash install.sh as the
universal entry point. The companion install.sh changes (in this same
PR) make the MSYS path bulletproof: winget prereqs + Tailscale + sshd
capability + HNS port-22 + firewall + DefaultShell=bash.exe, all
behind one UAC prompt.

Two install sections updated (top, and the Setup block at line 120).
Skills already used the bash form everywhere so no skill changes needed.

* fix(install.sh): post-install message stops claiming Tailscale isn't there when it just got installed

Joel 2026-04-28: 'Cross lan mesh? tailscale is optional but recommended.
Well guess fucking what it is installed sooooo. fail?'

The end-of-install banner unconditionally printed 'Tailscale is optional
but recommended: https://tailscale.com' even after winget had just
installed it 30 seconds earlier. Reads as 'install failed' to the user.

Three states now handled:
- Not installed → show the optional/URL line (was always shown)
- Installed, logged out → ts_post_check warns + shows sign-in path
- Installed, logged in → silent (best UX)

Plus extend ts_post_check + tailscale_present to find Tailscale on
Windows Git Bash (`/c/Program Files/Tailscale/tailscale.exe`). winget
installs there, and PATH may not include it in the current shell yet, so
the bare `command -v tailscale` would have returned false and the
post-install message would have nagged users to install something
already installed.
…most important part to get right) (#193)

Joel 2026-04-28: "your powershell crashes. It has red all over but
blinks for a half second so i have no idea." Followed by: "ok sorry
this is THE MOST IMPORTANT PART TO GET RIGHT."

The elevated PowerShell window opens for UAC, runs the payload, and
auto-closes the moment the script ends. If anything errored mid-
payload, red text flashed for ~500ms then the window died. User had
no actionable signal — just a feeling that the install failed.

Three changes that together give the user a clear picture regardless
of outcome:

1. Wrap the elevated payload in Start-Transcript / Stop-Transcript,
   writing to %TEMP%\airc-install-elevated.log. The file always
   exists after Start-Process -Wait returns; bash side translates the
   Windows path to MSYS form (cygpath -u when present, sed fallback
   when not) and dumps it indented inside an "── elevated PowerShell
   output ───" / "── (end log) ───" block.

2. Step labels in the payload — "==> OpenSSH.Server capability",
   "==> HNS port-22 reservation", "==> Firewall rule", "==> sshd
   service", "==> DefaultShell registry" — so the transcript reads as
   a clear sequence of what was attempted, not a single Write-Host at
   the end.

3. Robust failure detection: the payload now `exit $LASTEXITCODE`s
   based on try/catch, so Start-Process -Wait propagates the real
   outcome. As belt-and-suspenders, bash also greps the transcript
   for "airc-elevated-error:" pattern. On failure, prints the captured
   manual-fix recipe (Add-WindowsCapability, reg, netsh, Start-Service,
   Set-Service, AND now the DefaultShell registry write — was missing
   from the manual hint pre-fix).

Plus: tailscale_present now also probes via `where.exe tailscale.exe`
so winget user-scope installs (%LOCALAPPDATA%\...) get detected. Joel
caught this 2026-04-28 — winget had installed Tailscale but the
post-install message still said "Tailscale is optional but recommended"
because none of the hard-coded paths matched and `command -v tailscale`
on Git Bash didn't honor PATHEXT. `where.exe` searches every PATH +
PATHEXT location; mirrors what airc.ps1's resolve_tailscale_bin does.
Copilot AI review requested due to automatic review settings April 28, 2026 04:02

Copilot AI left a comment

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 6 comments.


Comment thread install.sh Outdated
Comment on lines +444 to +451
# Strict probe: presence on PATH AND a successful --version invocation.
# The bare `command -v` form is fooled by Windows's Microsoft Store
# python3.exe alias (continuum-b69f, 2026-04-27) — the file exists,
# satisfies command -v, but exits 49 with a Store-redirect message
# when actually run. Pre-fix: install printed "All required prereqs
# present" and airc later silent-fail-cascaded at every python3 -c
# invocation. Strict probe catches this at install time.
if ! command -v "$cmd" >/dev/null 2>&1 || ! "$cmd" --version >/dev/null 2>&1; then

Copilot AI Apr 28, 2026

`ensure_prereqs` uses a blanket `"$cmd" --version` strict probe for all prereqs (git/gh/jq/openssl/ssh-keygen/python3). Several of these do not support `--version` (notably `ssh-keygen` on macOS/BSD, and openssl commonly uses `openssl version`), so this will falsely mark them missing and trigger unnecessary or incorrect installs. Suggest probing only python/python3 with `--version` (to defeat the Windows Store stub), and using `command -v` (or command-specific version flags) for the rest.

Suggested change
-# Strict probe: presence on PATH AND a successful --version invocation.
-# The bare `command -v` form is fooled by Windows's Microsoft Store
-# python3.exe alias (continuum-b69f, 2026-04-27) — the file exists,
-# satisfies command -v, but exits 49 with a Store-redirect message
-# when actually run. Pre-fix: install printed "All required prereqs
-# present" and airc later silent-fail-cascaded at every python3 -c
-# invocation. Strict probe catches this at install time.
-if ! command -v "$cmd" >/dev/null 2>&1 || ! "$cmd" --version >/dev/null 2>&1; then
+# Probe python3 more strictly: the bare `command -v` form is fooled
+# by Windows's Microsoft Store python3.exe alias
+# (continuum-b69f, 2026-04-27) — the file exists, satisfies
+# command -v, but exits 49 with a Store-redirect message when
+# actually run. For the other prereqs, command-specific `--version`
+# handling is not portable (e.g. ssh-keygen on macOS/BSD, openssl
+# typically uses `openssl version`), so only require PATH presence.
+if ! command -v "$cmd" >/dev/null 2>&1 || { [ "$cmd" = "python3" ] && ! "$cmd" --version >/dev/null 2>&1; }; then

Copilot uses AI. Check for mistakes.
Comment thread install.sh Outdated
Comment on lines +313 to +318
# Run the elevated payload. Start-Process exits 0 if it could
# launch the elevated process; the payload's own exit code is
# what we care about (it explicitly `exit $LASTEXITCODE`s based
# on try/catch).
powershell.exe -NoProfile -Command "Start-Process powershell -Verb RunAs -Wait -ArgumentList '-NoProfile -Command \"$_elevated_payload\"'" 2>&1 \
|| _elev_rc=$?

Copilot AI Apr 28, 2026

The elevated Windows sshd setup passes a multi-line PowerShell script via `-ArgumentList '-NoProfile -Command "$_elevated_payload"'`, but `_elevated_payload` contains many double quotes. As expanded, those quotes will break the nested `-Command "..."` argument parsing, so the elevated process is very likely to fail or run partial code. Use a safer transport (e.g., write the payload to a temp .ps1 file and `Start-Process powershell.exe -File <path>`, or use `-EncodedCommand`) so quoting/newlines are preserved.
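
One of the transports suggested above, `-EncodedCommand`, takes the script as Base64 over its UTF-16LE bytes, so no quote or newline ever reaches the command line. A minimal sketch, assuming the payload is available as a Python string; `encode_powershell_command` and the sample payload are hypothetical names for illustration, not code from this PR:

```python
import base64


def encode_powershell_command(script: str) -> str:
    """Encode a script the way powershell.exe -EncodedCommand expects:
    Base64 over the UTF-16LE bytes of the script text."""
    return base64.b64encode(script.encode("utf-16-le")).decode("ascii")


# Hypothetical two-line payload containing double quotes and newlines.
payload = 'Write-Host "==> sshd service"\nStart-Service sshd\n'
encoded = encode_powershell_command(payload)

# The result uses only the Base64 alphabet, so nested shell quoting
# has nothing left to mangle.
argv = ["powershell.exe", "-NoProfile", "-EncodedCommand", encoded]
```

The trade-off is that the argument is opaque in logs and bounded by the Windows command-line length limit; a staged .ps1 file has neither constraint.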

Comment on lines +183 to +207
    # Save joiner as peer (with stable-host stale cleanup).
    peers_dir = os.path.expanduser(args.peers_dir)
    os.makedirs(peers_dir, exist_ok=True)
    jname = joiner["name"]
    jhost = joiner.get("host", "")
    if jhost and os.path.isdir(peers_dir):
        for entry in os.listdir(peers_dir):
            if not entry.endswith(".json") or entry == jname + ".json":
                continue
            try:
                d = json.load(open(os.path.join(peers_dir, entry)))
            except Exception:
                continue
            if d.get("host") == jhost:
                for ext in (".json", ".pub"):
                    p = os.path.join(peers_dir, entry[:-5] + ext)
                    if os.path.isfile(p):
                        try:
                            os.remove(p)
                        except Exception:
                            pass

    timestamp = datetime.datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ")
    with open(os.path.join(peers_dir, jname + ".json"), "w") as f:
        json.dump({

Copilot AI Apr 28, 2026

`accept_one` uses `jname = joiner["name"]` directly in filenames (`os.path.join(peers_dir, jname + ".json")`). Since this listener is reachable over the network, a malicious joiner could send a name containing path separators (e.g. `../...`) and write outside `peers_dir`. Please validate/sanitize `jname` (e.g., enforce the same `[a-z0-9-]+` constraint used elsewhere) and reject/close the connection on invalid input before doing any file writes.
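
The validation requested above could look like the following. This is a sketch under stated assumptions: the `[a-z0-9-]+` pattern comes from the comment, while `safe_peer_path` is a hypothetical helper name and not a function in `airc_core`.

```python
import os
import re

# Same character class the review comment cites for the rename sanitizer.
_PEER_NAME_RE = re.compile(r"[a-z0-9-]+")


def safe_peer_path(peers_dir: str, name, ext: str = ".json") -> str:
    """Return the peer file path, or raise ValueError for an unsafe name.

    fullmatch() rejects anything outside [a-z0-9-], which rules out
    path separators, dots, and empty strings in one check.
    """
    if not isinstance(name, str) or not _PEER_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid peer name: {name!r}")
    return os.path.join(peers_dir, name + ext)
```

A caller would run this before any file write and close the connection on `ValueError`, so a rejected name never touches the filesystem.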

Comment on lines +160 to +169
    data = b""
    while True:
        chunk = conn.recv(4096)
        if not chunk:
            break
        data += chunk
        if b"\n" in data:
            break

    joiner = json.loads(data.decode().strip())

Copilot AI Apr 28, 2026

`accept_one` reads from the socket into `data` until it sees a newline, but there is no maximum size guard. A client can stream arbitrary bytes without `\n`, causing unbounded memory growth (DoS). Consider enforcing a max payload size (e.g., stop reading and close the connection after N bytes) and handling JSON decode errors gracefully.
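
A size-capped version of the read loop might look like this. It is a sketch, not the PR's code: `recv_handshake_line` and the 64 KiB cap are assumptions chosen for illustration, and `conn` is any object with a socket-style `recv(n)` method.

```python
# Hypothetical cap; the real limit should fit the largest valid handshake.
MAX_HANDSHAKE_BYTES = 64 * 1024


def recv_handshake_line(conn, limit: int = MAX_HANDSHAKE_BYTES) -> bytes:
    """Read until a newline or EOF, refusing to buffer more than `limit` bytes."""
    data = b""
    while b"\n" not in data:
        if len(data) >= limit:
            raise ValueError("handshake line exceeds size limit")
        # Never request more than the remaining budget.
        chunk = conn.recv(min(4096, limit - len(data)))
        if not chunk:
            break  # peer closed the connection
        data += chunk
    return data
```

The caller would catch `ValueError`, close the connection, and return a non-zero status instead of letting the buffer grow.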

Comment thread lib/airc_core/datetime.py
Comment on lines +20 to +22


def iso_to_epoch(ts: str) -> int | None:

Copilot AI Apr 28, 2026

`iso_to_epoch` uses PEP 604 union syntax (`int | None`), which requires Python 3.10+. The repo doesn’t appear to document a minimum Python version, and some Linux distros still default to <3.10. Either document/enforce a 3.10+ minimum in install/docs, or switch to `Optional[int]` / `int | None`-free typing for wider compatibility.

Suggested change
-def iso_to_epoch(ts: str) -> int | None:
+from typing import Optional
+def iso_to_epoch(ts: str) -> Optional[int]:

Comment thread lib/airc_core/config.py
Comment on lines +47 to +67
def cmd_set_name(args) -> int:
    """Atomically write the identity name into config.json.

    Replaces the inline-Python heredoc that lived in cmd_rename. With
    multi-scope rename propagation (#179), cmd_rename writes the name
    into the primary scope AND every sidecar scope's config; doing it
    via a single CLI call per scope keeps the write quoting-safe (the
    heredoc inlined `$new_name` into a python string literal which
    would have broken on names containing single quotes — fortunately
    the rename sanitizer only allows [a-z0-9-] today, but the heredoc
    pattern was a sharp edge).
    """
    try:
        c = json.load(open(args.config))
    except (OSError, ValueError) as e:
        print(f"airc-config-set-error: cannot read {args.config}: {e}", file=sys.stderr)
        return 1
    c["name"] = args.name
    try:
        json.dump(c, open(args.config, "w"), indent=2)
        return 0

Copilot AI Apr 28, 2026

`cmd_set_name` / `cmd_set_host_block` docstrings say they write config "atomically", but the implementation writes directly to `args.config` (`json.dump(..., open(args.config, "w"))`). If the process is interrupted mid-write, config.json can be left truncated/corrupt. Consider writing to a temp file in the same directory and `os.replace()` it into place, and use `with open(...)` to avoid leaking file descriptors.
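
The temp-file-plus-`os.replace()` pattern named above can be sketched as follows. `write_json_atomic` is a hypothetical helper name, and the `.config-` / `.tmp` naming is an assumption, not something this PR defines.

```python
import json
import os
import tempfile


def write_json_atomic(path: str, obj) -> None:
    """Write obj as JSON so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem)
    # for os.replace() to be an atomic rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".config-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(obj, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)
    except BaseException:
        # Leave no stray temp file behind on failure.
        try:
            os.remove(tmp_path)
        except OSError:
            pass
        raise
```

With this shape, `cmd_set_name` would load the config inside a `with open(...)` block, mutate the dict, and call the helper, so an interrupted write leaves the previous config.json intact.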

…lag (false 'manual install' warning) (#194)

fix(install.sh): ssh-keygen probe drops --version (no such flag) — false 'manual install' warning

Joel 2026-04-28: install on Windows Git Bash printed 'These prereqs
need manual install on winget: ssh-keygen' even though Git for
Windows bundles a perfectly good ssh-keygen.exe at /c/Program Files/
Git/usr/bin/ssh-keygen.exe.

Root cause: the strict probe added for python3 (Microsoft Store
alias trick — alias passes command -v, exits 49 on actual call) was
applied indiscriminately to every prereq via:

    "$cmd" --version >/dev/null 2>&1

ssh-keygen has no --version flag (the version flag is ssh's -V;
ssh-keygen's -V sets a certificate validity interval when signing).
The probe therefore exits non-zero on every install. Strict probe →
false-missing → 'manual install' warning → new user thinks setup
failed.

Fix: skip the strict --version variant for ssh-keygen; bare command
-v is sufficient since Git for Windows always ships a working binary.
git/gh/jq/openssl/python3 still get the strict probe (each supports
--version cleanly, and python3 specifically needs it for the Store
alias defense).
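
The per-command probe described in this commit can be sketched in Python (install.sh itself is bash). `prereq_present` and `STRICT_PROBE` are hypothetical names; the set contents follow the commit message, and the 10-second timeout is an assumption.

```python
import shutil
import subprocess

# Commands that get the strict "--version must succeed" probe, per the
# commit message above. ssh-keygen is deliberately absent.
STRICT_PROBE = {"git", "gh", "jq", "openssl", "python3"}


def prereq_present(cmd: str) -> bool:
    """True if cmd is on PATH and, for strict commands, actually runs."""
    path = shutil.which(cmd)
    if path is None:
        return False
    if cmd not in STRICT_PROBE:
        return True  # PATH presence is enough (the ssh-keygen case)
    try:
        result = subprocess.run(
            [path, "--version"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # A stub such as the Microsoft Store alias exits non-zero here.
    return result.returncode == 0
```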
…llback (#195)

fix(install/airc): elevated transcript path uses Win temp; Tailscale Windows GUI fallback

Windows Git Bash issues Joel hit 2026-04-28 (three fixes, plus a note on one unchanged probe):

1. install.sh's elevated payload wrote to
   %TEMP%\airc-install-elevated.log, but the bash side computed the
   path via $env:TEMP through a powershell.exe call that inherited
   bash's TEMP=/tmp, so we looked at /tmp/airc-install-elevated.log,
   a different file. Bash printed "Elevated transcript not written"
   while the actual transcript sat untouched at
   C:\Users\green\AppData\Local\Temp\airc-install-elevated.log.
   Fix: use [System.IO.Path]::GetTempPath(), which resolves the temp
   directory through the Windows API rather than by expanding
   $env:TEMP in the script, giving the same Windows path on both the
   elevated process and the bash-side resolver.

2. airc's resolve_tailscale_bin used [ -x ] for Windows .exe paths,
   which Git Bash MSYS doesn't always recognize on NTFS files even
   when Windows considers them runnable-by-extension. Switch to
   [ -f ] for Windows path candidates and add a where.exe fallback
   so winget user-scope installs (%LOCALAPPDATA%\...) get found.
   Also extend ts_post_check in install.sh with the same logic.

3. tailscale_login_check_or_prompt: on Windows, if `tailscale up`
   from non-admin Git Bash exits silently (daemon pipe doesn't
   talk to user shell), fall back to launching tailscale-ipn.exe
   (the GUI sibling next to tailscale.exe) so the user can click
   the tray "Log in". Without this, Joel saw no popup, no URL,
   and silently proceeded with logged-out Tailscale.

4. install.sh's ssh-keygen probe still skips --version (already
   fixed in #194; unchanged here).
Copilot AI review requested due to automatic review settings April 28, 2026 04:37

Copilot AI left a comment

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 5 comments.


Comment thread install.sh Outdated
Comment on lines +324 to +329
# Run the elevated payload. Start-Process exits 0 if it could
# launch the elevated process; the payload's own exit code is
# what we care about (it explicitly `exit $LASTEXITCODE`s based
# on try/catch).
powershell.exe -NoProfile -Command "Start-Process powershell -Verb RunAs -Wait -ArgumentList '-NoProfile -Command \"$_elevated_payload\"'" 2>&1 \
|| _elev_rc=$?

Copilot AI Apr 28, 2026

The elevated PowerShell payload is interpolated directly into a `-Command "$_elevated_payload"` argument. Because `_elevated_payload` contains many embedded double quotes and newlines, this is very likely to break command-line parsing (and can fail unpredictably) on Windows Git Bash. Safer pattern: write the payload to a temp .ps1 file and `Start-Process powershell.exe -File <temp>` (or use `-EncodedCommand`) so quoting/newlines don’t corrupt the script content.

Comment thread lib/airc_core/config.py
Comment on lines +73 to +105
def cmd_set_host_block(args) -> int:
    """Atomically write the post-handshake host_* fields into config.

    Replaces a fragile env-var-passed python heredoc that bit on MSYS
    Git Bash (continuum-b69f's catch 2026-04-27): MSYS translates env
    var values that look like Unix paths INTO the Windows-binary
    subprocess, so /Users/... silently became C:/Program Files/Git/...
    Argparse `--flags` are per-arg-predictable (callers can `//`-prefix
    individual values or use MSYS2_ARG_CONV_EXCL targeted-ly), and
    the python source is fixed bytes regardless of the values.
    """
    try:
        c = json.load(open(args.config))
    except (OSError, ValueError) as e:
        print(f"airc-config-set-error: cannot read {args.config}: {e}", file=sys.stderr)
        return 1
    c["host_airc_home"] = args.host_airc_home or ""
    c["host_name"] = args.host_name or ""
    try:
        c["host_port"] = int(args.host_port)
    except (TypeError, ValueError):
        c["host_port"] = 7547
    c["host_ssh_pub"] = args.host_ssh_pub or ""
    try:
        c["host_identity"] = json.loads(args.host_identity_json or "{}")
    except ValueError:
        c["host_identity"] = {}
    try:
        json.dump(c, open(args.config, "w"), indent=2)
        return 0
    except OSError as e:
        print(f"airc-config-set-error: cannot write {args.config}: {e}", file=sys.stderr)
        return 1

Copilot AI Apr 28, 2026

`cmd_set_host_block` also writes config via `json.dump(..., open(args.config, "w"))` despite the docstring stating it’s atomic. Same risk of partial writes/corruption on interruption; recommend temp-file + `os.replace()` and `with open(...)` for both read/write handles.

Comment on lines +66 to +72
    payload = json.dumps({
        "name": args.my_name,
        "host": args.my_host,
        "ssh_pub": args.my_ssh_pub,
        "sign_pub": args.my_sign_pub,
        "airc_home": args.my_airc_home,
        "identity": json.loads(args.my_identity_json or "{}"),

Copilot AI Apr 28, 2026

`cmd_send` builds `payload` using `json.loads(args.my_identity_json ...)` outside the socket try/except. If `--my-identity-json` is malformed, this will raise and crash without a clear error, bypassing the existing error handling. Suggest validating/parsing the JSON in a try/except and returning a non-zero exit with a helpful stderr message.

Suggested change
-    payload = json.dumps({
-        "name": args.my_name,
-        "host": args.my_host,
-        "ssh_pub": args.my_ssh_pub,
-        "sign_pub": args.my_sign_pub,
-        "airc_home": args.my_airc_home,
-        "identity": json.loads(args.my_identity_json or "{}"),
+    try:
+        identity = json.loads(args.my_identity_json or "{}")
+    except (ValueError, TypeError) as e:
+        print(f"airc-handshake-send-error: invalid --my-identity-json: {e}", file=sys.stderr)
+        return 1
+    payload = json.dumps({
+        "name": args.my_name,
+        "host": args.my_host,
+        "ssh_pub": args.my_ssh_pub,
+        "sign_pub": args.my_sign_pub,
+        "airc_home": args.my_airc_home,
+        "identity": identity,

        if b"\n" in data:
            break

    joiner = json.loads(data.decode().strip())

Copilot AI Apr 28, 2026

`cmd_accept_one` does `joiner = json.loads(data.decode().strip())` without any validation/exception handling. A malformed/partial handshake line will crash the listener (DoS) and may leave the connection/socket unclosed. Recommend wrapping decode/parse in try/except, closing `conn`, and returning cleanly (or emitting a structured error response) on bad input.

Suggested change
-    joiner = json.loads(data.decode().strip())
+    try:
+        payload = data.decode().strip()
+        joiner = json.loads(payload)
+        if not isinstance(joiner, dict):
+            raise ValueError("handshake payload must be a JSON object")
+    except (UnicodeDecodeError, json.JSONDecodeError, TypeError, ValueError) as e:
+        print(f"airc-handshake-accept-error: invalid handshake payload: {e}", file=sys.stderr)
+        conn.close()
+        sock.close()
+        return 1

Comment on lines +93 to +104
def _rename_files(peers_dir: str, old: str, new: str) -> bool:
    old_json = os.path.join(peers_dir, f"{old}.json")
    new_json = os.path.join(peers_dir, f"{new}.json")
    if not os.path.isfile(old_json):
        return False
    try:
        os.rename(old_json, new_json)
        d = json.load(open(new_json))
        d["name"] = new
        json.dump(d, open(new_json, "w"), indent=2)
    except Exception:
        pass

Copilot AI Apr 28, 2026

`_rename_files` (and `_find_peer_by_host`) uses `json.load(open(...))` / `json.dump(..., open(...))` without context managers, which can leak file descriptors in a long-running monitor (renames may happen repeatedly). Use `with open(...) as f:` for both reading and writing, and consider not swallowing all exceptions silently here to aid debugging when a rename fails.

…ey ACLs (continuum's catches) (#197)

* fix(install.sh): stage payload as .ps1 file + ssh-keygen -A for hostkey ACLs

Two Windows install bugs found via Mac↔Windows Claude debug loop on
issue #196 (continuum-b69f testing on real Windows MINGW64):

1. **Inline payload mangled by 4-layer quote escaping.** Pre-fix:
   `... -ArgumentList '-NoProfile -Command "$_elevated_payload"'`
   The payload contained many "" (PS strings) and \\ (registry paths);
   bash double-quoted → ps outer -Command → Start-Process ArgumentList
   single-quoted → inner -Command double-quoted. Each layer ate quotes
   differently. PowerShell never parsed the payload, the elevated
   window opened + ran nothing + closed silently. No transcript ever
   written. Joel saw a "OpenSSH installed + started" success message
   contradicted by a missing-transcript warning on the same run.

   Fix: stage the payload as a .ps1 file in $CLONE_DIR and have
   Start-Process launch `powershell -File <path>`. No quoting happens
   on the boundary; the .ps1 file is plain PowerShell, so quotes and
   backslashes work natively.

2. **sshd Start-Service fails with WIN32_EXIT_CODE 1067 ("terminated
   unexpectedly") on every fresh Windows OpenSSH install** because
   host-key files exist with overly-permissive ACLs (Authenticated
   Users / BUILTIN\\Users / Everyone). sshd refuses to load them
   ("sshd: no hostkeys available -- exiting").

   Fix: add `ssh-keygen -A` to the elevated payload between the
   capability install and Start-Service. Idempotent — generates
   missing host keys AND restores correct ACLs (SYSTEM + Admins
   only) on existing ones. continuum-b69f's diagnosis.

3. **Bash side now re-queries sshd state post-elevation** as belt-
   and-suspenders. Previous behavior printed "OpenSSH installed +
   started" if the elevated payload exit was 0, even when no transcript
   was written and sshd wasn't actually running. The silent-success-
   while-broken path was the worst version of this bug. Now: bash
   calls `Get-Service sshd` from non-elevated PS; if state isn't
   "Running" it surfaces a "partial install" warning even when
   elevated exit was 0.

Verified by continuum-b69f on real Windows MINGW64: PR #195 (which
this PR builds on) now produces a complete transcript dumped to bash
terminal. Without the ssh-keygen -A addition though, sshd Start-Service
still failed in his run — that's what this PR adds.

* fix(install.sh): kill em-dash + drop global try/catch + parse-check before UAC

Three real bugs hiding behind one symptom on continuum-b69f's Windows
machine: install reported "OpenSSH installed + started" while sshd was
actually crashloop-stopped with exit 1067 ("no hostkeys available").
Joel called it "amateur try/catch" -- he was right.

1. Em-dash (U+2014) in a string literal mis-parsed under cp1252.

   PowerShell 5.1 reads BOM-less .ps1 files in the system codepage
   (cp1252 on most Windows installs). A UTF-8 em-dash is the bytes
   E2 80 94, and byte 0x94 in cp1252 is RIGHT DOUBLE QUOTATION MARK
   (U+201D), which PowerShell accepts as a string delimiter. The
   parser sees "...$path " ...rest": it treats the trailing 0x94 as a
   closing string quote and the rest of the file fails to parse.
   Nothing executes. No log is written. The elevated window blinks
   closed silently.

   Fix: heredoc is now ASCII-only AND we prepend a UTF-8 BOM as
   defense-in-depth so future edits don't regress.

2. Global try/catch + $ErrorActionPreference = "Stop" hid the parse
   error completely.

   The parse error happens BEFORE Start-Transcript runs -- nothing in
   the try/catch could catch it because the parser never reaches the
   try at all. The bash side saw "no transcript written" and printed
   the misleading "UAC denied or Start-Process failed" warning.

   Fix: drop both. Each step runs plainly. PowerShell prints native
   errors to the transcript and execution continues. Bash side
   already re-queries Get-Service sshd post-elevation as the source-
   of-truth verdict, so we don't need the script's exit code to lie
   about success.

3. Parse errors didn't surface until after UAC.

   Fix: bash side now runs [Parser]::ParseFile on the staged .ps1
   from a non-elevated process before Start-Process is called. If
   any parse errors exist, we print them and abort -- no UAC prompt,
   no silent close, the user sees exactly what's wrong.

Per Joel: "we prefer parser issues to actually error" -- this is how
they actually error.

Verified locally on continuum-b69f's box: new payload parses clean
(456 tokens, no errors). Will end-to-end-test next.
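
The cp1252 misread from item 1 above can be reproduced outside PowerShell, along with the ASCII-only-plus-BOM staging rule. A small sketch; `stage_payload_bytes` is a hypothetical helper name, not the function install.sh uses:

```python
import codecs

EM_DASH = "\u2014"
utf8_bytes = EM_DASH.encode("utf-8")      # the bytes E2 80 94
misread = utf8_bytes.decode("cp1252")     # what a cp1252 reader sees


def stage_payload_bytes(script: str) -> bytes:
    """Reject non-ASCII characters, then prepend a UTF-8 BOM."""
    bad = [(i, ch) for i, ch in enumerate(script) if ord(ch) > 127]
    if bad:
        index, ch = bad[0]
        raise ValueError(f"non-ASCII U+{ord(ch):04X} at offset {index}")
    # With an ASCII-only body the BOM is redundant today; it protects
    # against a later edit that reintroduces a non-ASCII character.
    return codecs.BOM_UTF8 + script.encode("ascii")
```

Here `misread` is three characters long and its last character is U+201D, the curly closing quote that terminates the string early.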

* fix(install.sh): icacls-reset host key ACLs (ssh-keygen -A alone is not enough)

Previous commit's diagnosis was half-right: yes the host-key step needs
work, but ssh-keygen -A is for *generating missing keys*, not for
fixing ACLs on existing ones. Confirmed by capturing the elevated
transcript on continuum-b69f's box -- ssh-keygen -A produced no output
at all (existing keys were already there, nothing to do), and sshd
still failed Start-Service with exit 1067.

Ran sshd -ddd directly to see the underlying file-open errors:
  Failed to open file: ...ssh_host_rsa_key error:5   (ACCESS_DENIED)
  Failed to open file: ...ssh_host_rsa_key error:13  (ACL secure_permission_check failed)

So sshd-as-LocalSystem can't read the host keys *and* their ACLs flunk
sshd's own security check. Two distinct ACL problems, both fixed by
the same pattern: take ownership, wipe inheritance, grant SYSTEM +
BUILTIN\Administrators full control, no other ACEs.

Tools considered:
- FixHostFilePermissions.ps1: removed from Windows-OpenSSH years ago
- OpenSSHUtils PS module: official, but PSGallery dep + module trust
  prompt = friction we don't want for an install script
- icacls: in-box on every Windows + bulletproof. Picked this.

The new step:
  takeown /F <key>             # become owner
  icacls <key> /reset          # replace the ACL with default inherited ACEs
  icacls <key> /inheritance:r /grant SYSTEM:F /grant Administrators:F

Output is captured per-key in the transcript so any failure is visible.
ssh-keygen -A still runs first (cheap, idempotent) so any *missing*
keys get auto-generated before the ACL fix runs.

* fix(install.sh): delete + regen host keys (icacls /grant alone insufficient for sshd)

icacls /grant SYSTEM:F /grant Administrators:F succeeded per the
transcript on continuum-b69f's box, but sshd-as-LocalSystem still
refused to load the keys with errors 5+13 (ACCESS_DENIED + ACL fails
secure_permission_check). The post-fix ACLs are technically correct
(SYSTEM + Admins only, no inheritance), but OpenSSH's permission check
is fragile w.r.t. owner identity and explicit-vs-inherited handling.

Cleaner: delete any existing host_key files and re-run ssh-keygen -A.
Since ssh-keygen -A here runs from an elevated SYSTEM-context
PowerShell, it sets the right owner (SYSTEM) and ACEs at creation
time -- which sshd accepts. This sidesteps every "what does icacls
think SYSTEM:(F) means" question entirely.

Safe at install time: the host hasn't published any fingerprint to
peers yet, so regenerating doesn't break anything. Subsequent installs
where sshd is already Running (state == Running) skip this whole
ensure_sshd_running block via the case statement.

Also added a post-regen `icacls <rsa-key>` dump to the transcript so
we can see at a glance what the resulting ACL looks like -- saves a
UAC round-trip the next time something looks off.

* fix(install.sh): strip creator ACE that ssh-keygen -A leaves on host keys

Found via post-regen ACL dump on continuum-b69f 2026-04-28:

  C:\ProgramData\ssh\ssh_host_rsa_key BUILTIN\Administrators:(F)
                                      NT AUTHORITY\SYSTEM:(F)
                                      BIGMAMA\green:(M)    <-- the bug

ssh-keygen -A on Windows leaves an ACE for whichever user ran it (the
creator), even when running elevated. OpenSSH's secure_permission_check
rejects any non-(owner|SYSTEM|Administrators) ACE -- so the freshly
regenerated keys still failed sshd's check, even though they had no
inheritance and SYSTEM + Admins had Full Control.

Fix: after ssh-keygen -A, run icacls /remove:g $(whoami) on each
host_*_key to strip the creator's ACE. Combined with /inheritance:r
+ /grant SYSTEM:F + Admins:F, the resulting ACL is exactly what sshd
wants: just SYSTEM and Administrators, no inheritance, no extras.

The post-fix ACL is dumped to the transcript so we can verify it
visually -- and so future "wait sshd still won't start" diagnoses
have a paper trail of what the ACL looked like.

* fix(install.sh): also chown host keys to SYSTEM (icacls /setowner)

Found via Get-Acl owner check on continuum-b69f 2026-04-28: even after
removing creator's ACE, ssh-keygen -A leaves the file OWNER as
BIGMAMA\green (the elevated user). OpenSSH's secure_permission_check
also looks at owner -- if the owner isn't in {SYSTEM, Administrators,
running sshd user}, the check fails with error 13 even though access
control entries are correct.

Adding icacls /setowner 'NT AUTHORITY\SYSTEM' before the inheritance
and grant calls so SYSTEM owns the key. Owner = SYSTEM, ACEs = SYSTEM
+ Admins, no creator, no inheritance -- the canonical OpenSSH-on-
Windows host key permission state.

* chore(install.sh): surface sshd dry-run + owner in transcript

Adds a 'sshd -t' dry-run step from the elevated context and dumps the
post-fix file owner alongside the ACL. Goal: when Start-Service sshd
fails, the transcript shows exactly what sshd itself complains about
('no hostkeys available' vs 'bad ownership' vs config syntax) without
needing another UAC round-trip to query.

* fix(install.sh): reset C:\ProgramData\ssh + logs/ folder ACLs (the actual MS-documented cause)

WebSearch turned up the exact MS Learn KB for our symptom (sshd -t passes
from elevated, Start-Service fails 1067, no event log entry):

  https://learn.microsoft.com/en-us/troubleshoot/windows-server/system-management-components/error-1053-1067-7034-after-update-openssh-doesnt-start

  "This issue occurs if the C:\ProgramData\ssh and C:\ProgramData\ssh\logs
   folders have incorrect permissions. The permissions might be too
   limited or too open. For example, the SYSTEM account or the
   Administrators group might not have write permissions. For a second
   example, regular users might have write or full control permissions."

Required ACL on each folder:
  SYSTEM              : Full Control
  Administrators      : Full Control
  Authenticated Users : Read & execute  (no write)
Owner: SYSTEM.

Up to this commit we'd been fixing the host_*_key file ACLs only, never
the parent folder. The Microsoft fix is on the FOLDER. Adds a new
elevated-payload step that sets owner + inheritance + ACEs on both
C:\ProgramData\ssh and C:\ProgramData\ssh\logs with (OI)(CI) inheritance
flags so newly-created files inherit correctly.

The Oct-2024 update introduced this strictness; the March-2025 update
loosened it back into a warning ("Event ID 4: write access is granted
to the following users: ..."), so machines fully patched past March
2025 may not need this. But continuum-b69f's box (Windows 11 24H2,
build 26100.8115, otherwise fully patched) is still hitting the
strict-mode failure -- so applying the documented fix is still required.

* fix(install.sh): restart HNS service after port-22 reservation (the actual blocker)

OpenSSH/Admin event log on continuum-b69f revealed the real blocker:

  sshd: error: Bind to port 22 on 0.0.0.0 failed: Permission denied.
  sshd: error: Bind to port 22 on :: failed: Permission denied.
  sshd: fatal: Cannot bind any address.

Even with the HNS reg key (EnableExcludedPortRange=0) set AND netsh
showing port 22 in the excluded range ('22  22  *' administered),
sshd-as-LocalSystem still got EACCES on bind. HNS service was holding
port 22 at a layer below netsh visibility -- the reg key + netsh
reservation only take effect after a Restart-Service hns (or reboot).

Adds an HNS restart immediately after the port-22 reservation step.
Now sshd can actually bind port 22 when Start-Service runs the next
step. This was already documented in continuum-b69f's memory file
(reference_airc_windows.md) but the install.sh implementation never
actually restarted the service.

The host-key permission saga from the prior 7 commits in this branch
turned out to be a sidequest -- those issues were real but not the
blocker. sshd -t (which doesn't bind a socket) was passing the whole
time. The real failure was at bind time, not at config-load time.
…ing (#199)

fix(install.sh): auto-run 'gh auth setup-git' so gist ops don't prompt

Joel hit this on 2026-04-28 -- Windows install with gh authenticated
in the keyring (gh auth status: Logged in to github.com), but every git
operation against gist.github.com triggered a GUI password popup.
The popup repeated: every airc op that touched a gist fired a fresh
prompt.

Cause: gh auth login stores its token in keyring/credman, but does
NOT automatically register itself as git's credential helper. So git
itself doesn't know how to use gh's token -- it falls back to
asking the user for a password on every HTTPS push/fetch.

The official one-liner is `gh auth setup-git`, which registers
`gh auth git-credential` as the credential helper for github.com URLs
in ~/.gitconfig. After this, git sees an HTTPS github.com URL,
delegates auth to gh, gh hands back the token from its store, no
prompt. Officially supported, idempotent, and ships with the gh CLI itself.

This goes in ensure_prereqs right after the gh-auth-status check, so
fresh installs get it automatically. Skipped if already configured
(idempotency check via `git config --get-all credential.https://github.com.helper | grep gh`).
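
A minimal Python sketch of the skip-if-configured check described above. The real logic lives in bash inside install.sh; the function names here (`gh_helper_configured`, `ensure_gh_credential_helper`) are hypothetical and only illustrate the idea. The two commands it shells out to, `git config --get-all` and `gh auth setup-git`, are the ones named in the commit message.

```python
import subprocess

# Git config key the commit message greps, scoped to github.com HTTPS URLs.
HELPER_KEY = "credential.https://github.com.helper"


def gh_helper_configured(helper_lines: str) -> bool:
    """Return True if any configured helper line mentions gh.

    Mirrors the `| grep gh` part of the idempotency check.
    """
    return any("gh" in line for line in helper_lines.splitlines())


def ensure_gh_credential_helper() -> bool:
    """Run `gh auth setup-git` unless a gh helper is already registered.

    Returns True if setup was run, False if it was skipped.
    """
    # `git config --get-all` exits non-zero when the key is unset,
    # so check=False and rely on the (empty) stdout instead.
    result = subprocess.run(
        ["git", "config", "--get-all", HELPER_KEY],
        capture_output=True,
        text=True,
        check=False,
    )
    if gh_helper_configured(result.stdout):
        return False
    subprocess.run(["gh", "auth", "setup-git"], check=True)
    return True
```

Splitting the pure string check from the subprocess call keeps the skip decision testable without git or gh installed.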
Copilot AI review requested due to automatic review settings April 28, 2026 05:53
…in) (#200)

Joel 2026-04-28 ~01:00Z: "fix the monitor man / i cant go to bed till
this is fixed". Windows had no daemon path -- `airc daemon install`
died on $(uname -s) with "not supported on MINGW64_NT-...". Result:
the only way to keep airc alive on Windows was to leave a Git Bash
window open running `airc join`. nohup+disown didn't survive parent
shell exit on MINGW64.

Adds a Windows branch to cmd_daemon_install / uninstall / status
mirroring the launchd (mac) and systemd (linux) patterns.

## Mechanism: HKCU Run-key, not Task Scheduler

First attempt was schtasks //SC ONLOGON, but Windows requires admin
to create per-user logon-triggered scheduled tasks (Access Denied for
non-elevated users, even with //RL LIMITED). Per Joel: "i just want
whatever is least hassle and also robust" -- forcing a UAC prompt at
'airc daemon install' time is exactly the kind of friction we kill.

HKCU\Software\Microsoft\Windows\CurrentVersion\Run is the per-user
autostart hive. Writing to it with `reg add` requires no admin (HKCU
is user-scope), fires at every interactive logon for the user, and
matches launchd-Agent / systemd-user semantics exactly.

## Implementation

1. `_daemon_os` returns "windows" on MINGW*/MSYS*/CYGWIN*.
2. `_daemon_install_schtasks` (kept the function name for grep
   continuity even though it's now reg-based) writes a launcher .bat
   to $scope/airc-daemon.bat that:
     - sets AIRC_HOME + AIRC_BACKGROUND_OK
     - exec's `bash -lc 'airc connect'`
     - on exit, logs to daemon.err and `goto loop` after 5s
     (matches launchd KeepAlive / systemd Restart=always)
3. `reg add` registers `cmd /c start "" /MIN "<launcher.bat>"` under
   HKCU Run, key name `airc-monitor`.
4. Fires and forgets `cmd /c start /MIN <launcher>` immediately so
   the user doesn't need to log out and back in to start the monitor.
5. uninstall: reg delete + kill + rm launcher .bat.
6. status: reg query for the entry + ps for the running airc-connect
   (matches PPID=1 orphan or falls back to airc.pid lookup).
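
As a rough illustration of step 3, the sketch below builds the `reg add` argument list for the HKCU Run entry. It is not the project's code (which is bash): `run_entry_command` and `reg_add_argv` are hypothetical helper names, and `/t REG_SZ` is spelled out here as an assumption even though it is `reg add`'s default type. The key path, the value name `airc-monitor`, and the `cmd /c start "" /MIN` wrapper come from the description above.

```python
from typing import List

# Per-user autostart key; writing here does not require elevation.
RUN_KEY = r"HKCU\Software\Microsoft\Windows\CurrentVersion\Run"


def run_entry_command(launcher_bat: str) -> str:
    """Command stored in the Run value: start the launcher minimized."""
    # The empty "" is the window-title argument that `start` expects
    # before a quoted path.
    return f'cmd /c start "" /MIN "{launcher_bat}"'


def reg_add_argv(value_name: str, launcher_bat: str) -> List[str]:
    """Build the argv for `reg add` (hypothetical helper, not airc's API)."""
    return [
        "reg", "add", RUN_KEY,
        "/v", value_name,
        "/t", "REG_SZ",
        "/d", run_entry_command(launcher_bat),
        "/f",  # overwrite without prompting, so reinstall stays idempotent
    ]
```

Passing the result to `subprocess.run(argv, check=True)` from a native Windows Python would register the entry. From Git Bash the slash-prefixed flags may need escaping, as the `//SC` spelling earlier in this commit message suggests.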

## Verified locally on continuum-b69f

  $ airc daemon install
    ✓ Registered HKCU Run entry 'airc-monitor' (runs at every Windows logon)
    ✓ Started monitor in detached cmd window (minimized)
  $ airc daemon status
    Status:  RUNNING (PID 341089)
  $ airc daemon uninstall
    ✓ Removed HKCU Run entry 'airc-monitor'
    ✓ Killed running daemon launcher process(es)
    ✓ Removed /c/Users/green/.airc/airc-daemon.bat
  $ airc daemon install   # idempotent reinstall
    ✓ Registered ... ✓ Started monitor ...

Detached process survives the launching bash exit (which `nohup &
disown` could not on MINGW64).

## Note on AIRC_BACKGROUND_OK

The launcher sets this env var because `airc connect` may otherwise
refuse to run when not on a TTY. Same hint as the launchd plist's
EnvironmentVariables block.

Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 15 changed files in this pull request and generated 5 comments.


Comment thread install.sh
$reg = (Get-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\hns\State" -Name "EnableExcludedPortRange" -ErrorAction SilentlyContinue).EnableExcludedPortRange;
$regChanged = $false
if ($reg -ne 0) {
reg add "HKLM\SYSTEM\CurrentControlSet\Services\hns\State" /v "EnableExcludedPortRange" /d 0 /f | Out-Null;

Copilot AI Apr 28, 2026

In the staged elevated PowerShell payload, the reg add ... EnableExcludedPortRange line omits /t REG_DWORD. reg add defaults to REG_SZ, which can set the wrong registry type and make the HNS workaround ineffective. Update the payload (and the printed manual-fix snippet) to specify the value type explicitly.

Suggested change
- reg add "HKLM\SYSTEM\CurrentControlSet\Services\hns\State" /v "EnableExcludedPortRange" /d 0 /f | Out-Null;
+ reg add "HKLM\SYSTEM\CurrentControlSet\Services\hns\State" /v "EnableExcludedPortRange" /t REG_DWORD /d 0 /f | Out-Null;

Copilot uses AI. Check for mistakes.
printf " [MISSING] sshd -- needed when you HOST a room\n"
printf " Fix (admin PowerShell — five lines, run all together):\n"
printf " Add-WindowsCapability -Online -Name OpenSSH.Server~~~~0.0.1.0\n"
printf " reg add HKLM\\\\SYSTEM\\\\CurrentControlSet\\\\Services\\\\hns\\\\State /v EnableExcludedPortRange /d 0 /f\n"

Copilot AI Apr 28, 2026

The Windows sshd manual fix hint uses reg add ... EnableExcludedPortRange /d 0 without specifying /t REG_DWORD. Since reg add defaults to REG_SZ, this guidance may lead to the wrong registry value type. Consider updating the printed command to include /t REG_DWORD so the workaround reliably applies.

Suggested change
- printf " reg add HKLM\\\\SYSTEM\\\\CurrentControlSet\\\\Services\\\\hns\\\\State /v EnableExcludedPortRange /d 0 /f\n"
+ printf " reg add HKLM\\\\SYSTEM\\\\CurrentControlSet\\\\Services\\\\hns\\\\State /v EnableExcludedPortRange /t REG_DWORD /d 0 /f\n"

Comment thread lib/airc_core/config.py
Comment on lines +47 to +52
def cmd_set_name(args) -> int:
    """Atomically write the identity name into config.json.

    Replaces the inline-Python heredoc that lived in cmd_rename. With
    multi-scope rename propagation (#179), cmd_rename writes the name
    into the primary scope AND every sidecar scope's config; doing it

Copilot AI Apr 28, 2026

The docstring says this write is "Atomically" performed, but the implementation writes directly to args.config via json.dump(..., open(..., "w")), which can truncate/corrupt the file if interrupted mid-write. Either implement an atomic write (write to a temp file + os.replace) or adjust the docstring to avoid claiming atomicity.
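
One way the temp-file plus `os.replace` approach suggested here could look, as a hedged sketch rather than the project's actual implementation. `atomic_write_json` and `set_name` are hypothetical names, and the `"name"` key is an assumption about the config.json layout.

```python
import json
import os
import tempfile


def atomic_write_json(path: str, data: dict) -> None:
    """Write data as JSON so readers never observe a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must sit in the same directory (same filesystem) as the
    # target; otherwise os.replace cannot swap it in atomically.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".config-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())  # push bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic swap on POSIX and Windows
    except BaseException:
        # On any failure the original file is untouched; drop the temp file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


def set_name(config_path: str, name: str) -> None:
    """Hypothetical stand-in for cmd_set_name: read, update, write atomically."""
    try:
        with open(config_path, encoding="utf-8") as f:
            config = json.load(f)
    except FileNotFoundError:
        config = {}
    config["name"] = name  # assumed key; the real schema may differ
    atomic_write_json(config_path, config)
```

If the process is interrupted mid-write, only the temp file is affected, so config.json is either the old version or the new one, never a truncated mix.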

Comment thread lib/airc_core/config.py
Comment on lines +73 to +78
def cmd_set_host_block(args) -> int:
    """Atomically write the post-handshake host_* fields into config.

    Replaces a fragile env-var-passed python heredoc that bit on MSYS
    Git Bash (continuum-b69f's catch 2026-04-27): MSYS translates env
    var values that look like Unix paths INTO the Windows-binary

Copilot AI Apr 28, 2026

This docstring claims the host_* block write is "Atomically" performed, but the function dumps JSON directly to the target path. To match the contract, consider writing to a temporary file and os.replace() it into place; otherwise, update the wording to reflect that it's a normal write.

Comment thread test/integration.sh
bash -c "source '$_adapters_extract'; $*"
AIRC_PYTHON="${AIRC_PYTHON:-python3}" \
PYTHONPATH="${_airc_lib_dir}${PYTHONPATH:+:$PYTHONPATH}" \
bash -c "source '$_adapters_file'; export AIRC_PYTHON='${AIRC_PYTHON:-python3}'; $*"

Copilot AI Apr 28, 2026

In _adapter_call, the inner bash -c sets AIRC_PYTHON using single quotes (export AIRC_PYTHON='${AIRC_PYTHON:-python3}'), which assigns the literal string ${AIRC_PYTHON:-python3} rather than expanding it. This will break adapters that rely on $AIRC_PYTHON (e.g. iso_to_epoch) on platforms where AIRC_PYTHON is not exactly python3. Prefer relying on the env var passed to bash -c, or export it without single quotes.

Suggested change
- bash -c "source '$_adapters_file'; export AIRC_PYTHON='${AIRC_PYTHON:-python3}'; $*"
+ bash -c "source '$_adapters_file'; $*"

fix(airc daemon): scope tracks cwd at install time, not always $HOME/.airc

PR #200 follow-up. _daemon_scope was returning ${AIRC_HOME:-$HOME/.airc}
unconditionally, but actual user state lives in $cwd/.airc per
detect_scope(). So 'airc daemon install' from ~/continuum/ captured
the wrong scope (~/.airc, empty), spawned a monitor that connected to
nothing, user appeared offline despite 'RUNNING (PID xxx)' in status.

Mirror detect_scope's logic exactly: AIRC_HOME if set, else cwd/.airc.
Now 'airc daemon install' from a project dir captures THAT dir's
.airc as the daemon's scope, launcher .bat sets AIRC_HOME=that, the
spawned airc connect uses the right room state.
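
The resolution rule can be sketched in a few lines of Python. The actual `_daemon_scope` and `detect_scope` are bash functions; `daemon_scope` below is a hypothetical rendering, and treating an empty `AIRC_HOME` as unset is an assumption borrowed from bash's `${VAR:-default}` semantics.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def daemon_scope(
    env: Optional[Mapping[str, str]] = None,
    cwd: Optional[Path] = None,
) -> Path:
    """Resolve the scope dir: AIRC_HOME if set, else <cwd>/.airc."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else cwd
    override = env.get("AIRC_HOME")
    if override:  # assumption: empty string counts as unset
        return Path(override)
    # No $HOME/.airc fallback: the scope follows the install-time cwd.
    return cwd / ".airc"
```

With this rule, running the installer from a project directory yields that directory's `.airc`, which is the behavior the fix is after.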

Joel 2026-04-28 ~01:05Z caught this: 'lol obv if it worked you would
have a monitor and be online. FAIL'.
…op) (#202)

* fix(airc daemon): scope tracks cwd at install time, not always $HOME/.airc

PR #200 follow-up. _daemon_scope was returning ${AIRC_HOME:-$HOME/.airc}
unconditionally, but actual user state lives in $cwd/.airc per
detect_scope(). So 'airc daemon install' from ~/continuum/ captured
the wrong scope (~/.airc, empty), spawned a monitor that connected to
nothing, user appeared offline despite 'RUNNING (PID xxx)' in status.

Mirror detect_scope's logic exactly: AIRC_HOME if set, else cwd/.airc.
Now 'airc daemon install' from a project dir captures THAT dir's
.airc as the daemon's scope, launcher .bat sets AIRC_HOME=that, the
spawned airc connect uses the right room state.

Joel 2026-04-28 ~01:05Z caught this: 'lol obv if it worked you would
have a monitor and be online. FAIL'.

* fix(airc daemon): launcher cd's to cwd, skip AIRC_HOME (Windows fs view fix)

Daemon installed via PR #200/#201 was still crashlooping (every 4s)
because the launcher .bat set AIRC_HOME to a Windows-form path
(C:\Users\green\continuum\.airc) which Git Bash's airc binary
couldn't traverse cleanly downstream. Plus 'bash -lc' was reading the
login profile and re-exporting PATH, which churned the env.

Restructured launcher .bat:
1. 'cd /d <cwd_win>' from cmd.exe so the bash subprocess inherits
   the project dir as pwd. detect_scope() then returns <cwd>/.airc
   the same way it does in the user's interactive shell.
2. Drop AIRC_HOME entirely — let detect_scope work normally.
3. 'bash -c' not 'bash -lc' — non-login skips profile, keeps the
   env we set in cmd uncorrupted.
4. Absolute Unix-form path to airc (cygpath -u) — bash -c doesn't
   read ~/.bashrc, so PATH may not include ~/.local/bin.
5. Errors log to daemon.err relative to cwd (already cd'd into it).

Joel 2026-04-28 caught both the wrong-scope (PR #201) and now the
crashloop. Verified locally: with this launcher shape, airc connect
runs to completion + maintains the SSH tail to the host.

Development

Successfully merging this pull request may close these issues.

bug: install.ps1 — Tailscale winget package ID typo (lowercase) causes silent install failure

2 participants