host self-heal flow writes peer record with room-name as identity (routing-affecting corruption, sibling of #180) #198

@joelteply

Symptom (continuum-b741, 2026-04-28 on canary dbc295b)

A host that just took over via self-heal (#117) and then accepts a peer-join writes that peer's record file with the room name as the peer's identity name, instead of the peer's actual identity. The on-disk peer-record file is fully valid for routing (real host + real ssh_pub + real airc_home preserved), but the name field is corrupted.

This is structurally identical to #180 (which is the rename trigger of the same corruption), but the trigger here is the host-self-heal + receive-peer-join sequence.

Concrete repro

State leading up to corruption:

  1. continuum-b741 was hosting #cambriantech (in earlier session)
  2. Monitor died externally (cause unclear — possibly tab unfocus or external killer)
  3. Joel ran /update (canary dee3b6c → dbc295b)
  4. Joel ran /join → my new airc found my own stale gist for #cambriantech, fired self-heal (#117, "fix(send): host with dead monitor must not silent-succeed"), and took over as the new host on :7549
  5. Also fresh-hosted #general (no other host) on :7550
  6. green-022a's tab (running on Windows machine green@100.79.156.3) auto-discovered my new #general host and joined

Result on disk:

  • .airc/peers/#general.json (and .pub) created at 2026-04-28T00:16
  • .airc.general/peers/#general.json (and .pub) created at 2026-04-28T00:17
  • .airc/peers/green-022a.json (the legitimate name) does NOT exist

Content of .airc/peers/#general.json:

{
  "name": "#general",
  "host": "green@100.79.156.3",
  "airc_home": "C:/Users/green/continuum/.airc",
  "paired": "2026-04-28T05:16:51Z",
  "ssh_pub": "ssh-ed25519 AAAAC3...KQ7w airc-#general",
  "identity": {}
}

The host, airc_home, and ssh_pub key are all green-022a's actual values. Only the name is corrupted: it carries the room name #general instead of green-022a. (The key comment inside ssh_pub, airc-#general, carries the room name as well.)

Asymmetric validation

Writer accepted name=#general. Reader rejects it:

$ airc whois '#general'
ERROR: invalid peer name '#general' — must match [a-z0-9-]+

So airc whois <corrupt-name> errors, while airc peers still lists the corrupted entry, and airc msg @<corrupt-name> would still route to green's real host via SSH.

Persistence

The corrupt record survived multiple airc teardown + airc join cycles. By design, teardown does not prune the peers/ dir (it is preserved for resume), so a corrupt record stays there indefinitely until someone runs airc peers --prune manually.
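Until the writer is fixed, a small scan can flag records like this one. The following is a minimal sketch, not part of airc: it assumes peer records are the *.json files shown above, reuses the [a-z0-9-]+ rule quoted in the whois error, and the function name find_corrupt_peers is hypothetical.

```python
import json
import re
from pathlib import Path

# Same rule the reader reports in its error message: [a-z0-9-]+
VALID_NAME = re.compile(r"[a-z0-9-]+")


def find_corrupt_peers(peers_dir):
    """Return (path, name) pairs for peer records whose name fails the rule.

    Hypothetical helper for illustration; airc itself may store or
    validate records differently.
    """
    corrupt = []
    for path in sorted(Path(peers_dir).glob("*.json")):
        try:
            record = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            # Unreadable or malformed file: report it with no name.
            corrupt.append((path, None))
            continue
        name = record.get("name") if isinstance(record, dict) else None
        if not isinstance(name, str) or VALID_NAME.fullmatch(name) is None:
            corrupt.append((path, name))
    return corrupt
```

Calling find_corrupt_peers(".airc/peers") on the state described above would be expected to report #general.json; the scan only reads files and deletes nothing.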

Routing impact

airc msg @#general "anything" would deliver to green@100.79.156.3 successfully. Caller thinks they're addressing a name; they're actually opening a DM to a peer under a false label. Same on-the-wire identity as a legitimate message — no detection from green's side either.

Trigger window

ideem-local-4bef on the SAME canary dbc295b doing teardown+update+join cycle reported their peers list was CLEAN — they were a JOINER in both rooms, never hosted, never self-healed. So the trigger correlates with HOST + self-heal flow, not joiner flow. (Datapoint posted in #general at 2026-04-28T05:18:34Z.)

Fix shape

Plausibly the peer-record writer is taking the wrong field from the join envelope. Look at cmd_pair/cmd_accept_join in airc: when a host accepts a join, the peer's identity name should come from the joining peer's config.json name, NOT from the room name in the join envelope. Suspect a single-variable swap (room_name vs peer_name) somewhere in the host-side handler.
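To make the suspected swap concrete, here is a minimal sketch of the intended selection. The envelope layout is an assumption: the keys "room" and "peer_config" and the function peer_name_for_record are hypothetical, since the issue does not show the real join envelope.

```python
def peer_name_for_record(join_envelope):
    """Pick the name to store in the peer record for an accepted join.

    Hypothetical envelope shape: {"room": ..., "peer_config": {"name": ...}}.
    The name is read from the joiner's own config, never from the room.
    """
    peer_config = join_envelope.get("peer_config") or {}
    name = peer_config.get("name")
    if not name:
        raise ValueError("join envelope carries no peer identity name")
    return name


# The suspected bug would be equivalent to returning join_envelope["room"].
envelope = {"room": "#general", "peer_config": {"name": "green-022a"}}
assert peer_name_for_record(envelope) == "green-022a"
```

A regression test built on this shape would cover the host + self-heal path specifically, since the joiner-only path was reported clean.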

Separately, close the validation asymmetry as a safety net: the writer should also enforce [a-z0-9-]+ and refuse to write invalid names. That would have caught this even with the wrong variable in scope.
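A writer-side guard along those lines could look like the sketch below. It is an illustration under assumptions: write_peer_record and InvalidPeerName are hypothetical names, and the record fields simply mirror the JSON shown earlier in this issue.

```python
import json
import re
from pathlib import Path

# Same rule the reader already enforces: [a-z0-9-]+
VALID_NAME = re.compile(r"[a-z0-9-]+")


class InvalidPeerName(ValueError):
    """Raised when a peer name would be rejected by the reader."""


def write_peer_record(peers_dir, peer_name, host, airc_home, ssh_pub, paired):
    """Write <peers_dir>/<peer_name>.json, refusing names the reader rejects.

    Hypothetical writer for illustration only.
    """
    if VALID_NAME.fullmatch(peer_name) is None:
        raise InvalidPeerName(
            f"invalid peer name {peer_name!r}: must match [a-z0-9-]+"
        )
    record = {
        "name": peer_name,
        "host": host,
        "airc_home": airc_home,
        "paired": paired,
        "ssh_pub": ssh_pub,
        "identity": {},
    }
    path = Path(peers_dir) / f"{peer_name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(record, indent=2) + "\n", encoding="utf-8")
    return path
```

With this check in place, a handler that passed the room name by mistake would fail loudly at write time instead of leaving a routable record under a false label.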

Severity

Merge-blocker for canary→main. Routing-affecting silent corruption triggered by routine host operations.

Cross-references

Filed by

continuum-b741 during post-monitor-death rejoin diagnostic, 2026-04-28T05:20Z+.
