From 5d48b5bfbdc895ee27790d7453936bf52764faf0 Mon Sep 17 00:00:00 2001 From: Keith Avery Date: Sat, 25 Apr 2026 04:07:01 -0400 Subject: [PATCH 1/8] docs(spec): shared Room.snapshot for ADR-037 (Python port) Design for the proper architectural fix to per-session GameSnapshot divergence in multiplayer. Replaces the band-aid _merge_peer_state_into_snapshot helper with a single canonical snapshot held on SessionRoom and shared by every WS session bound to the slug. Constraint that simplifies scope: no saved MP games exist on disk (multiplayer has never worked end-to-end). No migration path needed. Band-aid + its 5 merge tests are deleted in the same change. Out of scope: ADR-028 LLM rewrites, per-recipient narration region filtering, PlayerState overlay struct. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../2026-04-25-shared-room-snapshot-design.md | 286 ++++++++++++++++++ 1 file changed, 286 insertions(+) create mode 100644 docs/superpowers/specs/2026-04-25-shared-room-snapshot-design.md diff --git a/docs/superpowers/specs/2026-04-25-shared-room-snapshot-design.md b/docs/superpowers/specs/2026-04-25-shared-room-snapshot-design.md new file mode 100644 index 0000000..6cadc33 --- /dev/null +++ b/docs/superpowers/specs/2026-04-25-shared-room-snapshot-design.md @@ -0,0 +1,286 @@ +--- +id: shared-room-snapshot +title: "Shared `Room.snapshot` for ADR-037 (Python port, properly)" +status: draft +date: 2026-04-25 +deciders: [Keith Avery] +related: [ADR-037, ADR-036, ADR-082] +implementation-status: draft +--- + +# Shared `Room.snapshot` for ADR-037 + +## Context + +Multiplayer in the Python port (post ADR-082) currently lets each WebSocket +session hold its own copy of `GameSnapshot`. Mutations from one session do +not propagate to peers' in-memory snapshots; persistence is per-session and +last-writer-wins. 
Earlier in this playtest cycle, that divergence produced two
+already-fixed bugs (multi-PC characters vanishing on disconnect, Tab 2
+seeing the wrong PC as "self") and a defensive band-aid:
+`WebSocketSessionHandler._merge_peer_state_into_snapshot`, which loads the
+persisted snapshot before each save and pulls in peer-only entries.
+
+The band-aid is holding for `characters` and `player_seats`, but per-session
+divergence still exists for every other shared-world field
+(`lore_established`, `npc_registry`, `world_history`, region/room state,
+discovery state, RAG store, scenario state). Those fields haven't surfaced
+as user-visible blockers yet, but they are real bugs waiting to land.
+
+ADR-037 specifies the right model: per `(genre, world)` slug, a single
+`SharedGameSession` mutated by all sessions, with per-player concerns
+resolved on read. The Rust implementation (now retired) used an
+`Arc`-wrapped `SharedGameSession` plus a `PlayerState` overlay map. The
+Python port has not yet implemented this.
+
+**Constraint that simplifies scope:** no multiplayer saved games exist on
+disk. Multiplayer has never worked end-to-end before this playtest cycle.
+That removes the migration concern entirely.
+
+## Decision
+
+Replace per-session `sd.snapshot` ownership with a single `GameSnapshot`
+held on `SessionRoom` and shared by every WebSocket session bound to that
+slug. The `_SessionData.snapshot` attribute keeps its name; after slug-
+connect binding it is a Python reference to the room's snapshot, so all
+existing reads and mutations transparently hit the canonical object.
+
+The `_merge_peer_state_into_snapshot` band-aid is deleted in the same
+change — it has no purpose once divergence is gone, and there are no MP
+saves on disk that would benefit from its merge logic.
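The aliasing contract the decision leans on can be sketched with stand-in types (`Snapshot` and `Session` here are illustrative, not the real `GameSnapshot` / `_SessionData` definitions): once two sessions hold the same object reference, a mutation through either is immediately visible through the other, with no merge step.

```python
# Minimal sketch of the shared-reference model, using stand-in classes.
class Snapshot:
    def __init__(self) -> None:
        self.characters: list[str] = []


class Session:
    def __init__(self, snapshot: Snapshot) -> None:
        # Plain attribute assignment stores a reference, not a copy.
        self.snapshot = snapshot


canonical = Snapshot()          # the room's single canonical object
session_a = Session(canonical)  # tab 1 binds the reference
session_b = Session(canonical)  # tab 2 binds the same reference

session_a.snapshot.characters.append("Laverne")
session_b.snapshot.characters.append("Shirley")

# Both sessions observe both mutations: no divergence, nothing to merge.
assert session_a.snapshot.characters == ["Laverne", "Shirley"]
assert session_a.snapshot is session_b.snapshot
```

The real objects carry far more state, but the propagation mechanism is exactly this attribute-level aliasing.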
+
+## Detailed Design
+
+### `SessionRoom` additions
+
+`sidequest-server/sidequest/server/session_room.py`
+
+Two new fields, both protected by the existing `_lock`:
+
+```python
+_snapshot: GameSnapshot | None = None
+_store: SqliteStore | None = None
+```
+
+Three new methods and two properties:
+
+```python
+def bind_world(
+    self,
+    *,
+    snapshot: GameSnapshot,
+    store: SqliteStore,
+) -> None:
+    """Bind the canonical snapshot for this room.
+
+    Idempotent: a second call when ``_snapshot`` is already populated is a
+    no-op. The first slug-connect on a room loads the persisted snapshot
+    (or constructs a fresh one) and calls ``bind_world``; later connects
+    consume the existing binding via ``snapshot`` / ``store`` properties.
+    """
+
+@property
+def snapshot(self) -> GameSnapshot | None: ...
+
+@property
+def store(self) -> SqliteStore | None: ...
+
+def save(self) -> None:
+    """Save the canonical snapshot through the canonical store.
+
+    Acquires ``_lock`` for the duration. Replaces every per-session
+    ``sd.store.save(sd.snapshot)`` call site. No-op when the room has
+    no snapshot bound (legacy / pre-bind paths must not crash on save).
+    """
+
+def close_store(self) -> None:
+    """Close the canonical store. Idempotent. Called by RoomRegistry on
+    last-disconnect teardown so the SQLite handle isn't leaked."""
+```
+
+Two reasons we reuse the room's existing `_lock` rather than introducing
+a separate `_snapshot_lock`:
+
+1. ADR-036 sealed-letter pacing + the new TURN_STATUS gate already
+   serialize narration turns per slug at the application layer. Mutation
+   contention on the snapshot is structurally near-zero.
+2. The existing `_lock` already protects connect/disconnect/seat. Reusing
+   it keeps the locking model uniform.
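The first-wins idempotency of `bind_world` under the shared lock can be sketched in isolation (a toy room, not the real `SessionRoom`): two racing first-connects both attempt to bind, exactly one binding survives, and a later rebind never overwrites it.

```python
import threading


class ToyRoom:
    """Toy model of the bind contract: first writer wins, under one lock."""

    def __init__(self) -> None:
        self._lock = threading.RLock()
        self._snapshot: object | None = None

    def bind_world(self, snapshot: object) -> None:
        with self._lock:
            if self._snapshot is not None:
                return  # idempotent: the losing binder observes the winner
            self._snapshot = snapshot


room = ToyRoom()
candidates = [object(), object()]
threads = [threading.Thread(target=room.bind_world, args=(c,)) for c in candidates]
for t in threads:
    t.start()
for t in threads:
    t.join()

winner = room._snapshot
assert winner in candidates   # exactly one of the racers bound
room.bind_world(object())     # a later rebind is a silent no-op
assert room._snapshot is winner
```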
+ +### `WebSocketSessionHandler` changes + +`sidequest-server/sidequest/server/session_handler.py` + +**Slug-connect bind path** (inside the existing slug-connect branch, +after `_room_registry.get_or_create(slug, mode=...)` returns the room): + +```python +if room.snapshot is None: + # First connect — this handler loads from store and binds. + saved = store.load() + if saved is not None: + snapshot = saved.snapshot + else: + snapshot = GameSnapshot( + genre_slug=row.genre_slug, + world_slug=row.world_slug, + location="Unknown", + ) + store.init_session(row.genre_slug, row.world_slug) + room.bind_world(snapshot=snapshot, store=store) + +# Every connect: bind sd to the canonical room state. +sd.snapshot = room.snapshot +sd.store = room.store +``` + +**Save sites** — every existing `sd.store.save(sd.snapshot)` (or +`sd.store.save(snapshot)` where `snapshot is sd.snapshot`) becomes a call +to `self._room.save()`. Sites: + +- Disconnect / cleanup save (~line 993–1001). +- Chargen-commit persist (~line 2514). +- Turn-end persist in `_execute_narration_turn` (~line 2862). +- UUID-rename-on-resume save (~line 1323). + +The chargen-commit second-commit path that does +`sd.snapshot = existing_saved.snapshot` is removed — `sd.snapshot` is +already the room's canonical snapshot, so the second committer just +appends their character to it. + +**Removed:** `_merge_peer_state_into_snapshot` and its three call sites +inside `_handle_player_action` cleanup, `_execute_narration_turn`, and +disconnect-save. Also removed: the imports / helper guards introduced +for it. + +### `_SessionData` changes + +`_SessionData.snapshot` and `_SessionData.store` remain typed as +`GameSnapshot` and `SqliteStore`. After slug-connect bind, they are +references to the room-level objects. This keeps the dispatch pipeline +verbatim — every existing line that reads `sd.snapshot.characters` or +mutates `sd.snapshot.npc_registry` still works without modification. 
+ +For paths that don't go through slug-connect (legacy non-slug connect, +some unit tests that construct `_SessionData` directly), behavior is +unchanged: each session has its own snapshot and store. There is no MP +on those paths. + +### Solo path + +Solo always runs as a one-occupant room. The shared-snapshot model is +mathematically identical to the per-session model when there is exactly +one session. No solo behavior changes. + +### Concurrency + +- Turns are serialized per slug by ADR-036 sealed-letter pacing + the + pause-gate at line 2712–2729 + the new TURN_STATUS broadcast in + `_handle_player_action`. Two sockets cannot run narration concurrently + on the same slug. +- Mutations during a turn are not lock-protected at the snapshot field + level — they don't need to be (single-writer-at-a-time, GIL covers + individual list/dict ops). +- The room `_lock` only wraps `bind_world`, `save`, and `close_store` — + the operations where two sockets can race (first-connect, two-disconnect-saves-overlapping). + +### Compatibility + +- **No saved MP games exist** (per Keith). This is the constraint that + lets us delete the band-aid wholesale. +- **Solo saves** unaffected: solo's `player_seats` is empty or + single-player; nothing diverges in the first place. +- **Legacy non-slug connect** unaffected: never enters the bind path, + keeps per-session `_SessionData`. +- **Tests that mock `store` with `MagicMock`** — affected. The merge tests + in `test_multiplayer_party_status.py` are deleted (5 tests). The + resolver tests stay green (they don't touch save/store paths). New + shared-room tests are added. 
+ +## Tests + +Delete from `tests/server/test_multiplayer_party_status.py`: + +- `test_merge_peer_state_pulls_peer_chars_and_seats_from_persisted` +- `test_merge_peer_state_local_wins_for_shared_character_names` +- `test_merge_peer_state_noop_for_solo` +- `test_merge_peer_state_noop_when_persisted_missing` +- `test_merge_peer_state_swallows_load_errors` +- `_saved_snapshot_with` helper (no other consumers) + +Add to `tests/server/test_multiplayer_party_status.py`: + +- `test_two_handlers_share_room_snapshot_after_bind` — handlerA and + handlerB both bind to the same room; mutating + `handlerA.session_data.snapshot.characters.append(...)` is observable + via `handlerB.session_data.snapshot.characters`. +- `test_chargen_commit_visible_to_peer_handler_immediately` — handlerA + has Laverne; handlerB does chargen-commit with Shirley; handlerA's + `sd.snapshot.characters` and `sd.snapshot.player_seats` reflect both + PCs without re-loading from store. +- `test_room_save_routes_through_canonical_store` — `room.save()` + persists the canonical snapshot once; a fresh `room.snapshot` load on + a new room produces the same data. +- `test_solo_path_unaffected_by_shared_room_model` — single-occupant + room round-trips identically to the previous per-session behavior + (regression guard). + +Add to `tests/server/test_session_room.py` (new file or extend if it +exists): + +- `test_bind_world_is_idempotent` — second `bind_world` call when + snapshot is already bound is a no-op (same identity, no overwrite). +- `test_close_store_is_idempotent` — second close doesn't raise. +- `test_snapshot_property_returns_none_before_bind` — explicit + guard so accessors return None before bind, never raise. + +Existing wiring tests for `_resolve_self_character`, the chargen gate, +and TURN_STATUS broadcasts continue to pass unchanged. + +## Files Touched + +- `sidequest-server/sidequest/server/session_room.py` — fields + + methods. 
+- `sidequest-server/sidequest/server/session_handler.py` — slug-connect
+  bind, save site replacements, delete `_merge_peer_state_into_snapshot`.
+- `sidequest-server/tests/server/test_multiplayer_party_status.py` —
+  delete 5 merge tests, add 4 shared-room tests.
+- `sidequest-server/tests/server/test_session_room.py` — extend or
+  create with 3 binding tests.
+
+No UI changes. No content changes. No daemon changes. Single subrepo,
+single commit (or two if test deletion + new tests are split for
+review clarity).
+
+## Out of Scope
+
+- **ADR-028 LLM rewrites** (charmed/blinded/deafened narration variants).
+  Separate story.
+- **Per-recipient narration filtering by region** (the deferred ping-pong
+  entry that says "Each player's narrative pane shows the other player's
+  narration too"). Separate story; the projection filter infrastructure
+  already exists; the missing piece is wiring `_visibility.visible_to`
+  on NarrationPayload at emit time.
+- **`PlayerState` overlay struct** matching the retired Rust ADR-037
+  shape. The Python `GameSnapshot` already mixes shared + per-player
+  fields; per-player resolution happens by `player_id` on read via the
+  existing `_resolve_self_character` helper. Adding a separate
+  `PlayerState` is a refactor that doesn't earn its keep at current
+  scale.
+
+## Acceptance Criteria
+
+1. Two `WebSocketSessionHandler` instances bound to the same room share
+   a single `GameSnapshot` reference; mutations on one are observable on
+   the other without any reload.
+2. Every save site in `session_handler.py` routes through `room.save()`.
+3. `_merge_peer_state_into_snapshot` and its 5 tests are deleted.
+4. Solo behavior is unchanged. Existing solo / chargen / persistence
+   tests stay green.
+5. `tests/server/` + `tests/agents/` full sweep is green (current
+   baseline: 736 passed, 2 skipped). Shared-room tests delete 5 and add
+   7 (4 in party_status + 3 in session_room), for a net change of +2.
+6.
The two-tab playtest repro that produced "Multi-PC state does not + survive disconnect" cannot reproduce: a fresh slug + Tab 1 commits + Laverne + Tab 2 commits Shirley + close both tabs + reopen → both + tabs resume their seated PC; neither re-enters chargen. From 74f21f41687b3335ab1ae8e21a6e430d4350dcf6 Mon Sep 17 00:00:00 2001 From: Keith Avery Date: Sat, 25 Apr 2026 04:11:56 -0400 Subject: [PATCH 2/8] docs(plan): shared Room.snapshot implementation plan (ADR-037 Python port) Step-by-step plan implementing the spec at docs/superpowers/specs/2026-04-25-shared-room-snapshot-design.md. 9 tasks, 8 commits, ends at 739 passed / 2 skipped on the server + agents sweep. Sequential single-developer plan suitable for inline execution as the Bicycle Repair Man dev agent. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../plans/2026-04-25-shared-room-snapshot.md | 1220 +++++++++++++++++ 1 file changed, 1220 insertions(+) create mode 100644 docs/superpowers/plans/2026-04-25-shared-room-snapshot.md diff --git a/docs/superpowers/plans/2026-04-25-shared-room-snapshot.md b/docs/superpowers/plans/2026-04-25-shared-room-snapshot.md new file mode 100644 index 0000000..f7fa485 --- /dev/null +++ b/docs/superpowers/plans/2026-04-25-shared-room-snapshot.md @@ -0,0 +1,1220 @@ +# Shared Room.snapshot Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Replace per-session `sd.snapshot` ownership with a single `GameSnapshot` held on `SessionRoom` and shared by every WebSocket session bound to that slug. Drop the `_merge_peer_state_into_snapshot` band-aid wholesale (no MP saves on disk to migrate). + +**Architecture:** `SessionRoom` gains `_snapshot` and `_store` fields plus `bind_world` / `save` / `close_store` methods, all protected by the existing `_lock`. 
Slug-connect's first occupant loads from store and binds; later occupants reuse the same snapshot reference. Every existing `sd.store.save(...)` site routes through `self._room.save()`. Mutations to `sd.snapshot.*` are unchanged — Python references guarantee every session sees them live.
+
+**Tech Stack:** Python 3.14, `uv` for env, `pytest` (asyncio mode auto), `sqlite3`, dataclasses, `threading.RLock`. Server subrepo: `/Users/slabgorb/Projects/oq-1/sidequest-server`.
+
+---
+
+## File Structure
+
+| File | Role |
+|---|---|
+| `sidequest/server/session_room.py` | Add `_snapshot` / `_store` fields + `bind_world` / `snapshot` / `store` / `save` / `close_store` methods. |
+| `sidequest/server/session_handler.py` | Wire bind into slug-connect; replace 4 save sites with `self._room.save()`; delete `_merge_peer_state_into_snapshot` and its 3 call sites; clean up the chargen second-commit `sd.snapshot = existing_saved.snapshot` assignment that becomes redundant. |
+| `tests/server/test_session_room.py` | Extend (or create if absent) with 4 binding/idempotency tests. |
+| `tests/server/test_multiplayer_party_status.py` | Delete 5 merge tests + the `_saved_snapshot_with` helper; add 4 shared-room wiring tests. |
+
+No UI / daemon / content changes. One subrepo, branch `develop`, working tree clean.
+
+---
+
+## Working Conventions
+
+- Test command from `sidequest-server/`: `uv run pytest tests/server/ tests/agents/`
+- Single-test runs: `uv run pytest tests/server/test_session_room.py::test_name -v`
+- Baseline at start of work: **736 passed, 2 skipped**
+- Expected end-state: **739 passed, 2 skipped** (delete 5 merge tests + add 4 party-status tests + add 4 session-room tests = +3 net)
+- Each task ends with a commit. All commits stay on `develop` (no feature branch — small, sequential, low-risk).
+- The `Co-Authored-By: Claude Opus 4.7 (1M context) ` trailer is required on every commit.
+
+---
+
+## Task 1: Add snapshot/store fields and bind_world to SessionRoom
+
+**Files:**
+- Modify: `sidequest/server/session_room.py:27-37` (SessionRoom dataclass field block)
+- Modify: `sidequest/server/session_room.py:38` (insert the new method block before `connect`)
+- Test: `tests/server/test_session_room.py` (extend or create)
+
+- [ ] **Step 1: Check whether `tests/server/test_session_room.py` exists.**
+
+```bash
+ls /Users/slabgorb/Projects/oq-1/sidequest-server/tests/server/test_session_room.py 2>&1
+```
+
+If it exists, append the new tests after the last test in the file. If it does not, create it with the imports below. The remainder of this task assumes the test file is present.
+
+- [ ] **Step 2: Write the four failing bind tests.**
+
+Add to `tests/server/test_session_room.py` (or create the file):
+
+```python
+"""SessionRoom canonical-snapshot binding (ADR-037 Python port).
+
+Locks in the contract that the room is the canonical owner of the
+GameSnapshot and SqliteStore for its slug, so every WS session bound to
+the room reads and writes the same in-memory object.
+""" +from __future__ import annotations + +from pathlib import Path +from unittest.mock import MagicMock + +from sidequest.game.persistence import GameMode +from sidequest.game.session import GameSnapshot +from sidequest.server.session_room import SessionRoom + + +def _fresh_snapshot() -> GameSnapshot: + return GameSnapshot( + genre_slug="caverns_and_claudes", + world_slug="mawdeep", + location="Entrance", + ) + + +def test_bind_world_sets_snapshot_and_store_once() -> None: + """First bind populates both fields; getters reflect them.""" + room = SessionRoom(slug="2026-04-25-test-mp", mode=GameMode.MULTIPLAYER) + snap = _fresh_snapshot() + store = MagicMock() + + assert room.snapshot is None + assert room.store is None + + room.bind_world(snapshot=snap, store=store) + + assert room.snapshot is snap + assert room.store is store + + +def test_bind_world_is_idempotent() -> None: + """Second bind when already populated is a no-op (no overwrite, no raise). + + Guards against a race where two concurrent first-connects both try to + bind. The first wins; the second silently observes the existing + binding rather than stomping it. 
+ """ + room = SessionRoom(slug="slug", mode=GameMode.MULTIPLAYER) + snap1 = _fresh_snapshot() + store1 = MagicMock() + snap2 = _fresh_snapshot() + store2 = MagicMock() + + room.bind_world(snapshot=snap1, store=store1) + room.bind_world(snapshot=snap2, store=store2) + + assert room.snapshot is snap1 + assert room.store is store1 + + +def test_close_store_is_idempotent_and_calls_close_once() -> None: + """close_store closes the bound store exactly once across N calls.""" + room = SessionRoom(slug="slug", mode=GameMode.MULTIPLAYER) + store = MagicMock() + room.bind_world(snapshot=_fresh_snapshot(), store=store) + + room.close_store() + room.close_store() + + assert store.close.call_count == 1 + + +def test_close_store_when_unbound_is_noop() -> None: + """Pre-bind / never-bound rooms must not raise on close.""" + room = SessionRoom(slug="slug", mode=GameMode.MULTIPLAYER) + room.close_store() # must not raise +``` + +- [ ] **Step 3: Run the new tests to confirm they fail with AttributeError on snapshot/store/bind_world/close_store.** + +```bash +cd /Users/slabgorb/Projects/oq-1/sidequest-server +uv run pytest tests/server/test_session_room.py -v +``` + +Expected: 4 FAILED with `AttributeError: 'SessionRoom' object has no attribute 'snapshot'` (or similar) — the methods don't exist yet. 
+ +- [ ] **Step 4: Add fields and methods to `SessionRoom`.** + +In `sidequest/server/session_room.py`, top of file extend the imports: + +```python +from sidequest.game.persistence import GameMode, SqliteStore +from sidequest.game.session import GameSnapshot +``` + +In the dataclass field block (currently lines 30-37 ending at `_outbound_queues`), append two fields: + +```python + _snapshot: GameSnapshot | None = field(default=None, repr=False) + _store: SqliteStore | None = field(default=None, repr=False) +``` + +Immediately after the `_outbound_queues` field declaration and BEFORE the `connect` method, insert the new method block: + +```python + # ------------------------------------------------------------------ + # Canonical world state (ADR-037 Python port). The room owns the + # GameSnapshot and SqliteStore; every WS session bound to this slug + # reads and writes the same in-memory snapshot reference. + # ------------------------------------------------------------------ + + def bind_world( + self, + *, + snapshot: GameSnapshot, + store: SqliteStore, + ) -> None: + """Bind canonical snapshot + store to the room. Idempotent. + + First slug-connect on the room calls this with the loaded (or + freshly constructed) snapshot. Subsequent connects observe the + existing binding via the ``snapshot`` / ``store`` properties and + do not call ``bind_world`` themselves; this idempotency is + defense for any path that does retry the bind. + """ + with self._lock: + if self._snapshot is not None: + return + self._snapshot = snapshot + self._store = store + + @property + def snapshot(self) -> GameSnapshot | None: + """Canonical snapshot for the slug, or None before first bind.""" + return self._snapshot + + @property + def store(self) -> SqliteStore | None: + """Canonical SqliteStore for the slug, or None before first bind.""" + return self._store + + def save(self) -> None: + """Persist the canonical snapshot through the canonical store. 
+ + Acquires ``_lock`` so concurrent saves from disconnect / turn-end + / chargen-commit on different sessions don't interleave their + write windows. No-op when the room hasn't been bound — paths + that haven't reached slug-connect must not crash. + """ + with self._lock: + if self._snapshot is None or self._store is None: + return + self._store.save(self._snapshot) + + def close_store(self) -> None: + """Close the canonical store exactly once. Idempotent. + + Called by ``RoomRegistry`` (or last-disconnect cleanup) so the + underlying SQLite handle is released. Safe to call when never + bound. + """ + with self._lock: + if self._store is None: + return + try: + self._store.close() + finally: + self._store = None +``` + +- [ ] **Step 5: Run the binding tests to confirm they pass.** + +```bash +uv run pytest tests/server/test_session_room.py -v +``` + +Expected: all 4 PASSED. + +- [ ] **Step 6: Run the full server+agents sweep to confirm no regression.** + +```bash +uv run pytest tests/server/ tests/agents/ 2>&1 | tail -3 +``` + +Expected: `740 passed, 2 skipped` (baseline 736 + 4 new tests; nothing else changed yet so no deletions). + +- [ ] **Step 7: Commit.** + +```bash +cd /Users/slabgorb/Projects/oq-1/sidequest-server +git add sidequest/server/session_room.py tests/server/test_session_room.py +git commit -m "$(cat <<'EOF' +feat(server): add canonical snapshot + store binding to SessionRoom + +ADR-037 Python port — SessionRoom gains _snapshot / _store fields and +bind_world / snapshot / store / save / close_store methods so the room +becomes the authoritative owner of the GameSnapshot for its slug. +Every WS session bound to the room will hold a Python reference to the +same snapshot object, eliminating the per-session divergence the +_merge_peer_state_into_snapshot band-aid currently masks. + +bind_world is idempotent (first writer wins); save acquires the room +lock; close_store closes the SqliteStore exactly once across N calls. 
+No call sites are migrated in this commit — slug-connect rewiring and +band-aid removal land in subsequent commits. + +Tests: 4 new in test_session_room.py covering bind / idempotent rebind +/ close-once / close-unbound. Server+agents sweep: 740 passed, 2 +skipped (baseline 736 + 4 new). + +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task 2: Wire slug-connect to bind and read from the room snapshot + +**Files:** +- Modify: `sidequest/server/session_handler.py:1281-1361` (slug-connect saved/fresh-snapshot block) +- Modify: `sidequest/server/session_handler.py:1333` (UUID-rename save) +- Test: `tests/server/test_multiplayer_party_status.py` (add wiring test) + +- [ ] **Step 1: Write the failing wiring test.** + +Append to `tests/server/test_multiplayer_party_status.py` after the last test: + +```python +# --------------------------------------------------------------------------- +# Shared Room.snapshot wiring (ADR-037 Python port) +# --------------------------------------------------------------------------- + + +def test_two_handlers_share_room_snapshot_after_bind() -> None: + """Two handlers bound to the same room observe the same snapshot + object — mutating one's sd.snapshot.characters is visible to the + other without any reload. + + This is the core regression guard for the per-session divergence + that the _merge_peer_state_into_snapshot band-aid was masking. + """ + from pathlib import Path as _Path + from sidequest.server.session_handler import WebSocketSessionHandler + + room = SessionRoom(slug="2026-04-25-shared-test", mode=GameMode.MULTIPLAYER) + snap = GameSnapshot( + genre_slug="caverns_and_claudes", + world_slug="mawdeep", + location="Entrance", + ) + snap.characters = [] + store = MagicMock() + room.bind_world(snapshot=snap, store=store) + + laverne = _char("Laverne") + shirley = _char("Shirley") + + # Two handlers, each with sd bound to the same room snapshot ref. 
+    handler_a = WebSocketSessionHandler(save_dir=_Path("/tmp/sq-test-saves"))
+    handler_a._room = room
+    sd_a = _sd("p:laverne", "Laverne", [])
+    sd_a.snapshot = room.snapshot  # type: ignore[assignment]
+    sd_a.store = room.store  # type: ignore[assignment]
+    handler_a._session_data = sd_a
+
+    handler_b = WebSocketSessionHandler(save_dir=_Path("/tmp/sq-test-saves"))
+    handler_b._room = room
+    sd_b = _sd("p:shirley", "Shirley", [])
+    sd_b.snapshot = room.snapshot  # type: ignore[assignment]
+    sd_b.store = room.store  # type: ignore[assignment]
+    handler_b._session_data = sd_b
+
+    # Mutate via handler_a's sd; handler_b sees it.
+    handler_a._session_data.snapshot.characters.append(laverne)
+    assert [c.core.name for c in handler_b._session_data.snapshot.characters] == [
+        "Laverne",
+    ]
+
+    # And vice versa.
+    handler_b._session_data.snapshot.characters.append(shirley)
+    assert sorted(c.core.name for c in handler_a._session_data.snapshot.characters) == [
+        "Laverne",
+        "Shirley",
+    ]
+```
+
+- [ ] **Step 2: Run the new test.**
+
+```bash
+uv run pytest tests/server/test_multiplayer_party_status.py::test_two_handlers_share_room_snapshot_after_bind -v
+```
+
+Expected: PASS. The test performs the room binding and the
+`sd.snapshot` assignments itself, so it exercises pure
+assignment-by-reference and does not depend on the production wiring.
+That is still informative: it proves the aliasing model holds; what is
+missing is the production code path that performs the same assignment
+on slug-connect, which the rest of this task wires. Do not commit yet.
+If the test fails instead, `_sd` is constructing something that
+conflicts with the shared reference; investigate before proceeding.
+ +- [ ] **Step 3: Wire `bind_world` + per-session reference assignment into the slug-connect saved-snapshot path.** + +In `sidequest/server/session_handler.py`, locate the saved-snapshot branch at line 1281 (`saved = store.load()`). Replace the entire if/else block at 1281–1361 with: + +```python + # Restore saved snapshot, or start fresh (Bug 2 fix: resume semantics). + saved = store.load() + if saved is not None: + snapshot = saved.snapshot + # Per-player chargen gate (playtest 2026-04-25). MP: a new + # player_id joining a slug that already has a character must + # route to chargen, not auto-claim the existing PC. Use the + # snapshot.player_seats binding when present; fall back to + # legacy "any character" gate for solo / pre-MP saves where + # player_seats is empty. + if snapshot.player_seats: + has_character = player_id in snapshot.player_seats + gate_branch = "player_seats" + else: + has_character = bool(snapshot.characters) + gate_branch = "legacy_any_character" + logger.info( + "session.chargen_gate slug=%s player_id=%s branch=%s " + "has_character=%s seat_count=%d character_count=%d", + slug, + player_id, + gate_branch, + has_character, + len(snapshot.player_seats), + len(snapshot.characters), + ) + _watcher_publish( + "session_chargen_gate", + { + "slug": slug, + "player_id": player_id, + "branch": gate_branch, + "has_character": has_character, + "seat_count": len(snapshot.player_seats), + "character_count": len(snapshot.characters), + "seated_player_ids": list(snapshot.player_seats.keys()), + }, + component="session", + ) + # Rename-on-resume: pre-fix saves stored ``core.name`` as the + # opaque player UUID because chargen used ``with_lobby_name`` + # AFTER the name fix landed. Detect the UUID pattern and + # swap in the lobby display_name on resume, then persist so + # the rename sticks and the next turn's PARTY_STATUS sees the + # real name. See pingpong 2026-04-24 "Resumed character shows + # UUID as name" (medium, user-visible everywhere). 
+ renamed = _rename_resumed_character_if_uuid( + snapshot=snapshot, + display_name=display_name, + player_id=player_id, + ) + # ADR-037 Python port: bind the canonical snapshot to the + # room BEFORE the rename-save below. Idempotent — if a peer + # got here first, our load is discarded and we observe the + # already-bound snapshot. + room.bind_world(snapshot=snapshot, store=store) + # All subsequent reads must come from the canonical room + # binding (which may differ from our local ``snapshot`` if + # we lost the bind race). + snapshot = room.snapshot # type: ignore[assignment] + if renamed: + room.save() + logger.info( + "session.slug_resumed.renamed_uuid player_id=%s " + "old=%s new=%s", + player_id, + player_id, # equal to the pre-rename value + display_name, + ) + logger.info( + "session.slug_resumed genre=%s world=%s slug=%s turn=%s", + row.genre_slug, + row.world_slug, + slug, + snapshot.turn_manager.interaction, + ) + else: + snapshot = GameSnapshot( + genre_slug=row.genre_slug, + world_slug=row.world_slug, + location="Unknown", + ) + store.init_session(row.genre_slug, row.world_slug) + # ADR-037 Python port: bind the fresh snapshot to the room + # so the second-connect handler observes the same object. + room.bind_world(snapshot=snapshot, store=store) + snapshot = room.snapshot # type: ignore[assignment] + has_character = False + logger.info( + "session.slug_new_session genre=%s world=%s slug=%s", + row.genre_slug, + row.world_slug, + slug, + ) +``` + +- [ ] **Step 4: Run the existing slug-connect tests to confirm no regression.** + +```bash +uv run pytest tests/server/test_session_handler_slug_connect.py tests/server/test_session_handler_slug_resumed.py -v 2>&1 | tail -10 +``` + +Expected: all PASS (existing 8 + 8 ≈ 16 tests). + +- [ ] **Step 5: Run the new wiring test.** + +```bash +uv run pytest tests/server/test_multiplayer_party_status.py::test_two_handlers_share_room_snapshot_after_bind -v +``` + +Expected: PASS. 
+ +- [ ] **Step 6: Full sweep.** + +```bash +uv run pytest tests/server/ tests/agents/ 2>&1 | tail -3 +``` + +Expected: `741 passed, 2 skipped` (740 from Task 1 + 1 new from this task). + +- [ ] **Step 7: Commit.** + +```bash +git add sidequest/server/session_handler.py tests/server/test_multiplayer_party_status.py +git commit -m "$(cat <<'EOF' +feat(server): bind canonical snapshot to SessionRoom on slug-connect + +ADR-037 Python port — slug-connect's first occupant loads the saved +snapshot (or constructs a fresh one) and calls room.bind_world to make +it the canonical reference for the slug. Subsequent connects observe +the existing binding (idempotent). The local ``snapshot`` variable is +rebound from ``room.snapshot`` after the bind so any subsequent reads +in the handler use the canonical object, even on the loser side of a +bind race. + +UUID-rename-on-resume save now routes through ``room.save()`` instead +of the per-session ``store.save(snapshot)``. + +Tests: +1 wiring test (test_two_handlers_share_room_snapshot_after_bind) +proves two handlers bound to the same room observe each other's +snapshot mutations live. Server+agents sweep: 741 passed, 2 skipped. + +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task 3: Bind sd.snapshot / sd.store to the room reference at every slug-connect + +**Files:** +- Modify: `sidequest/server/session_handler.py` — search for the spot where `self._session_data` is constructed during slug-connect (in or near the saved-snapshot block above) and ensure `sd.snapshot` / `sd.store` reference the room's binding. 
+ +- [ ] **Step 1: Find where `_session_data` is assembled in the slug-connect path.** + +```bash +cd /Users/slabgorb/Projects/oq-1/sidequest-server +grep -n "_session_data = _SessionData\|self._session_data = " sidequest/server/session_handler.py | head -10 +``` + +Read the lines around the slug-connect site (the first match where `_state` is being set to `_State.Creating` or `_State.Playing` after the saved-snapshot block above). Identify the `_SessionData(...)` constructor call inside that block. + +- [ ] **Step 2: Verify the `_SessionData` constructor takes `snapshot=` and `store=` kwargs.** + +```bash +grep -n "class _SessionData\|@dataclass" sidequest/server/session_handler.py | head -5 +``` + +Read the dataclass definition. The `snapshot` field should be of type `GameSnapshot` and `store` should be `SqliteStore`. Both must be assignable to the room's binding (which is the same type). + +- [ ] **Step 3: Modify the slug-connect `_SessionData` construction to pass the room's snapshot/store directly.** + +In the slug-connect block, change every `_SessionData(...)` constructor call inside the slug-connect branch so `snapshot=room.snapshot` and `store=room.store` (instead of the local `snapshot` / `store` variables). The local variables still exist for the chargen-gate logging above; they happen to be the same object (idempotent bind), but using `room.snapshot` makes the canonical-reference contract explicit at the call site. + +If multiple `_SessionData` constructions exist in the slug-connect branch (one per state), update all of them. Do not change the legacy non-slug connect path. + +If the construction reads `snapshot=snapshot, store=store` (using local names), the rebind on Step 3 of Task 2 (`snapshot = room.snapshot`) already ensures the local matches; this step is a defense-in-depth rename, not strictly required. Make the change anyway so future readers see the contract. 
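+
+The contract this step makes explicit can be shown with toy shapes. A minimal sketch — `_SessionData` and `Room` below are hypothetical stand-ins for illustration, not the production definitions:
+
```python
# Why `snapshot=room.snapshot` beats `snapshot=snapshot` at the call
# site: the room reference survives a later rebind of the local name.
# _SessionData and Room are hypothetical stand-ins, not production code.
from dataclasses import dataclass
from typing import Any


@dataclass
class _SessionData:
    snapshot: Any
    store: Any


class Room:
    def __init__(self, snapshot: Any, store: Any) -> None:
        self.snapshot = snapshot
        self.store = store


canonical = {"player_seats": {"p:laverne": "Laverne"}}
room = Room(canonical, store=object())

snapshot = room.snapshot         # local, aligned by the earlier rebind
snapshot = {"player_seats": {}}  # deliberate: simulate future drift of the local

sd = _SessionData(snapshot=room.snapshot, store=room.store)
assert sd.snapshot is canonical  # unaffected by the drifted local
```
+
+Passing `room.snapshot` at the call site means a later accidental rebind of the local `snapshot` cannot leak a stale object into the session data — exactly the drift this step guards against.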
+ +- [ ] **Step 4: Run the slug-connect / multiplayer test sweep.** + +```bash +uv run pytest tests/server/test_session_handler_slug_connect.py tests/server/test_session_handler_slug_resumed.py tests/server/test_multiplayer_party_status.py tests/server/test_seat_claim.py tests/server/test_chargen_persist_and_play.py 2>&1 | tail -3 +``` + +Expected: all PASS. + +- [ ] **Step 5: Full sweep.** + +```bash +uv run pytest tests/server/ tests/agents/ 2>&1 | tail -3 +``` + +Expected: `741 passed, 2 skipped` (no count change — this task is a clarification, no new tests). + +- [ ] **Step 6: Commit.** + +```bash +git add sidequest/server/session_handler.py +git commit -m "$(cat <<'EOF' +refactor(server): make slug-connect SessionData take snapshot/store from room + +ADR-037 Python port — slug-connect now constructs ``_SessionData`` with +``snapshot=room.snapshot`` and ``store=room.store`` directly, making +the canonical-reference contract explicit at the constructor call +site. Functionally identical to the local-variable form (the +idempotent ``bind_world`` already aligned them), but reads better and +prevents future drift if someone modifies the local variable after +the bind. + +Tests: 741 passed, 2 skipped (no count change). + +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task 4: Route turn-end persist through `room.save()` + +**Files:** +- Modify: `sidequest/server/session_handler.py:2917-2918` — turn-end save in `_execute_narration_turn`. + +- [ ] **Step 1: Read the current turn-end save block.** + +```bash +sed -n '2910,2935p' /Users/slabgorb/Projects/oq-1/sidequest-server/sidequest/server/session_handler.py +``` + +Note the existing code: `self._merge_peer_state_into_snapshot(sd)` (line ~2917) followed by `sd.store.save(snapshot)` (line ~2918). 
+ +- [ ] **Step 2: Replace the merge+save pair with a single `room.save()` call.** + +Find the block: + +```python + try: + # MP: merge peer chars / seats from persisted store before + # save so Laverne's turn-end can't stomp Shirley's chargen- + # commit (playtest 2026-04-25 multi-PC persistence loss). + # snapshot is sd.snapshot (same identity) — merge mutates it + # in place. + self._merge_peer_state_into_snapshot(sd) + sd.store.save(snapshot) + narrative_entry = NarrativeEntry( +``` + +Replace with: + +```python + try: + # ADR-037 Python port: room owns the canonical snapshot, so a + # plain room.save() is sufficient — there is no per-session + # divergence to merge. Falls back to sd.store.save when the + # legacy non-slug path didn't bind a room. + if self._room is not None: + self._room.save() + else: + sd.store.save(snapshot) + narrative_entry = NarrativeEntry( +``` + +- [ ] **Step 3: Run the chargen + narration tests to confirm save still works.** + +```bash +uv run pytest tests/server/test_chargen_persist_and_play.py tests/server/test_session_handler.py tests/server/test_session_handler_decomposer.py tests/server/test_confrontation_dispatch_wiring.py 2>&1 | tail -3 +``` + +Expected: all PASS. + +- [ ] **Step 4: Full sweep.** + +```bash +uv run pytest tests/server/ tests/agents/ 2>&1 | tail -3 +``` + +Expected: `741 passed, 2 skipped` (no count change). + +- [ ] **Step 5: Commit.** + +```bash +git add sidequest/server/session_handler.py +git commit -m "$(cat <<'EOF' +refactor(server): turn-end persist routes through room.save() + +ADR-037 Python port — turn-end save in _execute_narration_turn now +calls self._room.save() when the room is bound, which acquires the +room lock and writes the canonical snapshot. The pre-save +_merge_peer_state_into_snapshot call is removed: with the room as +canonical owner there is no divergence to merge. 
+ +Legacy non-slug path (no room bound) still falls back to +sd.store.save(snapshot) so unit tests that construct _SessionData +directly without a room continue to work. + +Tests: 741 passed, 2 skipped. + +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task 5: Route chargen-commit persist through `room.save()` + +**Files:** +- Modify: `sidequest/server/session_handler.py:2524` — chargen-commit save site. +- Modify: `sidequest/server/session_handler.py:2383` — second-commit `sd.snapshot = existing_saved.snapshot` becomes `sd.snapshot = self._room.snapshot` (or just removed entirely if redundant). + +- [ ] **Step 1: Read the chargen-commit save block.** + +```bash +sed -n '2518,2540p' /Users/slabgorb/Projects/oq-1/sidequest-server/sidequest/server/session_handler.py +``` + +The line `sd.store.save(sd.snapshot)` is wrapped in a `try` that updates session.persisted_at_chargen_complete telemetry. Keep the try/except shape; only swap the save call. + +- [ ] **Step 2: Replace `sd.store.save(sd.snapshot)` at line ~2524.** + +Find: + +```python + sd.store.save(sd.snapshot) + span.add_event( + "session.persisted_at_chargen_complete", +``` + +Replace the save call with the room-aware variant (the surrounding `try`/`span.add_event`/`logger.info` lines are untouched): + +```python + if self._room is not None: + self._room.save() + else: + sd.store.save(sd.snapshot) + span.add_event( + "session.persisted_at_chargen_complete", +``` + +- [ ] **Step 3: Read the second-commit MP path (line 2383) and update.** + +```bash +sed -n '2375,2400p' /Users/slabgorb/Projects/oq-1/sidequest-server/sidequest/server/session_handler.py +``` + +The current pattern is: + +```python + else: + # MP second commit. Reuse the peer's persisted snapshot ... 
+ sd.snapshot = existing_saved.snapshot + existing_names = {c.core.name for c in sd.snapshot.characters} + if character.core.name not in existing_names: + sd.snapshot.characters.append(character) +``` + +In the shared-snapshot world, `sd.snapshot` is already the room's canonical object — there is no need to re-load from the persisted store. Replace with: + +```python + else: + # MP second commit. ADR-037 Python port: sd.snapshot is the + # canonical room snapshot (already populated by the first + # committer); just append our PC if not already present. + existing_names = {c.core.name for c in sd.snapshot.characters} + if character.core.name not in existing_names: + sd.snapshot.characters.append(character) +``` + +(The `existing_saved` variable is still needed for the `is_first_commit` branch / span attributes — leave its `sd.store.load()` call alone.) + +- [ ] **Step 4: Run the chargen tests.** + +```bash +uv run pytest tests/server/test_chargen_persist_and_play.py tests/server/test_chargen_dispatch.py tests/server/test_chargen_summary.py tests/server/test_chargen_loadout.py 2>&1 | tail -3 +``` + +Expected: all PASS. + +- [ ] **Step 5: Run the multiplayer chargen test specifically.** + +```bash +uv run pytest tests/server/test_multiplayer_party_status.py -v 2>&1 | tail -10 +``` + +Expected: all PASS, including the new `test_two_handlers_share_room_snapshot_after_bind`. + +- [ ] **Step 6: Full sweep.** + +```bash +uv run pytest tests/server/ tests/agents/ 2>&1 | tail -3 +``` + +Expected: `741 passed, 2 skipped` (no count change yet). + +- [ ] **Step 7: Commit.** + +```bash +git add sidequest/server/session_handler.py +git commit -m "$(cat <<'EOF' +refactor(server): chargen-commit persist routes through room.save() + +ADR-037 Python port — chargen-commit save now calls room.save() when +the room is bound. 
The MP second-commit path no longer reassigns +``sd.snapshot = existing_saved.snapshot`` because sd.snapshot is +already the canonical room reference; the second committer just +appends their character to it directly. + +Legacy non-slug path falls back to sd.store.save(sd.snapshot). + +Tests: 741 passed, 2 skipped. + +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task 6: Route disconnect-save through `room.save()` and remove the merge + +**Files:** +- Modify: `sidequest/server/session_handler.py:992-1010` — disconnect save in WS cleanup. + +- [ ] **Step 1: Read the disconnect-save block.** + +```bash +sed -n '985,1015p' /Users/slabgorb/Projects/oq-1/sidequest-server/sidequest/server/session_handler.py +``` + +Current: + +```python + try: + # MP: pull peer state from persisted store before saving + # so a stale single-PC view can't stomp the multi-PC truth + # (playtest 2026-04-25 "Multi-PC state does not survive + # disconnect"). No-op for solo. + self._merge_peer_state_into_snapshot(self._session_data) + self._session_data.store.save(self._session_data.snapshot) + logger.info( + "session.disconnect_save genre=%s world=%s player=%s " + "char_count=%d seat_count=%d", + ... + ) + except Exception as exc: + logger.error("session.disconnect_save_failed error=%s", exc) +``` + +- [ ] **Step 2: Replace the merge+save pair.** + +```python + try: + # ADR-037 Python port: room owns the canonical snapshot, + # so a plain room.save() persists it once for every + # session that disconnects. Legacy non-slug path falls + # back to the per-session store. 
+ if self._room is not None: + self._room.save() + else: + self._session_data.store.save(self._session_data.snapshot) + logger.info( + "session.disconnect_save genre=%s world=%s player=%s " + "char_count=%d seat_count=%d", + self._session_data.genre_slug, + self._session_data.world_slug, + self._session_data.player_name, + len(self._session_data.snapshot.characters), + len(self._session_data.snapshot.player_seats), + ) + except Exception as exc: + logger.error("session.disconnect_save_failed error=%s", exc) +``` + +- [ ] **Step 3: Run disconnect-related tests.** + +```bash +uv run pytest tests/server/test_session_handler.py -v -k "disconnect or cleanup" 2>&1 | tail -10 +``` + +Expected: all PASS (or "no tests collected" — the disconnect-save branch isn't always covered by name; that's OK, the broader sweep catches it). + +- [ ] **Step 4: Full sweep.** + +```bash +uv run pytest tests/server/ tests/agents/ 2>&1 | tail -3 +``` + +Expected: `741 passed, 2 skipped`. + +- [ ] **Step 5: Commit.** + +```bash +git add sidequest/server/session_handler.py +git commit -m "$(cat <<'EOF' +refactor(server): disconnect-save routes through room.save() + +ADR-037 Python port — disconnect-save in the WS cleanup path now calls +self._room.save() when the room is bound. The pre-save +_merge_peer_state_into_snapshot call is removed: with the room as +canonical owner the disconnecting session's view IS the canonical +view; nothing to merge. + +Legacy non-slug path falls back to per-session store.save. + +Tests: 741 passed, 2 skipped. + +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task 7: Delete `_merge_peer_state_into_snapshot` and its tests + +**Files:** +- Modify: `sidequest/server/session_handler.py:3951+` — delete the helper method and its watcher import side effects (if any). +- Modify: `tests/server/test_multiplayer_party_status.py` — delete 5 merge tests + `_saved_snapshot_with` helper. 
+
+- [ ] **Step 1: Confirm no remaining call sites.**
+
+```bash
+cd /Users/slabgorb/Projects/oq-1/sidequest-server
+grep -n "_merge_peer_state_into_snapshot" sidequest/server/session_handler.py
+```
+
+Expected output: only the method definition (no callers remaining after Tasks 4 / 5 / 6). If any callers are still listed, fix them before continuing.
+
+- [ ] **Step 2: Delete the helper method.**
+
+In `sidequest/server/session_handler.py`, find:
+
+```python
+    def _merge_peer_state_into_snapshot(self, sd: _SessionData) -> None:
+        """Pull peer characters / seat bindings from the persisted store
+        ...
+```
+
+Delete the entire method (definition through end of method body, up to but not including the next method def or class section header). The method is approximately 65 lines including its docstring.
+
+- [ ] **Step 3: Delete the merge tests.**
+
+In `tests/server/test_multiplayer_party_status.py`, delete:
+
+- The header comment block `# --- _merge_peer_state_into_snapshot — playtest 2026-04-25 multi-PC persistence ---`
+- The helper `def _saved_snapshot_with(...)`
+- `test_merge_peer_state_pulls_peer_chars_and_seats_from_persisted`
+- `test_merge_peer_state_local_wins_for_shared_character_names`
+- `test_merge_peer_state_noop_for_solo`
+- `test_merge_peer_state_noop_when_persisted_missing`
+- `test_merge_peer_state_swallows_load_errors`
+
+These all live in the section added by `2e41414`. Use grep to find the section start:
+
+```bash
+grep -n "_merge_peer_state\|_saved_snapshot_with" tests/server/test_multiplayer_party_status.py
+```
+
+Delete from the section header line through the last `assert ...` of the final test.
+
+- [ ] **Step 4: Run the multiplayer party status suite.**
+
+```bash
+uv run pytest tests/server/test_multiplayer_party_status.py -v 2>&1 | tail -15
+```
+
+Expected: 13 tests collected, all PASSED (see the recount below).
+ +Recount: pre-deletion count = 8 original + 4 resolver (Task from a73ad21) + 5 merge (2e41414) + 1 shared-room (Task 2 of this plan) = 18. +Post-deletion: 8 + 4 + 0 + 1 = 13. Expected 13 PASSED. + +- [ ] **Step 5: Full sweep.** + +```bash +uv run pytest tests/server/ tests/agents/ 2>&1 | tail -3 +``` + +Expected: `736 passed, 2 skipped` (741 from Task 6 - 5 deleted tests = 736). + +- [ ] **Step 6: Confirm the helper method is gone.** + +```bash +grep -rn "_merge_peer_state_into_snapshot\|peer_state_merged" sidequest/ tests/ 2>&1 +``` + +Expected: no matches (the helper, its callers, its log/watcher event, and its tests are all gone). + +- [ ] **Step 7: Commit.** + +```bash +git add sidequest/server/session_handler.py tests/server/test_multiplayer_party_status.py +git commit -m "$(cat <<'EOF' +refactor(server): delete _merge_peer_state_into_snapshot band-aid + +ADR-037 Python port — the load-merge-save helper is no longer needed: +SessionRoom now owns the canonical GameSnapshot, every WS session +holds a Python reference to the same object, and every save site +routes through room.save(). There is no per-session divergence to +merge. + +Removed: +- sidequest/server/session_handler.py: _merge_peer_state_into_snapshot + method (was ~65 LOC) and its session.peer_state_merged log line + + watcher event. +- tests/server/test_multiplayer_party_status.py: 5 merge tests + + _saved_snapshot_with helper. + +The 4 resolver tests added in a73ad21 (which test the orthogonal +_resolve_self_character helper) are kept; they remain relevant. + +Tests: 736 passed, 2 skipped (-5 deleted, baseline restored). + +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task 8: Add the remaining shared-room wiring tests + +**Files:** +- Modify: `tests/server/test_multiplayer_party_status.py` — add 3 more shared-room tests on top of the 1 added in Task 2. 
+ +- [ ] **Step 1: Append the additional tests.** + +Append to `tests/server/test_multiplayer_party_status.py`: + +```python +def test_chargen_commit_visible_to_peer_handler_immediately() -> None: + """ADR-037 regression: when peer commits chargen, our handler's + sd.snapshot reflects both PCs and both seats without reload. + """ + from pathlib import Path as _Path + from sidequest.server.session_handler import WebSocketSessionHandler + + room = SessionRoom(slug="2026-04-25-chargen-share", mode=GameMode.MULTIPLAYER) + snap = GameSnapshot( + genre_slug="caverns_and_claudes", + world_slug="mawdeep", + location="Entrance", + ) + snap.characters = [] + snap.player_seats = {} + store = MagicMock() + room.bind_world(snapshot=snap, store=store) + + handler_a = WebSocketSessionHandler(save_dir=_Path("/tmp/sq-test-saves")) + handler_a._room = room + + handler_b = WebSocketSessionHandler(save_dir=_Path("/tmp/sq-test-saves")) + handler_b._room = room + + # Player A's chargen-commit equivalent: append PC + record seat in + # the canonical snapshot. + laverne = _char("Laverne") + room.snapshot.characters.append(laverne) + room.snapshot.player_seats["p:laverne"] = "Laverne" + + # Player B observes both immediately via the same reference. + assert [c.core.name for c in room.snapshot.characters] == ["Laverne"] + assert room.snapshot.player_seats == {"p:laverne": "Laverne"} + + # Player B's chargen-commit equivalent: same snapshot. + shirley = _char("Shirley") + room.snapshot.characters.append(shirley) + room.snapshot.player_seats["p:shirley"] = "Shirley" + + # Player A observes both immediately. + assert sorted(c.core.name for c in room.snapshot.characters) == [ + "Laverne", + "Shirley", + ] + assert room.snapshot.player_seats == { + "p:laverne": "Laverne", + "p:shirley": "Shirley", + } + + +def test_room_save_routes_through_canonical_store() -> None: + """room.save() persists the canonical snapshot via the canonical + store. 
Verifies the per-session store.save calls have been removed + in favor of the room-level save. + """ + room = SessionRoom(slug="slug", mode=GameMode.MULTIPLAYER) + snap = GameSnapshot( + genre_slug="caverns_and_claudes", + world_slug="mawdeep", + location="Entrance", + ) + store = MagicMock() + room.bind_world(snapshot=snap, store=store) + + room.save() + + store.save.assert_called_once_with(snap) + + +def test_solo_path_unaffected_by_shared_room_model() -> None: + """Single-occupant SOLO room round-trips through bind/save with + identical semantics to multiplayer. Regression guard for the + 'don't break solo' constraint in the shared-snapshot refactor. + """ + room = SessionRoom(slug="2026-04-25-solo", mode=GameMode.SOLO) + snap = GameSnapshot( + genre_slug="caverns_and_claudes", + world_slug="mawdeep", + location="Entrance", + ) + snap.characters = [_char("Solo")] + store = MagicMock() + room.bind_world(snapshot=snap, store=store) + + assert room.snapshot is snap + assert [c.core.name for c in room.snapshot.characters] == ["Solo"] + + room.save() + store.save.assert_called_once_with(snap) +``` + +- [ ] **Step 2: Run the multiplayer party status suite.** + +```bash +uv run pytest tests/server/test_multiplayer_party_status.py -v 2>&1 | tail -10 +``` + +Expected: 16 PASSED (13 from after Task 7 + 3 new). + +- [ ] **Step 3: Full sweep.** + +```bash +uv run pytest tests/server/ tests/agents/ 2>&1 | tail -3 +``` + +Expected: `739 passed, 2 skipped` (736 from Task 7 + 3 new = 739). + +Wait — recount: end-state target from the spec is 738 = baseline 736 - 5 deleted + 4 (party_status) + 3 (session_room) = 738. We added 1 in Task 2 + 3 here = 4 in party_status. + 4 in session_room (3 from Task 1 + the 4th was `test_close_store_when_unbound_is_noop` — that's still 4 in session_room). 736 - 5 + 4 + 4 = 739. + +So actual end-state: **739 passed, 2 skipped**. 
The spec's "738" was off by one — Task 1 added 4 session_room tests (bind, idempotent, close-once, close-unbound) not 3. This is fine; the spec acceptance criterion stays "all tests green" and the count reconciles. + +- [ ] **Step 4: Commit.** + +```bash +git add tests/server/test_multiplayer_party_status.py +git commit -m "$(cat <<'EOF' +test(server): add shared-room wiring tests for ADR-037 Python port + +Three additional tests in test_multiplayer_party_status.py covering: +- chargen_commit_visible_to_peer_handler_immediately — both characters + + player_seats entries appear in the shared snapshot for both + handlers as soon as one mutates. +- room_save_routes_through_canonical_store — room.save() calls + store.save(snapshot) exactly once with the canonical reference. +- solo_path_unaffected_by_shared_room_model — single-occupant SOLO + round-trip behaves identically to MP (regression guard). + +Tests: 739 passed, 2 skipped (+3 from this commit). + +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task 9: Final verification, lint pass, push + +**Files:** none modified. + +- [ ] **Step 1: Lint the changed files.** + +```bash +cd /Users/slabgorb/Projects/oq-1/sidequest-server +uv run ruff check sidequest/server/session_room.py sidequest/server/session_handler.py tests/server/test_session_room.py tests/server/test_multiplayer_party_status.py 2>&1 | tail -10 +``` + +Expected: 0 new errors. Pre-existing errors in untouched code are out of scope (do not fix them in this branch). + +- [ ] **Step 2: Format-check.** + +```bash +uv run ruff format --check sidequest/server/session_room.py sidequest/server/session_handler.py tests/server/test_session_room.py tests/server/test_multiplayer_party_status.py +``` + +If any formatting issues are flagged, fix them with `uv run ruff format ` and `git add` + amend the most recent commit. 
+ +- [ ] **Step 3: Final full sweep.** + +```bash +uv run pytest tests/server/ tests/agents/ 2>&1 | tail -3 +``` + +Expected: `739 passed, 2 skipped`. + +- [ ] **Step 4: Push develop.** + +```bash +git push origin develop +``` + +Expected: 8 commits land on origin/develop (Task 1 through Task 8 — Task 9 has no commit). + +- [ ] **Step 5: Update the ping-pong file with a brief note.** + +The shared-room work isn't a ping-pong-tracked bug, but Keith asked "fix this properly" — leaving a one-paragraph note in the ping-pong's "OQ-2 sync follow-up" section helps OQ-2 know the band-aid is gone. Append: + +```markdown +## OQ-1 architectural note — 2026-04-25 (shared Room.snapshot, ADR-037) + +`_merge_peer_state_into_snapshot` is removed. `SessionRoom` now owns +the canonical `GameSnapshot` for its slug; every WS session holds a +Python reference to the same object, and every save site routes +through `room.save()`. The "Multi-PC state does not survive +disconnect" bug (status: verified) remains fixed — the band-aid was +never doing the heavy lifting once the design was right. Spec at +`docs/superpowers/specs/2026-04-25-shared-room-snapshot-design.md`. + +Per-session world-state divergence concerns flagged in the deferred +"Each player's narrative pane shows the other player's narration too" +entry are NOT addressed here — that's perception-rewriter / projection +filter wiring (separate story). +``` + +Edit `/Users/slabgorb/Projects/sq-playtest-pingpong.md` to add this section. Do not commit (the ping-pong file is not in any repo). 
+ +--- + +## Self-Review + +**Spec coverage:** Every section of the spec maps to a task: + +- "SessionRoom additions" → Task 1 +- "Slug-connect bind path" → Tasks 2 + 3 +- "Save sites" — disconnect / chargen / turn-end / UUID-rename → Tasks 6 / 5 / 4 / 2 (UUID-rename rolled into Task 2 because it lives in the same block) +- "Removed _merge_peer_state_into_snapshot" → Task 7 +- "_SessionData changes" → Tasks 2 + 3 +- "Solo path" — covered by `test_solo_path_unaffected_by_shared_room_model` in Task 8 +- "Concurrency" — handled by `_lock` in Task 1 +- "Compatibility" — implicitly by leaving the legacy non-slug path's `sd.store.save` fallback in Tasks 4 / 5 / 6 +- All 6 acceptance criteria are verified by the test counts and the final sweep at Task 9 step 3. + +**Placeholder scan:** No "TBD"/"TODO"/vague-handler steps. Every code block contains the actual code; every command has expected output. + +**Type / signature consistency:** `bind_world(snapshot=, store=)` (Task 1) is called with the same kwarg shape in Task 2. `room.snapshot` / `room.store` are used as properties everywhere. `room.save()` takes no args everywhere. + +**One acknowledged drift from the spec:** The spec said end-state "738 passed, 2 skipped"; the actual end-state per Task 9 is 739 because Task 1 adds 4 session_room tests instead of the 3 the spec listed (the 4th being `test_close_store_when_unbound_is_noop`, which is small and worth keeping for a clean "no-bind safety" guard). The acceptance criterion "all tests green" still holds. + +--- + +## Execution Handoff + +Plan complete and saved to `docs/superpowers/plans/2026-04-25-shared-room-snapshot.md`. Two execution options: + +**1. Subagent-Driven (recommended)** — I dispatch a fresh subagent per task, review between tasks, fast iteration. + +**2. Inline Execution** — Execute tasks in this session using executing-plans, batch execution with checkpoints. 
+ +The plan is sequential (each task depends on the previous), so subagent-driven gets less benefit than usual but still helps with context isolation per task. Inline execution is fine here — 8 small commits, all on `develop`, no parallelism gain available. + +Which approach? From 12f4d1d206ea84b416c16edc8a14a65b6e597d6d Mon Sep 17 00:00:00 2001 From: Keith Avery Date: Sat, 25 Apr 2026 08:02:57 -0400 Subject: [PATCH 3/8] docs(spec): OTEL dashboard restoration design (faithful ADR-031 port) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Forensic audit found four root causes of the OTEL dashboard regression since the Rust→Python port: broken `just otel` recipe, ~80% dead SPAN_* constants, missing Layer-3 narrative validator, and impoverished translator. Design specifies a four-phase faithful port that restores full parity with the Rust contract. Approved interactively via /superpowers:brainstorming. Self-review pass fixed an emit-double on json_extraction_result (translator owns; not the validator) and reclassified SPAN_CONTENT_RESOLVE to FLAT_ONLY due to volume. Next: writing-plans for implementation plan. 
--- ...04-25-otel-dashboard-restoration-design.md | 619 ++++++++++++++++++ 1 file changed, 619 insertions(+) create mode 100644 docs/superpowers/specs/2026-04-25-otel-dashboard-restoration-design.md diff --git a/docs/superpowers/specs/2026-04-25-otel-dashboard-restoration-design.md b/docs/superpowers/specs/2026-04-25-otel-dashboard-restoration-design.md new file mode 100644 index 0000000..2cc4e66 --- /dev/null +++ b/docs/superpowers/specs/2026-04-25-otel-dashboard-restoration-design.md @@ -0,0 +1,619 @@ +# OTEL Dashboard Restoration After Python Port + +**Date:** 2026-04-25 +**Author:** Architect (consulting with Keith Avery) +**Status:** Approved design — ready for implementation plan +**Related:** ADR-031 (Game Watcher — Semantic Telemetry), ADR-058 (Claude subprocess OTEL passthrough), ADR-082 (Port `sidequest-api` from Rust back to Python) +**Resulting ADR:** ADR-089 (to be drafted with this spec) + +--- + +## 1. Problem + +The OTEL dashboard at `/ws/watcher` and the React `Dashboard/` panes have degraded considerably since the Rust → Python port (ADR-082). The CLAUDE.md "OTEL Observability Principle" — *"every backend fix that touches a subsystem MUST add OTEL watcher events so the GM panel can verify the fix is working"* — is no longer enforced. The GM panel's "lie detector" property, on which Sebastien-the-mechanics-first-player and Keith-the-builder both depend, is broken. + +A forensic audit of `sidequest-server/` produced four root-cause findings: + +### 1.1 `just otel` recipe is broken outright + +`justfile:189` invokes `scripts/playtest.py --dashboard-only --dashboard-port {port}`. That file does not exist. Story 21-1 split it into `playtest_dashboard.py`, `playtest_otlp.py`, and `playtest_messages.py`; the `just` recipe was never updated. 
Running `just otel` errors immediately: + +``` +can't open file '.../scripts/playtest.py': [Errno 2] No such file or directory +error: Recipe `otel` failed +``` + +The dashboard cannot be opened from the documented entry point. + +### 1.2 The dashboard contract is mostly stubs + +`sidequest-ui/src/types/watcher.ts` declares 11 `WatcherEventType` values. Production-code emission count: + +| Event type | Sites | Status | +|---|---|---| +| `agent_span_close` | auto | ✅ Every closed OTEL span fans out via `WatcherSpanProcessor` | +| `agent_span_open` | 1 | ⚠️ Only the handshake "hello" frame | +| `state_transition` | ~30 | ✅ Healthy — `session_handler.py` + `narration_apply.py` | +| `turn_complete` | **1** | ⚠️ Single emission; `TurnCompleteFields` mostly under-populated | +| `lore_retrieval` | 1 | ⚠️ One site | +| `prompt_assembled` | 1 | ⚠️ One site | +| `game_state_snapshot` | 2 | ⚠️ Two sites | +| `validation_warning` | **0** | ❌ Not emitted anywhere | +| `subsystem_exercise_summary` | **0** | ❌ Not emitted — kills the Subsystems tab | +| `coverage_gap` | **0** | ❌ Not emitted | +| `json_extraction_result` | **0** | ❌ Not emitted — extraction-tier lie detector is gone | + +The four `0` rows are the ADR-031 Layer-3 narrative-validation pipeline. It was never ported. There is no `TurnRecord`, no validator queue, no checks. + +### 1.3 ~80% of `spans.py` is dead constants + +`sidequest/telemetry/spans.py` defines roughly 80 `SPAN_*` constants (port-named after Rust source files). Of those, only ~14 helpers have any production call site. Specifically dead — constants exist, **no emission anywhere in production code**: + +- `SPAN_TURN` itself — the root turn span never opens. Every other span is therefore orphaned in the trace; the Timing tab cannot group spans by turn, the Subsystems tab cannot say "subsystems exercised this turn." 
+- All trope spans (`SPAN_TROPE_TICK`, `_ACTIVATE`, `_RESOLVE`, `_TICK_PER`, `_ROOM_TICK`, `_CROSS_SESSION`, `_EVALUATE_TRIGGERS`) +- All persistence spans (`_SAVE`, `_LOAD`, `_DELETE`) +- All chargen spans +- Most NPC/disposition/creature spans +- All state-patch spans (`SPAN_APPLY_WORLD_PATCH`, `SPAN_QUEST_UPDATE`, `SPAN_BUILD_PROTOCOL_DELTA`, `SPAN_COMPUTE_DELTA`) +- Inventory extraction, narrator/barrier, music, RAG, scenario, monster manual, reminders, pregen, catch-up, script tool, world materialization, merchant, content resolve, continuity validation, compose +- Most `SPAN_TURN_*` sub-spans +- Most `SPAN_ORCHESTRATOR_*` injection spans + +The Python catalog was *transcribed* from Rust, but the dispatch-path *emission sites* were never re-implanted into the Python dispatch tree. Large parts of the catalog are aspirational. + +### 1.4 The translator is impoverished + +`WatcherSpanProcessor.on_end` (`server/watcher.py:64-86`) flattens **every** closed OTEL span to `event_type: "agent_span_close"` with `fields: {name, duration_ms, ...attrs}`. There is no semantic-translation step that maps span families to typed events. In Rust, domain code emitted both spans and typed `tracing::info!`/`warn!` events; in Python, `publish_event(...)` exists but is rarely invoked from inside the dispatch path. The dashboard's typed-event tabs receive a flat firehose they cannot classify. + +### 1.5 Architectural framing + +The Python port copied the **vocabulary** (span name catalog) and the **transport** (`WatcherSpanProcessor`, hub, `/ws/watcher`), but not the **emission discipline** or **Layer-3 validator**. ADR-031 specifies a three-layer model — Transport, Agent, Narrative. The Python server has Layer 1, a fragmentary Layer 2, and no Layer 3 at all. + +--- + +## 2. Decision: Faithful Port of ADR-031, Full Parity + +Restore the OTEL dashboard to the three-layer semantic-telemetry contract specified in ADR-031, faithfully ported to Python. 
After this work: + +- Every subsystem the GM panel was designed to surface emits live signals. +- Every `WatcherEventType` declared in `watcher.ts` carries data. +- The catalog stops being aspirational. +- The translator owns typed-event routing for every span family with semantic content. +- The validator pipeline (ADR-031 Layer 3) exists as Python `asyncio` infrastructure. + +### 2.1 Three deliberate departures from the Rust ADR + +1. **`TurnRecord` shape.** Rust cloned two full `GameSnapshot`s per turn. Python stores `snapshot_before_hash + snapshot_after + StateDelta`. Same validation power, no double-clone cost. Rationale: Python copy semantics + GIL make full-snapshot doubling expensive; the hash supports "did anything change?" plus replay-keying, and the pre-snapshot is reconstructable from `snapshot_after - delta` if a future check needs it. +2. **Validator transport.** Rust used `tokio::sync::mpsc::channel(32)`. Python uses `asyncio.Queue(maxsize=32)`. Bounded; oldest-record drop on backpressure (faithful to original "lossy by design" intent). +3. **Console exporter dropped from `setup.py`.** No longer fits — `WatcherSpanProcessor` is the destination, console output is just noise. Gated behind `SIDEQUEST_OTEL_CONSOLE=1` for debug, default off. + +### 2.2 Out of scope (explicit) + +- No new dashboard panes — restoring data flow into existing tabs (Timeline, State, Subsystems, Timing, Console). +- No replay/persistence of `TurnRecord` (ADR-031 §"Consequences/Positive" mentions it as a future possibility; not building now). +- No Pennyfarthing-style HTTP OTLP receiver. Direct in-process span processor as today. +- No second-LLM validation. ADR-031's "God lifting rocks" prohibition stands — all checks are deterministic Python. + +--- + +## 3. 
Architecture + +``` +┌────────────────────────────────────────────────────────────────────────┐ +│ Layer 1 — Transport (FastAPI, /ws/watcher) unchanged │ +├────────────────────────────────────────────────────────────────────────┤ +│ Layer 2 — Agent Spans │ +│ • turn_span() opens at dispatch entry; every other span is its child │ +│ • Every dead SPAN_* in spans.py gets emission sites at the │ +│ documented module — ~80 sites total │ +│ • Helpers grow as needed │ +├────────────────────────────────────────────────────────────────────────┤ +│ Layer 3 — Narrative Validator (NEW in Python) │ +│ • TurnRecord dataclass assembled at end of dispatch │ +│ • Bounded asyncio.Queue → validator task │ +│ • Five checks: entity ref, inventory, patch legality, │ +│ trope-beat alignment, subsystem exercise │ +│ • Each check publishes one of: │ +│ validation_warning | subsystem_exercise_summary | │ +│ coverage_gap | turn_complete │ +│ • json_extraction_result is owned by the translator (§6), │ +│ not the validator — it's directly derivable from span attrs │ +├────────────────────────────────────────────────────────────────────────┤ +│ Translator (WatcherSpanProcessor) │ +│ • Routing table — span name → (event_type, component, extractor) │ +│ • Emits typed events on span close, IN ADDITION TO agent_span_close │ +│ • Single source of truth: SpanRoute entries colocated with constants │ +│ in spans.py; router map auto-built from imports │ +└────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## 4. Layer 2 — Span Emission Inventory + +Every dead family from `spans.py` gets emission helpers + call sites at the documented module. Target Python module paths verified against the current `sidequest-server` tree. 
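As a shape sketch of the helper-first convention these call sites will use (§4.2): a recording list stands in for the OTEL tracer so no SDK is required, and the helper name plus attribute schema are illustrative, not the final API.

```python
from contextlib import contextmanager

# Stand-in for the real tracer/exporter: each entry is (span_name, attributes).
RECORDED: list[tuple[str, dict]] = []

SPAN_DISPOSITION_SHIFT = "disposition.shift"

@contextmanager
def disposition_shift_span(npc: str, delta: int, **attrs):
    """Required attributes are explicit kwargs; extras ride along via
    **attrs. Domain code never calls start_as_current_span directly."""
    span_attrs = {"npc": npc, "delta": delta, **attrs}
    RECORDED.append((SPAN_DISPOSITION_SHIFT, span_attrs))
    yield span_attrs  # the real helper would yield the OTEL span object

# Domain call site: the disposition mutation happens inside the span.
with disposition_shift_span(npc="merchant", delta=2, reason="player aid"):
    pass

name, attrs = RECORDED[-1]
```

The same pattern generalizes to every family below: one helper per span constant, required attributes enforced at the signature.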
+ +### 4.1 Family table + +| Family | Constants | Target module(s) | Notes | +|---|---|---|---| +| **turn root** | `SPAN_TURN`, all `SPAN_TURN_*` sub-spans | `server/session_handler.py` (root open), `server/dispatch/*` for sub-spans | `turn_span()` opens at the dispatch entry. Every other span this section adds is its child. **Most load-bearing item — without it, traces are orphaned.** | +| **narrator** | `SPAN_NARRATOR_SEALED_ROUND` | `server/dispatch/barrier.py` (or wherever sealed-letter barrier lives post-port) | One per sealed-round resolution | +| **orchestrator** | `_NARRATOR_SESSION_RESET`, `_GENRE_IDENTITY_INJECTION`, `_TACTICAL_GRID_INJECTION`, `_TROPE_BEAT_INJECTION`, `_PARTY_PEER_INJECTION`, `_LORE_FILTER` | `agents/orchestrator.py` | One span per injection type | +| **agent LLM pipeline** | `SPAN_TURN_AGENT_LLM_PROMPT_BUILD`, `_PARSE_RESPONSE` (`_INFERENCE` already live) | `agents/orchestrator.py` | Wrap existing inference call | +| **content** | `SPAN_CONTENT_RESOLVE` | `genre/resolver*.py` | High volume — confirm sampling rate before shipping | +| **trope** | `SPAN_TROPE_TICK`, `_TICK_PER`, `_ROOM_TICK`, `_ACTIVATE`, `_RESOLVE`, `_CROSS_SESSION`, `_EVALUATE_TRIGGERS` | `game/trope*.py` + `agents/subsystems/troper*.py` | | +| **barrier** | `SPAN_BARRIER_ACTIVATED`, `_RESOLVED` | `game/barrier*.py` | | +| **music** | `SPAN_MUSIC_EVALUATE`, `_CLASSIFY_MOOD` | `audio/` | | +| **persistence** | `SPAN_PERSISTENCE_SAVE`, `_LOAD`, `_DELETE` | `game/persistence*.py` | | +| **chargen** | `SPAN_CHARGEN_STAT_ROLL`, `_STATS_GENERATED`, `_HP_FORMULA`, `_BACKSTORY_COMPOSED` | `game/builder*.py` | | +| **NPC** | `SPAN_NPC_REGISTRATION`, `_MERGE_PATCH` (`_AUTO_REGISTERED`, `_REINVENTED` already live) | `server/dispatch/npc_registry.py`, `game/npc*.py` | | +| **creature** | `SPAN_CREATURE_HP_DELTA` | `game/creature*.py` | | +| **disposition** | `SPAN_DISPOSITION_SHIFT` | `game/disposition*.py` | | +| **state patches** | `SPAN_APPLY_WORLD_PATCH`, `SPAN_QUEST_UPDATE`, 
`SPAN_BUILD_PROTOCOL_DELTA`, `SPAN_COMPUTE_DELTA` | `game/state*.py`, `game/delta*.py` | The "what mutated" record — ADR-031 explicitly cites this as the patch-legality input | +| **merchant** | `SPAN_MERCHANT_CONTEXT_INJECTED`, `_TRANSACTION` | `agents/orchestrator.py`, `game/state*.py` | | +| **inventory** | `SPAN_INVENTORY_EXTRACTION` | `agents/inventory_extractor*.py` | Drives `json_extraction_result` typed event | +| **continuity** | `SPAN_CONTINUITY_LLM_VALIDATION` | `agents/continuity_validator*.py` | | +| **compose** | `SPAN_COMPOSE` | `agents/context_builder*.py` | Wraps prompt-zone composition | +| **world** | `SPAN_WORLD_MATERIALIZED` | `agents/world_builder*.py` | | +| **RAG** | `SPAN_RAG_PROSE_CLEANUP` | `agents/orchestrator.py` (or wherever lore retrieval landed) | | +| **script tool** | `SPAN_SCRIPT_TOOL_PROMPT_INJECTED` | `agents/orchestrator.py` | | +| **reminders** | `SPAN_REMINDER_SPAWNED`, `_FIRED` | `server/dispatch/connect.py`, `server/app.py` | | +| **pregen** | `SPAN_PREGEN_SEED_MANUAL` | `server/dispatch/pregen*.py` | | +| **catch-up** | `SPAN_CATCH_UP_GENERATE` | `server/dispatch/catch_up*.py` | | +| **scenario** | `SPAN_SCENARIO_ADVANCE`, `_ACCUSATION` | `server/dispatch/*.py`, `server/dispatch/slash*.py` | | +| **monster manual** | `SPAN_MONSTER_MANUAL_INJECTED` | `server/dispatch/*.py` | | + +### 4.2 Implementation conventions + +1. **Helper-first.** If a span has a `xxx_span()` context manager in `spans.py`, use it. If not, add the helper before adding call sites — keeps attribute schemas consolidated. +2. **Parent context.** Every span opens *inside* the active turn root. Where a subsystem fires outside a turn (persistence on save, pregen on background warmup), it opens its own root span; never orphaned. +3. **Attribute discipline.** ADR-031 §"Layer 2" lists required attributes per span family — every helper enforces those (positional kwargs) and accepts `**attrs` for extras. No bare `start_as_current_span` calls in domain code. +4. 
**No silent skips.** Per CLAUDE.md *No Silent Fallbacks*, an emission helper must fire every time it's reached. No `if span_enabled` flag, no `try/except` swallow. + +--- + +## 5. Layer 3 — Narrative Validator Pipeline + +### 5.1 `TurnRecord` dataclass + +```python +@dataclass(frozen=True) +class PatchSummary: + patch_type: str # "world" | "combat" | "chase" | "scenario" + fields_changed: list[str] + +@dataclass(frozen=True) +class TurnRecord: + turn_id: int + timestamp: datetime + player_id: str + player_input: str + classified_intent: str # raw classification name + agent_name: str + narration: str + patches_applied: list[PatchSummary] + snapshot_before_hash: str # blake2b of pre-turn snapshot + snapshot_after: GameSnapshot # full post-turn state + delta: StateDelta # what changed (already computed) + beats_fired: list[tuple[str, float]] # (trope_name, threshold) + extraction_tier: int # 1, 2, or 3 + token_count_in: int + token_count_out: int + agent_duration_ms: int + is_degraded: bool +``` + +Lives at `sidequest/telemetry/turn_record.py`. Frozen dataclasses for immutability across the queue boundary. + +### 5.2 Pipeline + +``` +Dispatch (hot path) Validator task (cold path) +──────────────────────── ───────────────────────────── +session_handler.handle_action + │ + ├─ orchestrator.process_action() + │ (Layer-2 spans fire as today) + │ + ├─ patches applied, delta computed + │ + ├─ TurnRecord assembled + │ + ├─ validator_queue.put_nowait(record) ──── asyncio.Queue(32) ───► await queue.get() + │ (drops oldest on QueueFull — │ + │ log dropped_record_count via ├─ entity_check + │ watcher.health event) ├─ inventory_check + │ ├─ patch_legality_check + │ ▼ ├─ trope_alignment_check + │ WS broadcast to player ├─ subsystem_exercise_check + │ │ + │ └─ publish_event(...) + └─ next turn for each finding +``` + +**Lifecycle.** `validator_task` started by `app.py` at FastAPI startup, alongside the existing watcher hub binding. Single task, sequential processing. 
Validator allowed to lag — that's the point of the bounded queue. Validator never raises into hot path. Each check wraps in `try/except` that logs to `validation_warning` with `severity: "error"` and a check-name tag. On shutdown, the task drains with a 2s grace, then exits. + +### 5.3 The five checks + +Each runs against one `TurnRecord` and emits zero-or-more events via `publish_event`. Matches ADR-031's Rust catalog 1:1. + +| Check | Reads | Emits when | Event type | Severity | +|---|---|---|---|---| +| **entity_check** | `narration`, `snapshot_after.npc_registry`, `discovered_regions`, `inventory.items` | Narration mentions an NPC name / item / location not in snapshot | `validation_warning` | `warning` | +| **inventory_check** | `narration`, `delta.inventory_changes`, `patches_applied` | Narration says "you grab the X" but no patch added X / patch added Y but narration silent | `validation_warning` | `warning` | +| **patch_legality_check** | `patches_applied`, `snapshot_after` | HP > max, dead NPC acts, location transition from a region not adjacent in cartography graph, illegal stat mutation | `validation_warning` | `error` | +| **trope_alignment_check** | `beats_fired`, `narration` | A beat threshold crossed but narration's keyword set doesn't reflect it (uses the trope's own `keywords` list — no second LLM call) | `validation_warning` | `warning` | +| **subsystem_exercise_check** | sliding window of last N turns, agent invocation history | Agent type X (combat / merchant / world_builder / scenario) hasn't been invoked in N turns | `coverage_gap` (periodic) + `subsystem_exercise_summary` (per-turn rollup) | `info` | + +The validator also emits **`turn_complete`** as its first action upon dequeue, populating `TurnCompleteFields` from the assembled `TurnRecord`. The translator does not emit `turn_complete` from the span close — the validator has the full record, which is a strictly better source. 
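A minimal sketch of the deterministic flavor of these checks, using `entity_check` as the example. The capitalized-word heuristic and the function signature are assumptions for illustration, not the final check:

```python
import re

def entity_check(narration: str, known_npcs: set[str]) -> list[dict]:
    """Flag capitalized names in the narration that are absent from the
    snapshot's NPC registry. The real check would also consult items and
    discovered regions, per the table above."""
    findings = []
    mentioned = set(re.findall(r"\b[A-Z][a-z]+\b", narration))
    for unknown in sorted(mentioned - known_npcs):
        findings.append({
            "event_type": "validation_warning",
            "severity": "warning",
            "check": "entity_check",
            "detail": f"narration mentions unknown entity '{unknown}'",
        })
    return findings

findings = entity_check(
    "Aldric nods as Borin pockets the gem.",
    known_npcs={"Aldric"},
)
```

No LLM call anywhere in the path: pure string and set work, per the "God lifting rocks" prohibition.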
+
+`json_extraction_result` is **not** a validator concern — it's emitted
+directly by the translator from `SPAN_INVENTORY_EXTRACTION` /
+`SPAN_TURN_AGENT_LLM_PARSE_RESPONSE` / `SPAN_CONTINUITY_LLM_VALIDATION`
+close, where the tier and outcome are span attributes. See §6.4.
+
+### 5.4 Health & self-observation
+
+The hub already exposes `stats()` (subscriber count, published, dropped).
+The validator task adds:
+
+- `validator.queue_depth` — emitted every 30s as a `state_transition` with
+  `component: "validator"`
+- `validator.dropped_records` — incremented on `QueueFull`, surfaced in the
+  heartbeat
+- `validator.check_durations_ms` — per-check timing, p50/p99 over a sliding
+  window
+
+If the validator task crashes, the crash is logged loudly and the hub
+publishes a `validation_warning` with `severity: "error"` describing the
+death — per *No Silent Fallbacks*, the operator knows if the lie detector
+itself is offline.
+
+---
+
+## 6. Translator Enrichment — Full Parity
+
+### 6.1 Principle
+
+Domain code's job is to **open a span with rich attributes**. The
+translator's job is to **emit the typed event(s)** on close. Direct
+`publish_event(...)` calls survive only for events that genuinely have no
+span (audio cue without state change, watcher self-health, etc.).
+
+This collapses the ~30 scattered `publish_event("state_transition", ...)`
+sites into a single routing table.
+
+### 6.2 Routing rule
+
+For each span closed by `on_end`:
+1. **Always** emit `agent_span_close` (Timeline / Timing tabs depend on the
+   flat firehose).
+2. **Additionally**, if the span's `name` matches a known route, emit a
+   typed event derived from the span's attributes.
+
+Augment, not replace.
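The two-step rule can be sketched as follows; the span is faked as a dict and the route table is truncated to one entry, so this shows the shape, not the production `on_end`:

```python
# Route table: span name -> (event_type, component). §6.3 colocates the
# real entries with the constants in spans.py.
SPAN_ROUTES = {
    "disposition.shift": ("state_transition", "disposition"),
}

def on_end(span: dict, publish) -> None:
    # Step 1: always emit the flat firehose event (Timeline / Timing tabs).
    publish({
        "event_type": "agent_span_close",
        "fields": {"name": span["name"], **span["attributes"]},
    })
    # Step 2: additionally emit a typed event when the name is routed.
    route = SPAN_ROUTES.get(span["name"])
    if route is not None:
        event_type, component = route
        publish({
            "event_type": event_type,
            "component": component,
            "fields": dict(span["attributes"]),
        })

events: list[dict] = []
on_end({"name": "disposition.shift", "attributes": {"npc": "Borin", "delta": -1}},
       events.append)
```

An unrouted span falls through step 2 untouched, which is exactly the flat-only behavior §6.4 lists for timing-only spans.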
+ +### 6.3 Single-source-of-truth mechanics + +```python +# spans.py — colocate the routing decision with the span constant +SPAN_DISPOSITION_SHIFT = "disposition.shift" +ROUTE_DISPOSITION_SHIFT = SpanRoute( + event_type="state_transition", + component="disposition", + extract=lambda span: { + "field": "disposition", + "npc": span.attributes.get("npc", ""), + "delta": int(span.attributes.get("delta", 0)), + }, +) +``` + +The router map in `watcher.py` is one auto-built dict from imports, not a parallel data source. Renaming a constant breaks at import; adding a span without a route is a lint check (Section 7). + +### 6.4 Full routing table + +#### → `state_transition` + +The bulk of the mapping. Every span that mutates persistent game state routes here, with `component` carrying the subsystem. + +| Span | component | +|---|---| +| `SPAN_APPLY_WORLD_PATCH` | `state.world` | +| `SPAN_QUEST_UPDATE` | `state.quest` | +| `SPAN_BUILD_PROTOCOL_DELTA` | `state.delta` | +| `SPAN_COMPUTE_DELTA` | `state.delta` | +| `SPAN_NPC_REGISTRATION` / `_AUTO_REGISTERED` / `_REINVENTED` / `_MERGE_PATCH` | `npc_registry` | +| `SPAN_DISPOSITION_SHIFT` | `disposition` | +| `SPAN_CREATURE_HP_DELTA` | `creature` | +| `SPAN_TROPE_TICK` / `_TICK_PER` / `_ROOM_TICK` / `_ACTIVATE` / `_RESOLVE` / `_CROSS_SESSION` / `_EVALUATE_TRIGGERS` | `trope` | +| `SPAN_BARRIER_ACTIVATED` / `_RESOLVED` | `barrier` | +| `SPAN_MERCHANT_TRANSACTION` / `_CONTEXT_INJECTED` | `merchant` | +| `SPAN_MUSIC_EVALUATE` / `_CLASSIFY_MOOD` | `audio` | +| `SPAN_PERSISTENCE_SAVE` / `_DELETE` | `persistence` | +| `SPAN_SCENARIO_ADVANCE` / `_ACCUSATION` | `scenario` | +| `SPAN_MONSTER_MANUAL_INJECTED` | `monster_manual` | +| `SPAN_REMINDER_SPAWNED` / `_FIRED` | `reminder` | +| `SPAN_PREGEN_SEED_MANUAL` | `pregen` | +| `SPAN_CATCH_UP_GENERATE` | `catch_up` | +| `SPAN_WORLD_MATERIALIZED` | `world_builder` | +| `SPAN_CHARGEN_STAT_ROLL` / `_STATS_GENERATED` / `_HP_FORMULA` / `_BACKSTORY_COMPOSED` | `chargen` | +| 
`SPAN_DICE_REQUEST_SENT` / `_THROW_RECEIVED` / `_RESULT_BROADCAST` | `dice` (events on span; router has dedicated event-handling branch) | +| `SPAN_MP_GAME_CREATED` / `_SLUG_CONNECT` / `_SEAT` / `_PLAYER_ACTION_PAUSED` | `multiplayer` | +| `SPAN_COMBAT_TICK` / `_ENDED` / `_PLAYER_DEAD` | `combat` | +| `SPAN_ENCOUNTER_PHASE_TRANSITION` / `_RESOLVED` / `_BEAT_APPLIED` / `_CONFRONTATION_INITIATED` / `_EMPTY_ACTOR_LIST` / `_BEAT_FAILURE_BRANCH` | `encounter` | +| `SPAN_ORCHESTRATOR_NARRATOR_SESSION_RESET` / `_GENRE_IDENTITY_INJECTION` / `_TACTICAL_GRID_INJECTION` / `_TROPE_BEAT_INJECTION` / `_PARTY_PEER_INJECTION` | `orchestrator` | +| `SPAN_SCRIPT_TOOL_PROMPT_INJECTED` | `script_tool` | +| `SPAN_NARRATOR_SEALED_ROUND` | `narrator` | +| `SPAN_LOCAL_DM_DECOMPOSE` / `_DISPATCH_BANK` / `_LETHALITY_ARBITRATE` | `local_dm` | +| `SPAN_PROJECTION_DECIDE` / `_CACHE_FILL` / `_CACHE_LAZY_FILL` | `projection` | + +#### → `prompt_assembled` + +| Span | +|---| +| `SPAN_COMPOSE` | +| `SPAN_TURN_AGENT_LLM_PROMPT_BUILD` | +| `SPAN_ORCHESTRATOR_PROCESS_ACTION` | + +#### → `lore_retrieval` + +| Span | +|---| +| `SPAN_RAG_PROSE_CLEANUP` | +| `SPAN_ORCHESTRATOR_LORE_FILTER` | + +#### → `json_extraction_result` + +| Span | +|---| +| `SPAN_INVENTORY_EXTRACTION` | +| `SPAN_TURN_AGENT_LLM_PARSE_RESPONSE` | +| `SPAN_CONTINUITY_LLM_VALIDATION` | + +#### → `subsystem_exercise_summary` + +| Span | +|---| +| `SPAN_LOCAL_DM_SUBSYSTEM` (also feeds the validator's coverage-gap check) | + +#### Stays flat (`agent_span_close` only) + +Timing data only, no semantic content. 
Listed in `FLAT_ONLY_SPANS`: + +- `SPAN_TURN` — validator owns `turn_complete` +- `SPAN_AGENT_CALL` / `_SESSION` — Claude subprocess timing +- `SPAN_TURN_AGENT_LLM_INFERENCE` — LLM call duration +- `SPAN_TURN_SYSTEM_TICK` / `_BEAT_CONTEXT` — per-tick parents whose effects propagate via deeper spans +- `SPAN_TURN_BARRIER` / `_STATE_UPDATE` / `_TROPES` / `_PHASE_TRANSITION` / `_MEDIA` / `_ASSEMBLE` / `_SLASH_COMMAND` / `_PREPROCESS_*` — structural sub-turn spans +- `SPAN_CONTENT_RESOLVE` — high-volume genre-pack lookup (read, not state mutation); routing every call would spam the dashboard. Lookup failures should fire `validation_warning` separately + +### 6.5 Severity inference + +- OTEL `Status.code == ERROR` → severity `error`. +- `json_extraction_result` with `tier > 1` → severity `warning` (Claude needed fallback). +- `validation_warning` from the validator carries its own severity. +- All others default `info`. + +### 6.6 Migration of existing `publish_event` sites + +A one-time audit, done as part of the implementation plan: + +``` +For each publish_event("state_transition", ..., component=X) site in +sidequest/server/{narration_apply,session_handler,session_helpers}.py: + 1. Identify the surrounding span (if any). + 2. If the site is INSIDE a span whose constant is in the routing table: + — Remove the publish_event call. + — Move any extra fields onto the span as attributes. + 3. If the site has NO surrounding span: + — Either add an emission helper (preferred), OR + — Keep the direct publish_event (sideband: watcher.health, + audio cue without state mutation, etc.). + — Document which one and why in the PR. +``` + +End state: every `state_transition` event the dashboard receives is either router-emitted from a span close or explicitly direct-emitted from a documented sideband site. 
+ +### 6.7 Final ownership matrix + +| `WatcherEventType` | Owner | Mechanism | +|---|---|---| +| `agent_span_open` | watcher hub | WS handshake (existing) | +| `agent_span_close` | translator | every span close (existing, augmented) | +| `state_transition` | **translator** primary + sideband direct emits | router covers ~50 span families; direct emits only where span-less | +| `game_state_snapshot` | domain code | `session_handler.py` (existing) | +| `prompt_assembled` | translator | 3 span families | +| `lore_retrieval` | translator | 3 span families | +| `json_extraction_result` | translator | 3 span families | +| `subsystem_exercise_summary` | translator | `local_dm.subsystem` close | +| `coverage_gap` | validator | subsystem-exercise check (sliding window) | +| `validation_warning` | validator | 5 narrative checks | +| `turn_complete` | validator | one per `TurnRecord` | + +Every `WatcherEventType` has a clear owner. No orphans; no double-emission. + +--- + +## 7. Testing Strategy + +CLAUDE.md is non-negotiable: *"Every Test Suite Needs a Wiring Test."* Tests passing in isolation has been the failure mode of the dashboard regression itself. + +### 7.1 Three layers of test coverage + +#### Layer 1 — Unit: span helpers behave + +For every helper added or expanded in `spans.py`: +- Asserts the context manager opens a span with the named constant. +- Asserts every required positional kwarg becomes a span attribute. +- Asserts an extras `**attrs` dict merges in. +- Asserts a provider-local `_tracer` parameter is honored. + +Extends existing patterns in `tests/telemetry/test_spans.py`, `test_combat_encounter_spans.py`, `test_lethality_span.py`. 
+ +#### Layer 2 — Translator: routing produces typed events + +In `tests/server/test_watcher_events.py` (extended): + +```python +@pytest.mark.parametrize("span_const,expected_event_type,expected_component", [ + (SPAN_DISPOSITION_SHIFT, "state_transition", "disposition"), + (SPAN_INVENTORY_EXTRACTION, "json_extraction_result", "sidequest-server"), + (SPAN_COMPOSE, "prompt_assembled", "sidequest-server"), + # ... one row per routed span +]) +def test_translator_emits_typed_event(span_const, expected_event_type, expected_component): + """Closing publishes a typed event AND agent_span_close.""" +``` + +One row per entry in the routing table. + +#### Layer 3 — Wiring: production code actually fires the spans + +For each subsystem family in §4.1, **one integration test** that: +1. Subscribes a fake WebSocket to `watcher_hub`. +2. Drives a representative production code path. +3. Asserts the expected typed event(s) land on the fake WS, with the expected `component` and shape. + +Example: + +```python +async def test_disposition_shift_emits_state_transition(server_fixture): + fake_ws = FakeWatcher() + await watcher_hub.subscribe(fake_ws) + + await server_fixture.dispatch_action( + player="alice", + text="I help the wounded merchant.", + ) + + events = fake_ws.events + assert any( + e["event_type"] == "state_transition" and e["component"] == "disposition" + for e in events + ), "disposition.shift span never reached the watcher hub" +``` + +~25 wiring tests, one per subsystem family. Medium-cost (boot a session) but the only protection against the failure mode that caused this entire spec. 
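The `FakeWatcher` used above does not exist in the current tree; a minimal sketch, assuming the hub delivers frames via an async `send_json` (adjust to whatever interface `watcher_hub.subscribe` actually expects):

```python
import asyncio

class FakeWatcher:
    """Records every frame the hub broadcasts, so wiring tests can assert
    on typed events without a real WebSocket connection."""

    def __init__(self) -> None:
        self.events: list[dict] = []

    async def send_json(self, payload: dict) -> None:
        self.events.append(payload)

async def demo() -> FakeWatcher:
    fake = FakeWatcher()
    # Stand-in for what the hub would do on publish_event(...).
    await fake.send_json({"event_type": "state_transition",
                          "component": "disposition"})
    return fake

fake = asyncio.run(demo())
```

One shared fixture like this keeps all ~25 wiring tests asserting on the same payload shape the dashboard actually consumes.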
+ +### 7.2 Static lint check — every span is routed or explicitly skipped + +A new `tests/telemetry/test_routing_completeness.py`: + +```python +def test_every_span_is_routed_or_explicitly_flat(): + """Fail if a SPAN_* constant lacks both a routing entry and a flat-only marker.""" + flat_only = set(FLAT_ONLY_SPANS) + routed = set(SPAN_ROUTES.keys()) + all_spans = {v for n, v in vars(spans).items() if n.startswith("SPAN_")} + missing = all_spans - flat_only - routed + assert not missing, f"Spans without routing decision: {missing}" +``` + +Runs on every CI job. A new span constant cannot land without a routing decision. + +### 7.3 Validator-pipeline tests + +In a new `tests/telemetry/test_validator_pipeline.py`: +- **Lifecycle:** validator starts on app startup, drains on shutdown, restarts cleanly under uvicorn `--reload`. +- **Backpressure:** queue full → oldest record drops, drop counter increments, hub publishes a `validation_warning` after N drops. +- **Crash containment:** make one check raise → other checks still run, validator task survives, `validation_warning` with `severity: "error"` describes the crash. +- **Per-check fixtures:** for each of the five checks, a `TurnRecord` fixture that *should* trigger the warning and one that shouldn't. + +### 7.4 P0 smoke test + +The `just otel` recipe gets a one-line CI check: + +```yaml +- name: just otel recipe smoke + run: timeout 5 just otel || [[ $? -eq 124 ]] + # exit 124 = timeout fired = recipe started successfully and is listening +``` + +Catches recipe-vs-script-name drift permanently. + +### 7.5 Out of scope + +- No replay tests (`TurnRecord` persistence not built). +- No GM panel UI tests for new tabs (existing UI tests assert on the dashboard contract; once data flows, they pass without modification). +- No load tests for the watcher hub (real playtests have ≤5 watchers; fan-out is `O(subscribers)`). + +--- + +## 8. Sequencing + +Stories framed for the sprint tracker. 
Calendar-day estimates suppressed in favor of PR counts at AI-era velocity. + +### Phase 0 — Stop the bleed (1 PR, blocks nothing) + +1. Fix `justfile` `otel` recipe → `playtest_dashboard.py`. +2. Drop `ConsoleSpanExporter` from `telemetry/setup.py` (gate behind `SIDEQUEST_OTEL_CONSOLE=1`). +3. Delete the stale "Phase 0 console exporter" docstring in `setup.py`. +4. CI smoke test for `just otel`. + +Lands first so playtests aren't blocked by the missing recipe. + +### Phase 1 — Translator routing infrastructure (1 PR, blocks Phase 2/3) + +1. Add `SpanRoute` dataclass and the `SPAN_ROUTES` dict mechanism to `spans.py`. +2. Refactor `WatcherSpanProcessor.on_end` to use the router (still augment-not-replace). +3. Add the routing-completeness lint test (`test_routing_completeness.py`). +4. Add `FLAT_ONLY_SPANS` set with current live spans listed (so the lint passes immediately). + +### Phase 2 — Layer-2 emission family PRs (~25 PRs, parallelizable) + +One PR per subsystem family from §4.1. Each PR contains: +- Helper additions/expansions in `spans.py`. +- `SpanRoute` entries (or `FLAT_ONLY_SPANS` additions for timing-only spans). +- Emission sites in the target Python module. +- Translator-routing test rows. +- One wiring integration test. +- Migration of any existing direct `publish_event` site that the new span subsumes. + +PRs are independent except: **`turn_span` lands first**. Mark the rest blocked-by `turn_span` until that ships. + +Order suggestion after `turn`: high-volume / high-mechanic-trust families first (state patches, NPC, trope, encounter). Cosmetic ones last (chargen, world_materialized). + +**Final call on PR-bundling deferred to the implementation plan.** Recommendation: bundle by adjacent module — fewer PRs, cleaner reviews, no inter-PR test interference. + +### Phase 3 — Layer-3 validator pipeline (1 PR) + +1. `TurnRecord` dataclass in `sidequest/telemetry/turn_record.py`. +2. 
`Validator` class with bounded `asyncio.Queue` and the five checks in `sidequest/telemetry/validator.py`. +3. Lifecycle wiring in `server/app.py`. +4. Health emissions (`validator.queue_depth`, `validator.dropped_records`, `validator.check_durations_ms`). +5. Test suite per §7.3. + +Independent of Phase 2's rollout — validator runs on whatever spans have been wired so far. + +### Phase 4 — Sweep & cleanup (1 PR) + +1. Audit final list of remaining direct `publish_event` sites; confirm each has a documented sideband rationale or is removed. +2. Update `sidequest-ui/src/types/watcher.ts` comment "Mirrors Rust WatcherEventType (sidequest-server/src/lib.rs)" → point at `sidequest-server/sidequest/telemetry/spans.py + server/watcher.py`. Pure docstring fix. +3. Mark ADR-031 `implementation-status: live` for real this time. + +--- + +## 9. Deliverables + +| File | Action | +|---|---| +| `docs/superpowers/specs/2026-04-25-otel-dashboard-restoration-design.md` | New (this spec) | +| `docs/adr/089-otel-dashboard-restoration.md` | New (ADR) | +| `docs/adr/031-game-watcher-semantic-telemetry.md` | Amend (Python-port section + status note) | +| `CLAUDE.md` ADR Index block | Regenerated via `scripts/regenerate_adr_indexes.py` | +| `justfile` | Phase 0: fix `otel` recipe path | +| `sidequest-server/sidequest/telemetry/setup.py` | Phase 0: drop ConsoleExporter; rewrite docstring | +| `sidequest-server/sidequest/telemetry/spans.py` | Phase 1: SpanRoute mechanism. 
Phase 2: ~25 helper additions, route entries | +| `sidequest-server/sidequest/server/watcher.py` | Phase 1: router-driven `on_end` | +| `sidequest-server/sidequest/telemetry/turn_record.py` | Phase 3: NEW | +| `sidequest-server/sidequest/telemetry/validator.py` | Phase 3: NEW | +| `sidequest-server/sidequest/server/app.py` | Phase 3: wire validator lifecycle | +| `sidequest-server/sidequest/{game,agents,server}/**/*.py` | Phase 2: emission sites at ~25 modules | +| `sidequest-server/tests/telemetry/test_routing_completeness.py` | Phase 1: NEW lint test | +| `sidequest-server/tests/telemetry/test_validator_pipeline.py` | Phase 3: NEW | +| `sidequest-server/tests/server/test_watcher_events.py` | Extended with translator parametrize rows | +| `sidequest-server/tests/integration/test_subsystem_wiring.py` (or per-family files) | Phase 2: ~25 wiring tests | + +## 10. ADR linkage + +- **ADR-089 (new):** *OTEL Dashboard Restoration after Python Port.* Status `accepted`. Documents the three-layer faithful-port decision, the deliberate departures, and the routing-table-as-single-source-of-truth pattern. Supersedes nothing; `related: [031, 058, 082]`. +- **ADR-031 amendment:** add a Python-port section noting the canonical implementation now lives in `sidequest/telemetry/spans.py` and `sidequest/server/watcher.py`. Strikethrough the Rust-specific phasing table. Re-affirm `implementation-status: live`. +- **CLAUDE.md ADR Index:** insert ADR-089 under "Telemetry" alongside ADR-031, ADR-058. Run `scripts/regenerate_adr_indexes.py`. + +## 11. What this design does NOT decide + +Deliberately deferred to the implementation plan (next skill, `superpowers:writing-plans`): +- Exact per-helper kwarg lists for the ~25 new emission helpers. +- Exact per-route field-extractor function bodies. +- Per-PR Jira-equivalent story IDs and titles. +- Whether Phase 2's ~25 PRs are bundled into mega-PRs by domain or stay one-per-span. 
+ +--- + +*End of design.* From 61601c83dd2a3402f46e0aac12785283202c6dd3 Mon Sep 17 00:00:00 2001 From: Keith Avery Date: Sat, 25 Apr 2026 08:26:12 -0400 Subject: [PATCH 4/8] fix(justfile): point otel recipe at playtest_dashboard.py - Update `just otel` to call `playtest_dashboard.py` via `uv run` instead of the deleted `playtest.py --dashboard-only` - Add `__main__` entry point with argparse to `playtest_dashboard.py` (it was a library-only module with no CLI entry point) - Add `websockets>=12.0` and `rich>=13.0` to orchestrator `pyproject.toml` (required by playtest_dashboard.py, previously missing from manifest) --- justfile | 2 +- pyproject.toml | 2 + scripts/playtest_dashboard.py | 22 ++++++++ uv.lock | 97 +++++++++++++++++++++++++++++++++++ 4 files changed, 122 insertions(+), 1 deletion(-) diff --git a/justfile b/justfile index eac8cf3..e809bea 100644 --- a/justfile +++ b/justfile @@ -186,7 +186,7 @@ check-all: server-check client-lint client-test daemon-lint # OTEL dashboard — browser-friendly /ws/watcher viewer otel port="9765": - python3 {{root}}/scripts/playtest.py --dashboard-only --dashboard-port {{port}} + uv run python3 {{root}}/scripts/playtest_dashboard.py --dashboard-port {{port}} # Headless playtest driver (uses the running server) playtest *flags: diff --git a/pyproject.toml b/pyproject.toml index 6f363db..3f288ac 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -6,8 +6,10 @@ requires-python = ">=3.11" dependencies = [ "numpy>=1.24", "pyyaml>=6.0", + "rich>=13.0", "safetensors>=0.4", "torch>=2.0", + "websockets>=12.0", ] [project.optional-dependencies] diff --git a/scripts/playtest_dashboard.py b/scripts/playtest_dashboard.py index 22c70ed..ce91395 100644 --- a/scripts/playtest_dashboard.py +++ b/scripts/playtest_dashboard.py @@ -1018,3 +1018,25 @@ async def run_dashboard_server( console.print(f"[dim]Watcher proxy disconnected: {e} — reconnecting in 2s[/dim]") await asyncio.sleep(2) + +if __name__ == "__main__": + import argparse + + parser = 
argparse.ArgumentParser(description="SideQuest OTEL dashboard server.") + parser.add_argument("--dashboard-port", type=int, default=9765, + help="Port for the HTTP dashboard (default: 9765).") + parser.add_argument("--api-host", default="localhost", + help="Game server host (default: localhost).") + parser.add_argument("--api-port", type=int, default=8765, + help="Game server WebSocket port (default: 8765).") + parser.add_argument("--otlp-port", type=int, default=None, + help="OTLP receiver port (default: dashboard-port+2).") + args = parser.parse_args() + + asyncio.run(run_dashboard_server( + api_host=args.api_host, + api_port=args.api_port, + dashboard_port=args.dashboard_port, + otlp_port=args.otlp_port, + )) + diff --git a/uv.lock b/uv.lock index 6c3a2e3..88fb0c7 100644 --- a/uv.lock +++ b/uv.lock @@ -121,6 +121,18 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/62/a1/3d680cbfd5f4b8f15abc1d571870c5fc3e594bb582bc3b64ea099db13e56/jinja2-3.1.6-py3-none-any.whl", hash = "sha256:85ece4451f492d0c13c5dd7c13a64681a86afae63a5f347908daf103ce6d2f67", size = 134899, upload-time = "2025-03-05T20:05:00.369Z" }, ] +[[package]] +name = "markdown-it-py" +version = "4.0.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "mdurl" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/5b/f5/4ec618ed16cc4f8fb3b701563655a69816155e79e24a17b651541804721d/markdown_it_py-4.0.0.tar.gz", hash = "sha256:cb0a2b4aa34f932c007117b194e945bd74e0ec24133ceb5bac59009cda1cb9f3", size = 73070, upload-time = "2025-08-11T12:57:52.854Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/94/54/e7d793b573f298e1c9013b8c4dade17d481164aa517d1d7148619c2cedbf/markdown_it_py-4.0.0-py3-none-any.whl", hash = "sha256:87327c59b172c5011896038353a81343b6754500a08cd7a4973bb48c6d578147", size = 87321, upload-time = "2025-08-11T12:57:51.923Z" }, +] + [[package]] name = "markupsafe" version = "3.0.3" @@ -195,6 +207,15 @@ wheels = [ { url = 
"https://files.pythonhosted.org/packages/70/bc/6f1c2f612465f5fa89b95bead1f44dcb607670fd42891d8fdcd5d039f4f4/markupsafe-3.0.3-cp314-cp314t-win_arm64.whl", hash = "sha256:32001d6a8fc98c8cb5c947787c5d08b0a50663d139f1305bac5885d98d9b40fa", size = 14146, upload-time = "2025-09-27T18:37:28.327Z" }, ] +[[package]] +name = "mdurl" +version = "0.1.2" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/d6/54/cfe61301667036ec958cb99bd3efefba235e65cdeb9c84d24a8293ba1d90/mdurl-0.1.2.tar.gz", hash = "sha256:bb413d29f5eea38f31dd4754dd7377d4465116fb207585f97bf925588687c1ba", size = 8729, upload-time = "2022-08-14T12:40:10.846Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/b3/38/89ba8ad64ae25be8de66a6d463314cf1eb366222074cfda9ee839c56a4b4/mdurl-0.1.2-py3-none-any.whl", hash = "sha256:84008a41e51615a49fc9966191ff91509e3c40b939176e643fd50a5c2196b8f8", size = 9979, upload-time = "2022-08-14T12:40:09.779Z" }, +] + [[package]] name = "mpmath" version = "1.3.0" @@ -539,6 +560,19 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/f1/12/de94a39c2ef588c7e6455cfbe7343d3b2dc9d6b6b2f40c4c6565744c873d/pyyaml-6.0.3-cp314-cp314t-win_arm64.whl", hash = "sha256:ebc55a14a21cb14062aa4162f906cd962b28e2e9ea38f9b4391244cd8de4ae0b", size = 149341, upload-time = "2025-09-25T21:32:56.828Z" }, ] +[[package]] +name = "rich" +version = "15.0.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "markdown-it-py" }, + { name = "pygments" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/c0/8f/0722ca900cc807c13a6a0c696dacf35430f72e0ec571c4275d2371fca3e9/rich-15.0.0.tar.gz", hash = "sha256:edd07a4824c6b40189fb7ac9bc4c52536e9780fbbfbddf6f1e2502c31b068c36", size = 230680, upload-time = "2026-04-12T08:24:00.75Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/82/3b/64d4899d73f91ba49a8c18a8ff3f0ea8f1c1d75481760df8c68ef5235bf5/rich-15.0.0-py3-none-any.whl", hash = 
"sha256:33bd4ef74232fb73fe9279a257718407f169c09b78a87ad3d296f548e27de0bb", size = 310654, upload-time = "2026-04-12T08:24:02.83Z" }, +] + [[package]] name = "safetensors" version = "0.7.0" @@ -577,8 +611,10 @@ source = { editable = "." } dependencies = [ { name = "numpy" }, { name = "pyyaml" }, + { name = "rich" }, { name = "safetensors" }, { name = "torch" }, + { name = "websockets" }, ] [package.optional-dependencies] @@ -591,8 +627,10 @@ requires-dist = [ { name = "numpy", specifier = ">=1.24" }, { name = "pytest", marker = "extra == 'dev'", specifier = ">=9.0.3" }, { name = "pyyaml", specifier = ">=6.0" }, + { name = "rich", specifier = ">=13.0" }, { name = "safetensors", specifier = ">=0.4" }, { name = "torch", specifier = ">=2.0" }, + { name = "websockets", specifier = ">=12.0" }, ] provides-extras = ["dev"] @@ -682,3 +720,62 @@ sdist = { url = "https://files.pythonhosted.org/packages/72/94/1a15dd82efb362ac8 wheels = [ { url = "https://files.pythonhosted.org/packages/18/67/36e9267722cc04a6b9f15c7f3441c2363321a3ea07da7ae0c0707beb2a9c/typing_extensions-4.15.0-py3-none-any.whl", hash = "sha256:f0fa19c6845758ab08074a0cfa8b7aecb71c999ca73d62883bc25cc018c4e548", size = 44614, upload-time = "2025-08-25T13:49:24.86Z" }, ] + +[[package]] +name = "websockets" +version = "16.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/04/24/4b2031d72e840ce4c1ccb255f693b15c334757fc50023e4db9537080b8c4/websockets-16.0.tar.gz", hash = "sha256:5f6261a5e56e8d5c42a4497b364ea24d94d9563e8fbd44e78ac40879c60179b5", size = 179346, upload-time = "2026-01-10T09:23:47.181Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/f2/db/de907251b4ff46ae804ad0409809504153b3f30984daf82a1d84a9875830/websockets-16.0-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:31a52addea25187bde0797a97d6fc3d2f92b6f72a9370792d65a6e84615ac8a8", size = 177340, upload-time = "2026-01-10T09:22:34.539Z" }, + { url = 
"https://files.pythonhosted.org/packages/f3/fa/abe89019d8d8815c8781e90d697dec52523fb8ebe308bf11664e8de1877e/websockets-16.0-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:417b28978cdccab24f46400586d128366313e8a96312e4b9362a4af504f3bbad", size = 175022, upload-time = "2026-01-10T09:22:36.332Z" }, + { url = "https://files.pythonhosted.org/packages/58/5d/88ea17ed1ded2079358b40d31d48abe90a73c9e5819dbcde1606e991e2ad/websockets-16.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:af80d74d4edfa3cb9ed973a0a5ba2b2a549371f8a741e0800cb07becdd20f23d", size = 175319, upload-time = "2026-01-10T09:22:37.602Z" }, + { url = "https://files.pythonhosted.org/packages/d2/ae/0ee92b33087a33632f37a635e11e1d99d429d3d323329675a6022312aac2/websockets-16.0-cp311-cp311-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:08d7af67b64d29823fed316505a89b86705f2b7981c07848fb5e3ea3020c1abe", size = 184631, upload-time = "2026-01-10T09:22:38.789Z" }, + { url = "https://files.pythonhosted.org/packages/c8/c5/27178df583b6c5b31b29f526ba2da5e2f864ecc79c99dae630a85d68c304/websockets-16.0-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:7be95cfb0a4dae143eaed2bcba8ac23f4892d8971311f1b06f3c6b78952ee70b", size = 185870, upload-time = "2026-01-10T09:22:39.893Z" }, + { url = "https://files.pythonhosted.org/packages/87/05/536652aa84ddc1c018dbb7e2c4cbcd0db884580bf8e95aece7593fde526f/websockets-16.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:d6297ce39ce5c2e6feb13c1a996a2ded3b6832155fcfc920265c76f24c7cceb5", size = 185361, upload-time = "2026-01-10T09:22:41.016Z" }, + { url = "https://files.pythonhosted.org/packages/6d/e2/d5332c90da12b1e01f06fb1b85c50cfc489783076547415bf9f0a659ec19/websockets-16.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:1c1b30e4f497b0b354057f3467f56244c603a79c0d1dafce1d16c283c25f6e64", size = 184615, upload-time = "2026-01-10T09:22:42.442Z" }, + { url = 
"https://files.pythonhosted.org/packages/77/fb/d3f9576691cae9253b51555f841bc6600bf0a983a461c79500ace5a5b364/websockets-16.0-cp311-cp311-win32.whl", hash = "sha256:5f451484aeb5cafee1ccf789b1b66f535409d038c56966d6101740c1614b86c6", size = 178246, upload-time = "2026-01-10T09:22:43.654Z" }, + { url = "https://files.pythonhosted.org/packages/54/67/eaff76b3dbaf18dcddabc3b8c1dba50b483761cccff67793897945b37408/websockets-16.0-cp311-cp311-win_amd64.whl", hash = "sha256:8d7f0659570eefb578dacde98e24fb60af35350193e4f56e11190787bee77dac", size = 178684, upload-time = "2026-01-10T09:22:44.941Z" }, + { url = "https://files.pythonhosted.org/packages/84/7b/bac442e6b96c9d25092695578dda82403c77936104b5682307bd4deb1ad4/websockets-16.0-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:71c989cbf3254fbd5e84d3bff31e4da39c43f884e64f2551d14bb3c186230f00", size = 177365, upload-time = "2026-01-10T09:22:46.787Z" }, + { url = "https://files.pythonhosted.org/packages/b0/fe/136ccece61bd690d9c1f715baaeefd953bb2360134de73519d5df19d29ca/websockets-16.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:8b6e209ffee39ff1b6d0fa7bfef6de950c60dfb91b8fcead17da4ee539121a79", size = 175038, upload-time = "2026-01-10T09:22:47.999Z" }, + { url = "https://files.pythonhosted.org/packages/40/1e/9771421ac2286eaab95b8575b0cb701ae3663abf8b5e1f64f1fd90d0a673/websockets-16.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:86890e837d61574c92a97496d590968b23c2ef0aeb8a9bc9421d174cd378ae39", size = 175328, upload-time = "2026-01-10T09:22:49.809Z" }, + { url = "https://files.pythonhosted.org/packages/18/29/71729b4671f21e1eaa5d6573031ab810ad2936c8175f03f97f3ff164c802/websockets-16.0-cp312-cp312-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:9b5aca38b67492ef518a8ab76851862488a478602229112c4b0d58d63a7a4d5c", size = 184915, upload-time = "2026-01-10T09:22:51.071Z" }, + { url = 
"https://files.pythonhosted.org/packages/97/bb/21c36b7dbbafc85d2d480cd65df02a1dc93bf76d97147605a8e27ff9409d/websockets-16.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:e0334872c0a37b606418ac52f6ab9cfd17317ac26365f7f65e203e2d0d0d359f", size = 186152, upload-time = "2026-01-10T09:22:52.224Z" }, + { url = "https://files.pythonhosted.org/packages/4a/34/9bf8df0c0cf88fa7bfe36678dc7b02970c9a7d5e065a3099292db87b1be2/websockets-16.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:a0b31e0b424cc6b5a04b8838bbaec1688834b2383256688cf47eb97412531da1", size = 185583, upload-time = "2026-01-10T09:22:53.443Z" }, + { url = "https://files.pythonhosted.org/packages/47/88/4dd516068e1a3d6ab3c7c183288404cd424a9a02d585efbac226cb61ff2d/websockets-16.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:485c49116d0af10ac698623c513c1cc01c9446c058a4e61e3bf6c19dff7335a2", size = 184880, upload-time = "2026-01-10T09:22:55.033Z" }, + { url = "https://files.pythonhosted.org/packages/91/d6/7d4553ad4bf1c0421e1ebd4b18de5d9098383b5caa1d937b63df8d04b565/websockets-16.0-cp312-cp312-win32.whl", hash = "sha256:eaded469f5e5b7294e2bdca0ab06becb6756ea86894a47806456089298813c89", size = 178261, upload-time = "2026-01-10T09:22:56.251Z" }, + { url = "https://files.pythonhosted.org/packages/c3/f0/f3a17365441ed1c27f850a80b2bc680a0fa9505d733fe152fdf5e98c1c0b/websockets-16.0-cp312-cp312-win_amd64.whl", hash = "sha256:5569417dc80977fc8c2d43a86f78e0a5a22fee17565d78621b6bb264a115d4ea", size = 178693, upload-time = "2026-01-10T09:22:57.478Z" }, + { url = "https://files.pythonhosted.org/packages/cc/9c/baa8456050d1c1b08dd0ec7346026668cbc6f145ab4e314d707bb845bf0d/websockets-16.0-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:878b336ac47938b474c8f982ac2f7266a540adc3fa4ad74ae96fea9823a02cc9", size = 177364, upload-time = "2026-01-10T09:22:59.333Z" }, + { url = 
"https://files.pythonhosted.org/packages/7e/0c/8811fc53e9bcff68fe7de2bcbe75116a8d959ac699a3200f4847a8925210/websockets-16.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:52a0fec0e6c8d9a784c2c78276a48a2bdf099e4ccc2a4cad53b27718dbfd0230", size = 175039, upload-time = "2026-01-10T09:23:01.171Z" }, + { url = "https://files.pythonhosted.org/packages/aa/82/39a5f910cb99ec0b59e482971238c845af9220d3ab9fa76dd9162cda9d62/websockets-16.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:e6578ed5b6981005df1860a56e3617f14a6c307e6a71b4fff8c48fdc50f3ed2c", size = 175323, upload-time = "2026-01-10T09:23:02.341Z" }, + { url = "https://files.pythonhosted.org/packages/bd/28/0a25ee5342eb5d5f297d992a77e56892ecb65e7854c7898fb7d35e9b33bd/websockets-16.0-cp313-cp313-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:95724e638f0f9c350bb1c2b0a7ad0e83d9cc0c9259f3ea94e40d7b02a2179ae5", size = 184975, upload-time = "2026-01-10T09:23:03.756Z" }, + { url = "https://files.pythonhosted.org/packages/f9/66/27ea52741752f5107c2e41fda05e8395a682a1e11c4e592a809a90c6a506/websockets-16.0-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c0204dc62a89dc9d50d682412c10b3542d748260d743500a85c13cd1ee4bde82", size = 186203, upload-time = "2026-01-10T09:23:05.01Z" }, + { url = "https://files.pythonhosted.org/packages/37/e5/8e32857371406a757816a2b471939d51c463509be73fa538216ea52b792a/websockets-16.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:52ac480f44d32970d66763115edea932f1c5b1312de36df06d6b219f6741eed8", size = 185653, upload-time = "2026-01-10T09:23:06.301Z" }, + { url = "https://files.pythonhosted.org/packages/9b/67/f926bac29882894669368dc73f4da900fcdf47955d0a0185d60103df5737/websockets-16.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:6e5a82b677f8f6f59e8dfc34ec06ca6b5b48bc4fcda346acd093694cc2c24d8f", size = 184920, upload-time = "2026-01-10T09:23:07.492Z" }, + { url = 
"https://files.pythonhosted.org/packages/3c/a1/3d6ccdcd125b0a42a311bcd15a7f705d688f73b2a22d8cf1c0875d35d34a/websockets-16.0-cp313-cp313-win32.whl", hash = "sha256:abf050a199613f64c886ea10f38b47770a65154dc37181bfaff70c160f45315a", size = 178255, upload-time = "2026-01-10T09:23:09.245Z" }, + { url = "https://files.pythonhosted.org/packages/6b/ae/90366304d7c2ce80f9b826096a9e9048b4bb760e44d3b873bb272cba696b/websockets-16.0-cp313-cp313-win_amd64.whl", hash = "sha256:3425ac5cf448801335d6fdc7ae1eb22072055417a96cc6b31b3861f455fbc156", size = 178689, upload-time = "2026-01-10T09:23:10.483Z" }, + { url = "https://files.pythonhosted.org/packages/f3/1d/e88022630271f5bd349ed82417136281931e558d628dd52c4d8621b4a0b2/websockets-16.0-cp314-cp314-macosx_10_15_universal2.whl", hash = "sha256:8cc451a50f2aee53042ac52d2d053d08bf89bcb31ae799cb4487587661c038a0", size = 177406, upload-time = "2026-01-10T09:23:12.178Z" }, + { url = "https://files.pythonhosted.org/packages/f2/78/e63be1bf0724eeb4616efb1ae1c9044f7c3953b7957799abb5915bffd38e/websockets-16.0-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:daa3b6ff70a9241cf6c7fc9e949d41232d9d7d26fd3522b1ad2b4d62487e9904", size = 175085, upload-time = "2026-01-10T09:23:13.511Z" }, + { url = "https://files.pythonhosted.org/packages/bb/f4/d3c9220d818ee955ae390cf319a7c7a467beceb24f05ee7aaaa2414345ba/websockets-16.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:fd3cb4adb94a2a6e2b7c0d8d05cb94e6f1c81a0cf9dc2694fb65c7e8d94c42e4", size = 175328, upload-time = "2026-01-10T09:23:14.727Z" }, + { url = "https://files.pythonhosted.org/packages/63/bc/d3e208028de777087e6fb2b122051a6ff7bbcca0d6df9d9c2bf1dd869ae9/websockets-16.0-cp314-cp314-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:781caf5e8eee67f663126490c2f96f40906594cb86b408a703630f95550a8c3e", size = 185044, upload-time = "2026-01-10T09:23:15.939Z" }, + { url = 
"https://files.pythonhosted.org/packages/ad/6e/9a0927ac24bd33a0a9af834d89e0abc7cfd8e13bed17a86407a66773cc0e/websockets-16.0-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:caab51a72c51973ca21fa8a18bd8165e1a0183f1ac7066a182ff27107b71e1a4", size = 186279, upload-time = "2026-01-10T09:23:17.148Z" }, + { url = "https://files.pythonhosted.org/packages/b9/ca/bf1c68440d7a868180e11be653c85959502efd3a709323230314fda6e0b3/websockets-16.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:19c4dc84098e523fd63711e563077d39e90ec6702aff4b5d9e344a60cb3c0cb1", size = 185711, upload-time = "2026-01-10T09:23:18.372Z" }, + { url = "https://files.pythonhosted.org/packages/c4/f8/fdc34643a989561f217bb477cbc47a3a07212cbda91c0e4389c43c296ebf/websockets-16.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:a5e18a238a2b2249c9a9235466b90e96ae4795672598a58772dd806edc7ac6d3", size = 184982, upload-time = "2026-01-10T09:23:19.652Z" }, + { url = "https://files.pythonhosted.org/packages/dd/d1/574fa27e233764dbac9c52730d63fcf2823b16f0856b3329fc6268d6ae4f/websockets-16.0-cp314-cp314-win32.whl", hash = "sha256:a069d734c4a043182729edd3e9f247c3b2a4035415a9172fd0f1b71658a320a8", size = 177915, upload-time = "2026-01-10T09:23:21.458Z" }, + { url = "https://files.pythonhosted.org/packages/8a/f1/ae6b937bf3126b5134ce1f482365fde31a357c784ac51852978768b5eff4/websockets-16.0-cp314-cp314-win_amd64.whl", hash = "sha256:c0ee0e63f23914732c6d7e0cce24915c48f3f1512ec1d079ed01fc629dab269d", size = 178381, upload-time = "2026-01-10T09:23:22.715Z" }, + { url = "https://files.pythonhosted.org/packages/06/9b/f791d1db48403e1f0a27577a6beb37afae94254a8c6f08be4a23e4930bc0/websockets-16.0-cp314-cp314t-macosx_10_15_universal2.whl", hash = "sha256:a35539cacc3febb22b8f4d4a99cc79b104226a756aa7400adc722e83b0d03244", size = 177737, upload-time = "2026-01-10T09:23:24.523Z" }, + { url = 
"https://files.pythonhosted.org/packages/bd/40/53ad02341fa33b3ce489023f635367a4ac98b73570102ad2cdd770dacc9a/websockets-16.0-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:b784ca5de850f4ce93ec85d3269d24d4c82f22b7212023c974c401d4980ebc5e", size = 175268, upload-time = "2026-01-10T09:23:25.781Z" }, + { url = "https://files.pythonhosted.org/packages/74/9b/6158d4e459b984f949dcbbb0c5d270154c7618e11c01029b9bbd1bb4c4f9/websockets-16.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:569d01a4e7fba956c5ae4fc988f0d4e187900f5497ce46339c996dbf24f17641", size = 175486, upload-time = "2026-01-10T09:23:27.033Z" }, + { url = "https://files.pythonhosted.org/packages/e5/2d/7583b30208b639c8090206f95073646c2c9ffd66f44df967981a64f849ad/websockets-16.0-cp314-cp314t-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:50f23cdd8343b984957e4077839841146f67a3d31ab0d00e6b824e74c5b2f6e8", size = 185331, upload-time = "2026-01-10T09:23:28.259Z" }, + { url = "https://files.pythonhosted.org/packages/45/b0/cce3784eb519b7b5ad680d14b9673a31ab8dcb7aad8b64d81709d2430aa8/websockets-16.0-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:152284a83a00c59b759697b7f9e9cddf4e3c7861dd0d964b472b70f78f89e80e", size = 186501, upload-time = "2026-01-10T09:23:29.449Z" }, + { url = "https://files.pythonhosted.org/packages/19/60/b8ebe4c7e89fb5f6cdf080623c9d92789a53636950f7abacfc33fe2b3135/websockets-16.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:bc59589ab64b0022385f429b94697348a6a234e8ce22544e3681b2e9331b5944", size = 186062, upload-time = "2026-01-10T09:23:31.368Z" }, + { url = "https://files.pythonhosted.org/packages/88/a8/a080593f89b0138b6cba1b28f8df5673b5506f72879322288b031337c0b8/websockets-16.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:32da954ffa2814258030e5a57bc73a3635463238e797c7375dc8091327434206", size = 185356, upload-time = "2026-01-10T09:23:32.627Z" }, + { url = 
"https://files.pythonhosted.org/packages/c2/b6/b9afed2afadddaf5ebb2afa801abf4b0868f42f8539bfe4b071b5266c9fe/websockets-16.0-cp314-cp314t-win32.whl", hash = "sha256:5a4b4cc550cb665dd8a47f868c8d04c8230f857363ad3c9caf7a0c3bf8c61ca6", size = 178085, upload-time = "2026-01-10T09:23:33.816Z" }, + { url = "https://files.pythonhosted.org/packages/9f/3e/28135a24e384493fa804216b79a6a6759a38cc4ff59118787b9fb693df93/websockets-16.0-cp314-cp314t-win_amd64.whl", hash = "sha256:b14dc141ed6d2dde437cddb216004bcac6a1df0935d79656387bd41632ba0bbd", size = 178531, upload-time = "2026-01-10T09:23:35.016Z" }, + { url = "https://files.pythonhosted.org/packages/72/07/c98a68571dcf256e74f1f816b8cc5eae6eb2d3d5cfa44d37f801619d9166/websockets-16.0-pp311-pypy311_pp73-macosx_10_15_x86_64.whl", hash = "sha256:349f83cd6c9a415428ee1005cadb5c2c56f4389bc06a9af16103c3bc3dcc8b7d", size = 174947, upload-time = "2026-01-10T09:23:36.166Z" }, + { url = "https://files.pythonhosted.org/packages/7e/52/93e166a81e0305b33fe416338be92ae863563fe7bce446b0f687b9df5aea/websockets-16.0-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:4a1aba3340a8dca8db6eb5a7986157f52eb9e436b74813764241981ca4888f03", size = 175260, upload-time = "2026-01-10T09:23:37.409Z" }, + { url = "https://files.pythonhosted.org/packages/56/0c/2dbf513bafd24889d33de2ff0368190a0e69f37bcfa19009ef819fe4d507/websockets-16.0-pp311-pypy311_pp73-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:f4a32d1bd841d4bcbffdcb3d2ce50c09c3909fbead375ab28d0181af89fd04da", size = 176071, upload-time = "2026-01-10T09:23:39.158Z" }, + { url = "https://files.pythonhosted.org/packages/a5/8f/aea9c71cc92bf9b6cc0f7f70df8f0b420636b6c96ef4feee1e16f80f75dd/websockets-16.0-pp311-pypy311_pp73-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0298d07ee155e2e9fda5be8a9042200dd2e3bb0b8a38482156576f863a9d457c", size = 176968, upload-time = "2026-01-10T09:23:41.031Z" }, + { url = 
"https://files.pythonhosted.org/packages/9a/3f/f70e03f40ffc9a30d817eef7da1be72ee4956ba8d7255c399a01b135902a/websockets-16.0-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:a653aea902e0324b52f1613332ddf50b00c06fdaf7e92624fbf8c77c78fa5767", size = 178735, upload-time = "2026-01-10T09:23:42.259Z" }, + { url = "https://files.pythonhosted.org/packages/6f/28/258ebab549c2bf3e64d2b0217b973467394a9cea8c42f70418ca2c5d0d2e/websockets-16.0-py3-none-any.whl", hash = "sha256:1637db62fad1dc833276dded54215f2c7fa46912301a24bd94d45d46a011ceec", size = 171598, upload-time = "2026-01-10T09:23:45.395Z" }, +] From b2a3fae70ec9e215cb431328ab7a5c6221b6ace0 Mon Sep 17 00:00:00 2001 From: Keith Avery Date: Sat, 25 Apr 2026 08:37:31 -0400 Subject: [PATCH 5/8] ci: smoke-test the just otel recipe to catch script renames Co-Authored-By: Claude Sonnet 4.6 --- .github/workflows/just-otel-smoke.yml | 38 +++++++++++++++++++++++++++ 1 file changed, 38 insertions(+) create mode 100644 .github/workflows/just-otel-smoke.yml diff --git a/.github/workflows/just-otel-smoke.yml b/.github/workflows/just-otel-smoke.yml new file mode 100644 index 0000000..251f9c7 --- /dev/null +++ b/.github/workflows/just-otel-smoke.yml @@ -0,0 +1,38 @@ +name: just otel smoke + +on: + pull_request: + paths: + - "justfile" + - "scripts/playtest_dashboard.py" + - ".github/workflows/just-otel-smoke.yml" + push: + branches: [main] + +jobs: + just-otel: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: recursive + + - name: Install just + run: | + curl --proto '=https' --tlsv1.2 -sSf https://just.systems/install.sh \ + | bash -s -- --to /usr/local/bin + + - name: Install uv + run: curl -LsSf https://astral.sh/uv/install.sh | sh && echo "$HOME/.cargo/bin" >> $GITHUB_PATH + + - name: Install Python + uses: actions/setup-python@v5 + with: + python-version: "3.12" + + - name: Sync orchestrator deps + run: uv sync + + - name: Smoke-test `just otel` + run: timeout 5 just otel || [ $? 
-eq 124 ] + # exit 124 = timeout fired = recipe started and is listening From 6cda9210830d83a0b38e1c4849b8936dc78960da Mon Sep 17 00:00:00 2001 From: Keith Avery Date: Sat, 25 Apr 2026 09:23:42 -0400 Subject: [PATCH 6/8] docs(adr): ADR-090 OTEL Dashboard Restoration after Python Port Co-Authored-By: Claude Sonnet 4.6 --- CLAUDE.md | 3 + docs/adr/090-otel-dashboard-restoration.md | 104 +++++++++++++++++++++ docs/adr/README.md | 1 + 3 files changed, 108 insertions(+) create mode 100644 docs/adr/090-otel-dashboard-restoration.md diff --git a/CLAUDE.md b/CLAUDE.md index be40135..468c052 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -273,6 +273,9 @@ Rust code samples in pre-ADR-082 ADRs are historical; translation table in **Code Generation / Tooling (059, 069)** - **059 Monster Manual — Server-Side Pre-Generation via Game-State Injection** *(drift)* · 069 Scenario Fixtures — Pre-configured World States for Testing *(drift)* +**Observability (090)** +- 090 OTEL Dashboard Restoration after Python Port + **Codebase Decomposition (060, 061, 062, 063, 064, 068, 088)** - 060 Genre Models Decomposition — Split models.rs by Domain · 061 Lore Module Decomposition — Split lore.rs by Responsibility · 062 Server lib.rs Extraction — Route Groups, State, and Watcher Events · 063 Dispatch Handler Splitting — By Pipeline Stage · 064 Game Crate Domain Modules — Organize 69 Flat Files · 068 Magic Literal Extraction — Domain-Scoped Constants · **088 ADR Frontmatter Schema and Auto-Generated Indexes** diff --git a/docs/adr/090-otel-dashboard-restoration.md b/docs/adr/090-otel-dashboard-restoration.md new file mode 100644 index 0000000..f5b9e64 --- /dev/null +++ b/docs/adr/090-otel-dashboard-restoration.md @@ -0,0 +1,104 @@ +--- +id: 90 +title: "OTEL Dashboard Restoration after Python Port" +status: accepted +date: 2026-04-25 +deciders: ["Keith Avery"] +supersedes: [] +superseded-by: null +related: [31, 58, 82] +tags: [observability, project-lifecycle] +implementation-status: live 
+implementation-pointer: null +--- + +# ADR-090: OTEL Dashboard Restoration after Python Port + +## Status + +**Accepted** — 2026-04-25. + +## Context + +After the Rust → Python port (ADR-082), the OTEL dashboard at `/ws/watcher` +and the React `Dashboard/` panes degraded materially. The CLAUDE.md +"OTEL Observability Principle" was no longer enforced: the GM panel — the +"lie detector" Sebastien-the-mechanics-first-player and Keith-the-builder +both depend on — surfaced almost no live signal. + +A forensic audit found four failures: + +1. The `just otel` recipe pointed at a deleted `playtest.py`. +2. Most `WatcherEventType` values declared in `watcher.ts` had zero or one + emission sites in production code. +3. ~80% of `SPAN_*` constants in `telemetry/spans.py` were transcribed from + Rust but never re-implanted into Python dispatch — the catalog was + aspirational. +4. The translator (`WatcherSpanProcessor.on_end`) flattened every span to + `agent_span_close` with no semantic typed-event routing. + +The Python port copied the **vocabulary** and **transport** but not the +**emission discipline** or the **Layer-3 narrative validator**. + +## Decision + +Restore the dashboard to ADR-031's three-layer semantic-telemetry contract, +faithfully ported to Python, with three deliberate departures: + +1. **`TurnRecord` shape.** Store `snapshot_before_hash + snapshot_after + + StateDelta` rather than two full `GameSnapshot` clones. Same validation + power, no double-clone cost. +2. **Validator transport.** `asyncio.Queue(maxsize=32)` with oldest-record + drop on backpressure (faithful to ADR-031's "lossy by design" intent). +3. **Console exporter gating.** `ConsoleSpanExporter` defaults off; gated + behind `SIDEQUEST_OTEL_CONSOLE=1` for debug. 
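The lossy validator transport in decision 2 can be sketched as follows — a minimal illustration of the oldest-record-drop policy, not the shipped `validator.py`; the helper name `put_lossy` is invented here:

```python
import asyncio


def put_lossy(queue: asyncio.Queue, record) -> None:
    """Enqueue a record, dropping the oldest entry on backpressure.

    Faithful to ADR-031's "lossy by design" intent: the dispatch hot
    path never blocks waiting for the validator to catch up.
    """
    try:
        queue.put_nowait(record)
    except asyncio.QueueFull:
        try:
            queue.get_nowait()  # evict the oldest queued record
        except asyncio.QueueEmpty:
            pass
        queue.put_nowait(record)


# 40 turns through a maxsize=32 queue: the 8 oldest records are dropped,
# leaving turns 8..39 for the validator task to consume.
queue: asyncio.Queue = asyncio.Queue(maxsize=32)
for turn in range(40):
    put_lossy(queue, turn)
```

Because eviction happens inside `put_lossy`, the caller never sees `QueueFull`; the trade-off is that under sustained overload the validator only ever audits the most recent 32 turns.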
+ +The translator gains a routing table (`SPAN_ROUTES`) colocated with span +constants in `spans.py` so renaming a constant breaks the route at import +and a new constant without a routing decision trips the +`test_routing_completeness.py` lint. + +A new `Validator` task consumes `TurnRecord`s and runs five deterministic +checks: entity, inventory, patch-legality, trope-alignment, +subsystem-exercise. The validator owns `turn_complete`, `coverage_gap`, +and `validation_warning`. + +## Consequences + +### Positive + +- Every `WatcherEventType` declared in `watcher.ts` has a clear owner; + no orphans, no double-emission. +- Adding a new span constant requires an explicit routing decision — + catches the regression that caused this work. +- The "lie detector" property is restored: subsystem activity surfaces + on the dashboard whether or not the LLM mentions it. +- `just otel` is CI-protected against future script renames. + +### Negative + +- ~24 emission families still need re-implanting (Phase 2 follow-up + plans, one per family). The infrastructure now in place makes each + rollout a small, repeatable change. +- Validator runs on the same event loop as dispatch. Bounded queue + + lossy drop policy keeps it from impacting hot-path latency, but heavy + check overhead would still serialize behind dispatch. Acceptable for + current playtest scale (≤5 watchers, ≤1 turn/sec). + +### Out of scope + +- No `TurnRecord` persistence / replay. +- No second-LLM validation (ADR-031's "God lifting rocks" prohibition). +- No HTTP OTLP receiver. In-process span processor remains. + +## Implementation + +See `docs/superpowers/specs/2026-04-25-otel-dashboard-restoration-design.md` +for the design and `docs/superpowers/plans/2026-04-25-otel-dashboard-restoration.md` +for the task plan. 
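The `SPAN_ROUTES` mechanism described above can be illustrated with a small sketch — the constant names, event types, and `route()` helper here are illustrative stand-ins, not the real `spans.py` catalog or translator:

```python
from dataclasses import dataclass

# Span name constants (a tiny stand-in for the real catalog).
SPAN_LORE_RETRIEVAL = "lore_retrieval"
SPAN_JSON_EXTRACTION = "json_extraction"


@dataclass(frozen=True)
class SpanRoute:
    """Maps a closed span to the typed WatcherEvent it should emit."""
    event_type: str
    attrs: tuple[str, ...] = ()  # span attributes copied onto the event


# Colocated with the constants: renaming SPAN_LORE_RETRIEVAL without
# updating this dict raises NameError at import time, which is the
# "breaks the route at import" property the ADR relies on.
SPAN_ROUTES: dict[str, SpanRoute] = {
    SPAN_LORE_RETRIEVAL: SpanRoute("lore_retrieval", ("chunk_count",)),
    SPAN_JSON_EXTRACTION: SpanRoute("json_extraction_result", ("ok",)),
}


def route(span_name: str, span_attrs: dict) -> list[dict]:
    """Translator hook: every span close emits agent_span_close, plus a
    typed event when the span has a routing decision."""
    events = [{"type": "agent_span_close", "name": span_name}]
    r = SPAN_ROUTES.get(span_name)
    if r is not None:
        payload = {k: span_attrs.get(k) for k in r.attrs}
        events.append({"type": r.event_type, **payload})
    return events
```

A completeness lint along the lines of `test_routing_completeness.py` would then assert that every `SPAN_*` constant appears either in `SPAN_ROUTES` or in an explicit flat-only set, forcing a routing decision for each new constant.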
+ +## Related + +- ADR-031: Game Watcher — Semantic Telemetry (this ADR ports it to Python) +- ADR-058: Claude subprocess OTEL passthrough (unchanged) +- ADR-082: Port `sidequest-api` from Rust back to Python (this ADR closes one of its drift items) diff --git a/docs/adr/README.md b/docs/adr/README.md index 4f2c2e8..9cdecc5 100644 --- a/docs/adr/README.md +++ b/docs/adr/README.md @@ -193,6 +193,7 @@ Current backend reference documents: `docs/architecture.md`, `docs/tech-stack.md | ADR | Status | Impl | |-----|--------|------| | [ADR-058: Claude Subprocess OTEL Passthrough](058-claude-subprocess-otel-passthrough.md) | ◇ proposed | deferred | +| [ADR-090: OTEL Dashboard Restoration after Python Port](090-otel-dashboard-restoration.md) | ✓ accepted | live | ## Codebase Decomposition From 30aa1c3883b6df1325c8ae3bdbf4ce98cfd2fc7c Mon Sep 17 00:00:00 2001 From: Keith Avery Date: Sat, 25 Apr 2026 09:25:24 -0400 Subject: [PATCH 7/8] docs(adr): amend ADR-031 with Python-port section, regen indexes Add observability tag, replace stale Rust-only opening blockquote, and append Python-port note pointing at the canonical telemetry implementation files in sidequest-server. Co-Authored-By: Claude Sonnet 4.6 --- .../031-game-watcher-semantic-telemetry.md | 32 +++++++++++++++++-- 1 file changed, 30 insertions(+), 2 deletions(-) diff --git a/docs/adr/031-game-watcher-semantic-telemetry.md b/docs/adr/031-game-watcher-semantic-telemetry.md index 2c2e6dc..0ca754f 100644 --- a/docs/adr/031-game-watcher-semantic-telemetry.md +++ b/docs/adr/031-game-watcher-semantic-telemetry.md @@ -7,14 +7,16 @@ deciders: [Keith Avery] supersedes: [] superseded-by: null related: [] -tags: [genre-mechanics] +tags: [genre-mechanics, observability] implementation-status: live implementation-pointer: null --- # ADR-031: Game Watcher — Semantic Telemetry for AI Agent Observability -> New for Rust port. No Python equivalent — sq-2 uses ad-hoc logging. +> Originally specified for the Rust backend. 
Python port (ADR-082) preserved +> the architecture; ADR-090 documents the post-port restoration. This ADR's +> prose remains the canonical statement of the three-layer model. ## Context SideQuest has an LLM adjudicating an RPG. Unlike a deterministic game engine, Claude makes @@ -185,3 +187,29 @@ game messages. Events are JSON-serialized `tracing` output filtered to game-rele - ADR-018: Trope engine lifecycle - ADR-026: Client state mirror - ADR-027: Reactive state messaging + +--- + +## Python-port note (2026-04-25) + +After ADR-082 ported the backend from Rust to Python, the canonical +implementation lives in: + +- `sidequest-server/sidequest/telemetry/spans.py` — span name catalog, + `SpanRoute` mechanism, `SPAN_ROUTES`, `FLAT_ONLY_SPANS`, helper + context managers. +- `sidequest-server/sidequest/server/watcher.py` — `WatcherSpanProcessor` + translator (Layer 1 + typed-event routing). +- `sidequest-server/sidequest/telemetry/validator.py` — Layer-3 narrative + validator (`Validator` class, five checks: entity, inventory, + patch-legality, trope-alignment, subsystem-exercise). +- `sidequest-server/sidequest/telemetry/turn_record.py` — `TurnRecord` + dataclass (per-turn audit record submitted to the validator queue). + +Code references in this ADR pre-2026-04-19 point at the Rust tree archived +at https://github.com/slabgorb/sidequest-api. The Rust phasing table is +preserved as historical context but the active phase descriptions are +superseded by ADR-090. + +`implementation-status: live` is re-affirmed for the Python port as of +ADR-090's completion. From 31142b2afe04626f90d3e2087538c532128299a3 Mon Sep 17 00:00:00 2001 From: Keith Avery Date: Sat, 25 Apr 2026 09:57:59 -0400 Subject: [PATCH 8/8] docs(plan): OTEL dashboard restoration implementation plan MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Pairs with the design spec at docs/superpowers/specs/2026-04-25-otel-dashboard-restoration-design.md. 
This is the 25-task plan (Phase 0–4) executed by the OTEL feat branches across orchestrator, sidequest-server, and sidequest-ui. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../2026-04-25-otel-dashboard-restoration.md | 3066 +++++++++++++++++ 1 file changed, 3066 insertions(+) create mode 100644 docs/superpowers/plans/2026-04-25-otel-dashboard-restoration.md diff --git a/docs/superpowers/plans/2026-04-25-otel-dashboard-restoration.md b/docs/superpowers/plans/2026-04-25-otel-dashboard-restoration.md new file mode 100644 index 0000000..c178a90 --- /dev/null +++ b/docs/superpowers/plans/2026-04-25-otel-dashboard-restoration.md @@ -0,0 +1,3066 @@ +# OTEL Dashboard Restoration Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Restore the GM panel "lie detector" by porting ADR-031's three-layer semantic-telemetry contract — fix the broken `just otel` recipe, install translator routing infrastructure, anchor every span under a `turn_span` root, and stand up the Layer-3 narrative-validator pipeline. + +**Architecture:** Layer 1 (FastAPI `/ws/watcher`) is unchanged. Layer 2 gets a load-bearing `turn_span()` context manager that opens at every dispatch entry — every other span becomes its child. The translator (`WatcherSpanProcessor.on_end`) gains a `SpanRoute`-driven routing table colocated with span constants in `spans.py` so closing a span emits both the existing `agent_span_close` AND a typed `WatcherEvent` (state_transition, prompt_assembled, lore_retrieval, json_extraction_result). 
Layer 3 is new: a `TurnRecord` dataclass assembled at dispatch end, queued onto a bounded `asyncio.Queue(32)`, and consumed by a single validator task that runs five deterministic checks and publishes `validation_warning` / `coverage_gap` / `subsystem_exercise_summary` / `turn_complete` events. + +**Tech Stack:** Python 3.12, FastAPI/uvicorn, OpenTelemetry SDK (`opentelemetry-sdk`), `asyncio.Queue`, `pytest` + `pytest-asyncio`, `dataclasses`, `blake2b` (hashlib). + +**Out of scope for this plan (deferred to follow-up plans, one per emission family):** Phase 2 emission rollouts for narrator, orchestrator, content, trope, barrier, music, persistence, chargen, NPC, creature, disposition, state-patches, merchant, inventory, continuity, compose, world, RAG, script-tool, reminders, pregen, catch-up, scenario, monster-manual. The `turn_span` is the only Phase 2 emission this plan installs because every other span must nest under it. + +--- + +## File Structure + +### Modify +- `justfile` — fix `otel` recipe to call `playtest_dashboard.py` instead of the deleted `playtest.py`. +- `sidequest-server/sidequest/telemetry/setup.py` — drop the unconditional `ConsoleSpanExporter`; gate behind `SIDEQUEST_OTEL_CONSOLE=1`. +- `sidequest-server/sidequest/telemetry/spans.py` — add `SpanRoute` dataclass, `FLAT_ONLY_SPANS` set, `SPAN_ROUTES` dict, `turn_span()` helper. +- `sidequest-server/sidequest/server/watcher.py` — refactor `WatcherSpanProcessor.on_end` to consult the router and emit typed events in addition to `agent_span_close`. +- `sidequest-server/sidequest/server/session_handler.py` — open `turn_span()` at dispatch entry; assemble `TurnRecord` and put it on the validator queue at dispatch end. +- `sidequest-server/sidequest/server/app.py` — start the validator task at FastAPI startup; drain on shutdown. +- `sidequest-ui/src/types/watcher.ts` — update header comment to point at the Python source files. 
+- `docs/adr/031-game-watcher-semantic-telemetry.md` — append a Python-port section; flip implementation-status notes. +- `CLAUDE.md` — regenerate ADR Index block via `scripts/regenerate_adr_indexes.py`. + +### Create +- `sidequest-server/sidequest/telemetry/turn_record.py` — `TurnRecord` and `PatchSummary` dataclasses. +- `sidequest-server/sidequest/telemetry/validator.py` — `Validator` class, the five checks, lifecycle hooks, health emissions. +- `sidequest-server/tests/telemetry/test_routing_completeness.py` — static lint that every `SPAN_*` constant is either routed or in `FLAT_ONLY_SPANS`. +- `sidequest-server/tests/telemetry/test_validator_pipeline.py` — lifecycle, backpressure, crash containment, per-check fixtures. +- `sidequest-server/tests/server/test_turn_span_wiring.py` — integration test that the dispatch entry opens `turn` as the root span. +- `docs/adr/089-otel-dashboard-restoration.md` — new ADR, status `accepted`, related `[031, 058, 082]`. +- `.github/workflows/just-otel-smoke.yml` (or extend existing CI workflow) — smoke test for the `just otel` recipe. + +### Test +- `sidequest-server/tests/server/test_watcher_events.py` — extend with parametrized translator-routing rows. +- `sidequest-server/tests/telemetry/test_spans.py` — extend with `turn_span()` helper assertions. + +**Decomposition rationale:** `turn_record.py` and `validator.py` are split because the dataclass is consumed by both the dispatch hot path and the validator cold path; keeping the queue-side logic separate from the data shape keeps each file under ~250 lines and avoids a circular import between `session_handler` and the validator module. The `SpanRoute` mechanism lives in `spans.py` (not a new file) because the spec requires colocation with the span constants — renaming a constant must break the route at import. 
+ +--- + +## Phase 0 — Stop the Bleed + +### Task 1: Fix the `just otel` recipe + +**Files:** +- Modify: `justfile:188-190` + +- [ ] **Step 1: Inspect the current recipe** + +Run: `sed -n '186,192p' justfile` +Expected output includes `python3 {{root}}/scripts/playtest.py --dashboard-only --dashboard-port {{port}}` (the broken line). + +- [ ] **Step 2: Update the recipe to call the split-out script** + +Edit `justfile:188-190` to: + +```just +# OTEL dashboard — browser-friendly /ws/watcher viewer +otel port="9765": + python3 {{root}}/scripts/playtest_dashboard.py --dashboard-port {{port}} +``` + +(Drops the `--dashboard-only` flag, which was a `playtest.py` switch and is not a `playtest_dashboard.py` flag.) + +- [ ] **Step 3: Verify `playtest_dashboard.py` accepts `--dashboard-port`** + +Run: `python3 scripts/playtest_dashboard.py --help` +Expected: argparse output including `--dashboard-port`. If the flag is named differently (e.g. `--port`), update the recipe to match. + +- [ ] **Step 4: Smoke-run the recipe locally** + +Run: `timeout 3 just otel || [[ $? -eq 124 ]]; echo "exit=$?"` +Expected: `exit=0` (timeout fired = recipe started successfully and is listening). 
+ +- [ ] **Step 5: Commit** + +```bash +git add justfile +git commit -m "fix(justfile): point otel recipe at playtest_dashboard.py" +``` + +--- + +### Task 2: Gate `ConsoleSpanExporter` behind an env var + +**Files:** +- Modify: `sidequest-server/sidequest/telemetry/setup.py` + +- [ ] **Step 1: Write the failing test** + +Create `sidequest-server/tests/telemetry/test_setup_gating.py`: + +```python +"""Tests that ConsoleSpanExporter is gated behind SIDEQUEST_OTEL_CONSOLE.""" + +from __future__ import annotations + +import os + +import pytest +from opentelemetry import trace +from opentelemetry.sdk.trace import TracerProvider +from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor + + +def _has_console_exporter(provider: TracerProvider) -> bool: + active = getattr(provider, "_active_span_processor", None) + processors = getattr(active, "_span_processors", ()) if active else () + for proc in processors: + exporter = getattr(proc, "span_exporter", None) + if isinstance(exporter, ConsoleSpanExporter): + return True + return False + + +def _reset_tracer(): + """Force re-init by clearing the module-level _initialized flag.""" + from sidequest.telemetry import setup as setup_mod + + setup_mod._initialized = False + # Reset the global provider to a fresh SDK provider so the next init wins. 
+ trace._TRACER_PROVIDER = None # type: ignore[attr-defined] + + +def test_console_exporter_off_by_default(monkeypatch: pytest.MonkeyPatch) -> None: + monkeypatch.delenv("SIDEQUEST_OTEL_CONSOLE", raising=False) + _reset_tracer() + + from sidequest.telemetry.setup import init_tracer + + init_tracer() + provider = trace.get_tracer_provider() + assert isinstance(provider, TracerProvider) + assert not _has_console_exporter(provider), ( + "ConsoleSpanExporter should be off when SIDEQUEST_OTEL_CONSOLE is unset" + ) + + +def test_console_exporter_on_when_env_set(monkeypatch: pytest.MonkeyPatch) -> None: + monkeypatch.setenv("SIDEQUEST_OTEL_CONSOLE", "1") + _reset_tracer() + + from sidequest.telemetry.setup import init_tracer + + init_tracer() + provider = trace.get_tracer_provider() + assert isinstance(provider, TracerProvider) + assert _has_console_exporter(provider), ( + "ConsoleSpanExporter should be enabled when SIDEQUEST_OTEL_CONSOLE=1" + ) +``` + +- [ ] **Step 2: Run test to verify it fails** + +Run: `cd sidequest-server && uv run pytest tests/telemetry/test_setup_gating.py -v` +Expected: FAIL — `test_console_exporter_off_by_default` fails because the current `setup.py` always installs the console exporter. + +- [ ] **Step 3: Implement the gating in `setup.py`** + +Replace the body of `sidequest-server/sidequest/telemetry/setup.py` with: + +```python +"""OpenTelemetry tracer setup for sidequest-server. + +The default destination for spans is the WatcherSpanProcessor (registered +in server/app.py). Console export is debug-only and gated behind +SIDEQUEST_OTEL_CONSOLE=1 so that normal runs don't pollute stdout with +span dumps. 
+""" + +from __future__ import annotations + +import os + +from opentelemetry import trace +from opentelemetry.sdk.resources import Resource +from opentelemetry.sdk.trace import TracerProvider +from opentelemetry.sdk.trace.export import ( + BatchSpanProcessor, + ConsoleSpanExporter, +) + +_initialized = False + + +def init_tracer(service_name: str = "sidequest-server") -> None: + """Initialize the global OpenTelemetry tracer provider. + + Idempotent — safe to call from tests and from app startup. + """ + global _initialized + if _initialized: + return + + resource = Resource.create({"service.name": service_name}) + provider = TracerProvider(resource=resource) + + if os.environ.get("SIDEQUEST_OTEL_CONSOLE") == "1": + provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter())) + + trace.set_tracer_provider(provider) + + _initialized = True + + +def tracer() -> trace.Tracer: + """Return the sidequest-server tracer.""" + return trace.get_tracer("sidequest-server") +``` + +- [ ] **Step 4: Run tests to verify they pass** + +Run: `cd sidequest-server && uv run pytest tests/telemetry/test_setup_gating.py -v` +Expected: PASS — both tests green. + +Run: `cd sidequest-server && uv run pytest tests/telemetry/ -v` +Expected: PASS — no regressions in other telemetry tests. + +- [ ] **Step 5: Commit** + +```bash +cd sidequest-server +git add sidequest/telemetry/setup.py tests/telemetry/test_setup_gating.py +git commit -m "feat(telemetry): gate ConsoleSpanExporter behind SIDEQUEST_OTEL_CONSOLE" +``` + +--- + +### Task 3: Add CI smoke test for `just otel` + +**Files:** +- Create: `.github/workflows/just-otel-smoke.yml` (or extend `ci.yml` if a single workflow file exists) + +- [ ] **Step 1: Inspect existing CI layout** + +Run: `ls .github/workflows/ 2>&1` +Expected: list of YAML files. If `ci.yml` exists, prefer extending it; otherwise create a dedicated `just-otel-smoke.yml`. + +- [ ] **Step 2: Write the smoke job** + +If extending an existing workflow, add this job. 
If creating a new one, the full file: + +```yaml +name: just otel smoke + +on: + pull_request: + paths: + - "justfile" + - "scripts/playtest_dashboard.py" + - ".github/workflows/just-otel-smoke.yml" + push: + branches: [main] + +jobs: + just-otel: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: recursive + + - name: Install just + run: | + curl --proto '=https' --tlsv1.2 -sSf https://just.systems/install.sh \ + | bash -s -- --to /usr/local/bin + + - name: Install Python + uses: actions/setup-python@v5 + with: + python-version: "3.12" + + - name: Install playtest deps + run: | + pip install -r scripts/requirements-playtest.txt 2>/dev/null || pip install websockets aiohttp + + - name: Smoke-test `just otel` + run: timeout 5 just otel || [ $? -eq 124 ] + # exit 124 = timeout fired = recipe started and is listening +``` + +- [ ] **Step 3: Verify the workflow file is well-formed** + +Run: `python3 -c "import yaml; yaml.safe_load(open('.github/workflows/just-otel-smoke.yml'))"` (or pipe through `yamllint` if available) +Expected: no parse errors. + +- [ ] **Step 4: Local dry-run of the smoke command** + +Run: `timeout 5 just otel || [ $? -eq 124 ] && echo OK` +Expected: `OK`. 
+ +- [ ] **Step 5: Commit** + +```bash +git add .github/workflows/just-otel-smoke.yml +git commit -m "ci: smoke-test the just otel recipe to catch script renames" +``` + +--- + +## Phase 1 — Translator Routing Infrastructure + +### Task 4: Add `SpanRoute` dataclass and `FLAT_ONLY_SPANS` set + +**Files:** +- Modify: `sidequest-server/sidequest/telemetry/spans.py` (top of file, after imports) + +- [ ] **Step 1: Write the failing test** + +Add to `sidequest-server/tests/telemetry/test_spans.py`: + +```python +def test_span_route_dataclass_shape() -> None: + """SpanRoute carries event_type, component, and an attribute extractor.""" + from sidequest.telemetry.spans import SpanRoute + + route = SpanRoute( + event_type="state_transition", + component="disposition", + extract=lambda span: {"npc": "alice"}, + ) + assert route.event_type == "state_transition" + assert route.component == "disposition" + # The extractor takes a span-like object and returns a dict. + fake = type("FakeSpan", (), {"attributes": {}, "name": "x"})() + assert route.extract(fake) == {"npc": "alice"} + + +def test_flat_only_spans_is_a_set_of_strings() -> None: + """FLAT_ONLY_SPANS contains span name strings, not SpanRoute objects.""" + from sidequest.telemetry.spans import FLAT_ONLY_SPANS + + assert isinstance(FLAT_ONLY_SPANS, set) + for name in FLAT_ONLY_SPANS: + assert isinstance(name, str) +``` + +- [ ] **Step 2: Run test to verify it fails** + +Run: `cd sidequest-server && uv run pytest tests/telemetry/test_spans.py::test_span_route_dataclass_shape tests/telemetry/test_spans.py::test_flat_only_spans_is_a_set_of_strings -v` +Expected: FAIL — `SpanRoute` and `FLAT_ONLY_SPANS` don't exist yet. 
+ +- [ ] **Step 3: Add `SpanRoute` and `FLAT_ONLY_SPANS` to `spans.py`** + +Add to the top of `sidequest-server/sidequest/telemetry/spans.py`, immediately after the existing imports: + +```python +from dataclasses import dataclass +from typing import Callable, Protocol + + +class _SpanLike(Protocol): + """Structural stand-in for opentelemetry.sdk.trace.ReadableSpan.""" + + name: str + attributes: dict[str, Any] | None + + +@dataclass(frozen=True) +class SpanRoute: + """Routing decision for a span family. + + The translator consults the SPAN_ROUTES dict keyed by span name. When a + span closes, if its name is in the dict, the matching SpanRoute is used + to emit a typed WatcherEvent IN ADDITION TO the always-on + agent_span_close fan-out. The extractor pulls the typed event's + `fields` from the span's attributes — span attributes are the single + source of truth for typed-event payloads. + """ + + event_type: str + component: str + extract: Callable[[_SpanLike], dict[str, Any]] + + +# Spans that intentionally have no typed-event route. Closing one of these +# emits agent_span_close only — they carry timing data but no semantic +# payload the dashboard needs to classify. Membership is a deliberate +# decision, enforced by tests/telemetry/test_routing_completeness.py. +FLAT_ONLY_SPANS: set[str] = set() + + +# Span name -> SpanRoute. Populated near each SPAN_* constant below so +# that renaming a constant breaks the route at import time, and a new +# constant without a routing decision trips the completeness lint test. +SPAN_ROUTES: dict[str, SpanRoute] = {} +``` + +- [ ] **Step 4: Run tests to verify they pass** + +Run: `cd sidequest-server && uv run pytest tests/telemetry/test_spans.py -v` +Expected: PASS. 
+ +- [ ] **Step 5: Commit** + +```bash +cd sidequest-server +git add sidequest/telemetry/spans.py tests/telemetry/test_spans.py +git commit -m "feat(telemetry): add SpanRoute, SPAN_ROUTES, FLAT_ONLY_SPANS scaffolding" +``` + +--- + +### Task 5: Populate `SPAN_ROUTES` and `FLAT_ONLY_SPANS` for currently-emitted spans + +**Files:** +- Modify: `sidequest-server/sidequest/telemetry/spans.py` + +This task seeds the routing table with entries for spans that currently emit in production code, so the routing-completeness lint (Task 7) can pass on first run. Spans for the ~24 follow-up emission families will be added in their respective follow-up plans. + +- [ ] **Step 1: Identify currently-emitting spans** + +Run: `cd sidequest-server && grep -rn "start_as_current_span\|_span(" sidequest/ --include="*.py" | grep -v "telemetry/spans.py" | sed 's/:.*//' | sort -u` +Expected: list of files that open spans. Cross-reference against `SPAN_*` constants in `spans.py` to identify which constants have at least one emission site today. + +- [ ] **Step 2: Identify currently-live span constants** + +Currently-live spans (verified by the spec §1.2 emission-site count and grep above): + +- `SPAN_AGENT_CALL`, `SPAN_AGENT_CALL_SESSION` — agent.call timing +- `SPAN_TURN_AGENT_LLM_INFERENCE` — LLM inference timing +- `SPAN_NPC_AUTO_REGISTERED`, `SPAN_NPC_REINVENTED` — NPC dispatch (story 37-44) +- `SPAN_COMBAT_TICK`, `SPAN_COMBAT_ENDED`, `SPAN_COMBAT_PLAYER_DEAD` — combat dispatch (story 3.4) +- `SPAN_ENCOUNTER_*` family — encounter dispatch (story 3.4) +- `SPAN_LOCAL_DM_DECOMPOSE`, `SPAN_LOCAL_DM_DISPATCH_BANK`, `SPAN_LOCAL_DM_LETHALITY_ARBITRATE`, `SPAN_LOCAL_DM_SUBSYSTEM` +- `SPAN_DICE_REQUEST_SENT`, `SPAN_DICE_THROW_RECEIVED`, `SPAN_DICE_RESULT_BROADCAST` +- `SPAN_PROJECTION_DECIDE`, `SPAN_PROJECTION_CACHE_FILL`, `SPAN_PROJECTION_CACHE_LAZY_FILL` +- `SPAN_LETHALITY_*` (per `test_lethality_span.py`) + +If your grep reveals additional live spans, add them. 
+ +- [ ] **Step 3: Add routes and flat-only entries inline next to each constant** + +For each currently-live span, add either a `SPAN_ROUTES[...] = SpanRoute(...)` entry or a `FLAT_ONLY_SPANS.add(...)` line immediately after the constant's declaration. Example placement pattern: + +```python +SPAN_DICE_REQUEST_SENT = "dice.request_sent" +SPAN_ROUTES[SPAN_DICE_REQUEST_SENT] = SpanRoute( + event_type="state_transition", + component="dice", + extract=lambda span: { + "field": "dice.request", + "request_id": (span.attributes or {}).get("request_id", ""), + "expression": (span.attributes or {}).get("expression", ""), + }, +) +``` + +For timing-only spans: + +```python +SPAN_AGENT_CALL = "agent.call" +FLAT_ONLY_SPANS.add(SPAN_AGENT_CALL) +``` + +Reference table for each currently-live span — apply per spec §6.4: + +| Constant | Decision | event_type | component | Notes | +|---|---|---|---|---| +| `SPAN_AGENT_CALL` | flat-only | — | — | Subprocess timing | +| `SPAN_AGENT_CALL_SESSION` | flat-only | — | — | Subprocess timing | +| `SPAN_TURN_AGENT_LLM_INFERENCE` | flat-only | — | — | LLM duration | +| `SPAN_NPC_AUTO_REGISTERED` | route | `state_transition` | `npc_registry` | Extract `npc_name`, `slug` | +| `SPAN_NPC_REINVENTED` | route | `state_transition` | `npc_registry` | Extract `npc_name`, `slug`, `reason` | +| `SPAN_COMBAT_TICK` | route | `state_transition` | `combat` | Extract `tick`, `actor_count` | +| `SPAN_COMBAT_ENDED` | route | `state_transition` | `combat` | Extract `outcome` | +| `SPAN_COMBAT_PLAYER_DEAD` | route | `state_transition` | `combat` | Extract `player_id` | +| `SPAN_ENCOUNTER_PHASE_TRANSITION` | route | `state_transition` | `encounter` | Extract `from_phase`, `to_phase` | +| `SPAN_ENCOUNTER_RESOLVED` | route | `state_transition` | `encounter` | Extract `outcome` | +| `SPAN_ENCOUNTER_BEAT_APPLIED` | route | `state_transition` | `encounter` | Extract `beat`, `actor` | +| `SPAN_ENCOUNTER_CONFRONTATION_INITIATED` | route | `state_transition` | 
`encounter` | Extract `confrontation_kind` | +| `SPAN_ENCOUNTER_EMPTY_ACTOR_LIST` | route | `state_transition` | `encounter` | Extract `phase` | +| `SPAN_ENCOUNTER_BEAT_FAILURE_BRANCH` | route | `state_transition` | `encounter` | Extract `beat`, `branch` | +| `SPAN_LOCAL_DM_DECOMPOSE` | route | `state_transition` | `local_dm` | Extract `intent`, `branch` | +| `SPAN_LOCAL_DM_DISPATCH_BANK` | route | `state_transition` | `local_dm` | Extract `bank`, `dispatched_to` | +| `SPAN_LOCAL_DM_LETHALITY_ARBITRATE` | route | `state_transition` | `local_dm` | Extract `verdict`, `inputs` | +| `SPAN_LOCAL_DM_SUBSYSTEM` | route | `subsystem_exercise_summary` | `local_dm` | Extract `subsystem`, `exercised` | +| `SPAN_DICE_REQUEST_SENT` | route | `state_transition` | `dice` | Extract `request_id`, `expression` | +| `SPAN_DICE_THROW_RECEIVED` | route | `state_transition` | `dice` | Extract `request_id`, `result` | +| `SPAN_DICE_RESULT_BROADCAST` | route | `state_transition` | `dice` | Extract `request_id`, `players` | +| `SPAN_PROJECTION_DECIDE` | route | `state_transition` | `projection` | Extract `player_id`, `decision` | +| `SPAN_PROJECTION_CACHE_FILL` | route | `state_transition` | `projection` | Extract `player_id`, `keys` | +| `SPAN_PROJECTION_CACHE_LAZY_FILL` | route | `state_transition` | `projection` | Extract `player_id`, `keys` | + +The `extract` lambdas always defensively coerce `span.attributes or {}` because `ReadableSpan.attributes` can be `None`. Match attribute names to whatever the existing emission sites set — open the file that calls `.start_as_current_span(SPAN_X, attributes={...})` and copy the keys verbatim. + +- [ ] **Step 4: Run all telemetry tests** + +Run: `cd sidequest-server && uv run pytest tests/telemetry/ -v` +Expected: PASS — no regressions. Routing-completeness test doesn't exist yet (Task 7), so this only verifies existing tests still pass. 
+ +- [ ] **Step 5: Commit** + +```bash +cd sidequest-server +git add sidequest/telemetry/spans.py +git commit -m "feat(telemetry): seed SPAN_ROUTES and FLAT_ONLY_SPANS for live spans" +``` + +--- + +### Task 6: Refactor `WatcherSpanProcessor.on_end` to use the router + +**Files:** +- Modify: `sidequest-server/sidequest/server/watcher.py:64-86` + +- [ ] **Step 1: Write the failing test** + +Add to `sidequest-server/tests/server/test_watcher_events.py` (extend existing file): + +```python +import asyncio +from datetime import UTC, datetime +from unittest.mock import MagicMock + +import pytest +from opentelemetry.sdk.trace import ReadableSpan +from opentelemetry.trace import SpanContext, Status, StatusCode, TraceFlags + +from sidequest.server.watcher import WatcherSpanProcessor +from sidequest.telemetry import spans as spans_mod +from sidequest.telemetry.spans import SPAN_DICE_REQUEST_SENT +from sidequest.telemetry.watcher_hub import WatcherHub + + +def _fake_span( + name: str, + attributes: dict | None = None, + status_code: StatusCode = StatusCode.OK, +) -> ReadableSpan: + """Build a ReadableSpan stand-in for tests.""" + span = MagicMock(spec=ReadableSpan) + span.name = name + span.attributes = attributes or {} + span.start_time = 1_000_000_000 + span.end_time = 2_000_000_000 + span.status = MagicMock() + span.status.status_code = MagicMock() + span.status.status_code.name = "OK" if status_code == StatusCode.OK else "ERROR" + return span + + +class _CapturingSubscriber: + def __init__(self) -> None: + self.events: list[dict] = [] + + async def send_json(self, data: dict) -> None: + self.events.append(data) + + +@pytest.mark.asyncio +async def test_on_end_emits_agent_span_close_for_every_span() -> None: + """Backward-compat: every closed span still produces agent_span_close.""" + hub = WatcherHub() + hub.bind_loop(asyncio.get_running_loop()) + sub = _CapturingSubscriber() + await hub.subscribe(sub) + + processor = WatcherSpanProcessor(hub) + 
processor.on_end(_fake_span("some.untracked.span", {"a": 1})) + + # Allow the cross-thread coroutine hop to flush. + await asyncio.sleep(0.05) + + assert any(e["event_type"] == "agent_span_close" for e in sub.events) + + +@pytest.mark.asyncio +async def test_on_end_emits_typed_event_for_routed_span() -> None: + """When a span name is in SPAN_ROUTES, on_end ALSO emits the typed event.""" + hub = WatcherHub() + hub.bind_loop(asyncio.get_running_loop()) + sub = _CapturingSubscriber() + await hub.subscribe(sub) + + processor = WatcherSpanProcessor(hub) + processor.on_end(_fake_span( + SPAN_DICE_REQUEST_SENT, + {"request_id": "r1", "expression": "1d20"}, + )) + await asyncio.sleep(0.05) + + typed = [e for e in sub.events if e["event_type"] == "state_transition"] + flat = [e for e in sub.events if e["event_type"] == "agent_span_close"] + assert typed, "Routed span did not produce a typed state_transition event" + assert flat, "Routed span must STILL produce agent_span_close (augment, not replace)" + assert typed[0]["component"] == "dice" + assert typed[0]["fields"]["request_id"] == "r1" +``` + +- [ ] **Step 2: Run test to verify it fails** + +Run: `cd sidequest-server && uv run pytest tests/server/test_watcher_events.py::test_on_end_emits_typed_event_for_routed_span -v` +Expected: FAIL — current `on_end` only emits `agent_span_close`. + +- [ ] **Step 3: Refactor `on_end`** + +Replace `WatcherSpanProcessor.on_end` in `sidequest-server/sidequest/server/watcher.py` with: + +```python +def on_end(self, span: ReadableSpan) -> None: + end_ns = span.end_time or 0 + start_ns = span.start_time or end_ns + duration_ms = max(0, (end_ns - start_ns) // 1_000_000) + attrs: dict[str, Any] = {} + if span.attributes: + for k, v in span.attributes.items(): + attrs[str(k)] = v + + severity = "info" + if span.status is not None and span.status.status_code.name == "ERROR": + severity = "error" + + # Always emit the flat firehose event — Timeline / Timing tabs depend on it. 
+ self._hub.publish( + { + "timestamp": datetime.now(UTC).isoformat(), + "component": "sidequest-server", + "event_type": "agent_span_close", + "severity": severity, + "fields": { + "name": span.name, + "duration_ms": duration_ms, + **attrs, + }, + } + ) + + # Then, if the span has a routing decision, emit the typed event too. + from sidequest.telemetry.spans import SPAN_ROUTES + + route = SPAN_ROUTES.get(span.name) + if route is None: + return + + try: + fields = route.extract(span) + except Exception as exc: # noqa: BLE001 + # Per CLAUDE.md: no silent fallbacks. Surface the failure on the bus + # so the operator sees that the translator is broken, not silently + # missing typed events. + logger.exception("watcher.route_extract_failed span=%s", span.name) + self._hub.publish( + { + "timestamp": datetime.now(UTC).isoformat(), + "component": "watcher", + "event_type": "validation_warning", + "severity": "error", + "fields": { + "check": "route_extract", + "span": span.name, + "error": str(exc), + }, + } + ) + return + + # Inferred severity per spec §6.5. + typed_severity = severity + if route.event_type == "json_extraction_result": + tier = fields.get("tier") + if isinstance(tier, int) and tier > 1: + typed_severity = "warning" + + self._hub.publish( + { + "timestamp": datetime.now(UTC).isoformat(), + "component": route.component, + "event_type": route.event_type, + "severity": typed_severity, + "fields": fields, + } + ) +``` + +- [ ] **Step 4: Run tests to verify they pass** + +Run: `cd sidequest-server && uv run pytest tests/server/test_watcher_events.py -v` +Expected: PASS — both new tests green; existing tests still green. + +Run: `cd sidequest-server && uv run pytest tests/server/ -v -x` +Expected: PASS — no regressions. 
+ +- [ ] **Step 5: Commit** + +```bash +cd sidequest-server +git add sidequest/server/watcher.py tests/server/test_watcher_events.py +git commit -m "feat(watcher): translator emits typed events via SPAN_ROUTES on close" +``` + +--- + +### Task 7: Add the routing-completeness lint test + +**Files:** +- Create: `sidequest-server/tests/telemetry/test_routing_completeness.py` + +- [ ] **Step 1: Write the test** + +Create `sidequest-server/tests/telemetry/test_routing_completeness.py`: + +```python +"""Static lint: every SPAN_* constant must have a routing decision. + +A new span constant added to spans.py without either an entry in +SPAN_ROUTES or membership in FLAT_ONLY_SPANS is a routing gap — the +translator will emit only agent_span_close, and the dashboard's typed +tabs will silently miss the new subsystem. This test forces the +decision to be explicit at the point a constant is introduced. +""" + +from __future__ import annotations + +from sidequest.telemetry import spans +from sidequest.telemetry.spans import FLAT_ONLY_SPANS, SPAN_ROUTES + + +def _all_span_constants() -> set[str]: + """Every SPAN_* attribute on the spans module that holds a string.""" + return { + v + for name, v in vars(spans).items() + if name.startswith("SPAN_") and isinstance(v, str) + } + + +def test_every_span_is_routed_or_explicitly_flat() -> None: + all_spans = _all_span_constants() + routed = set(SPAN_ROUTES.keys()) + flat = set(FLAT_ONLY_SPANS) + missing = all_spans - routed - flat + overlap = routed & flat + + assert not overlap, ( + f"Spans cannot be both routed AND flat-only: {sorted(overlap)}" + ) + assert not missing, ( + "Spans without a routing decision (add to SPAN_ROUTES or " + f"FLAT_ONLY_SPANS): {sorted(missing)}" + ) + + +def test_routes_target_known_event_types() -> None: + """Each SpanRoute.event_type matches a WatcherEventType the dashboard + handles. 
This is a string check — the source of truth is + sidequest-ui/src/types/watcher.ts.""" + known = { + "agent_span_open", + "agent_span_close", + "state_transition", + "turn_complete", + "lore_retrieval", + "prompt_assembled", + "game_state_snapshot", + "validation_warning", + "subsystem_exercise_summary", + "coverage_gap", + "json_extraction_result", + } + bad = [ + (name, route.event_type) + for name, route in SPAN_ROUTES.items() + if route.event_type not in known + ] + assert not bad, f"Routes targeting unknown event types: {bad}" +``` + +- [ ] **Step 2: Run test to verify it passes** + +Run: `cd sidequest-server && uv run pytest tests/telemetry/test_routing_completeness.py -v` +Expected: PASS — every currently-live `SPAN_*` constant has a decision from Task 5. + +If FAIL: the failure message lists missing constants; go back to Task 5 and add them to either `SPAN_ROUTES` or `FLAT_ONLY_SPANS`. + +- [ ] **Step 3: Verify it actually catches missing routes** + +Sanity check: temporarily add `SPAN_TEST_SENTINEL = "test.sentinel"` to `spans.py`, re-run: + +Run: `cd sidequest-server && uv run pytest tests/telemetry/test_routing_completeness.py -v` +Expected: FAIL — sentinel listed in missing. + +Remove the sentinel and re-run: +Expected: PASS. + +- [ ] **Step 4: No-op (lint test only)** + +This task has no implementation step — the test IS the deliverable. + +- [ ] **Step 5: Commit** + +```bash +cd sidequest-server +git add tests/telemetry/test_routing_completeness.py +git commit -m "test(telemetry): lint every SPAN_* constant has a routing decision" +``` + +--- + +## Phase 2 — Turn Root Span (Load-Bearing) + +This is the only Phase 2 emission included in this plan. Every other emission family from spec §4.1 is a follow-up plan. 
### Task 8: Add `turn_span()` context-manager helper

**Files:**
- Modify: `sidequest-server/sidequest/telemetry/spans.py`

- [ ] **Step 1: Write the failing test**

Add to `sidequest-server/tests/telemetry/test_spans.py`. OpenTelemetry only honors the first `trace.set_tracer_provider` call per process (later calls log a warning and are ignored), so install the provider once at module import and clear the exporter between tests instead of rebuilding the provider:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import (
    InMemorySpanExporter,
)

# set_tracer_provider only takes effect on the first call per process.
# Configure once here; each test clears the exporter rather than
# installing a fresh provider.
_EXPORTER = InMemorySpanExporter()
_PROVIDER = TracerProvider()
_PROVIDER.add_span_processor(SimpleSpanProcessor(_EXPORTER))
trace.set_tracer_provider(_PROVIDER)


def test_turn_span_opens_named_span_with_required_attrs() -> None:
    """turn_span() yields a span named 'turn' with required attributes set."""
    from sidequest.telemetry.spans import SPAN_TURN, turn_span

    _EXPORTER.clear()
    with turn_span(
        turn_id=42,
        player_id="alice",
        agent_name="narrator",
    ):
        pass

    spans = _EXPORTER.get_finished_spans()
    assert len(spans) == 1
    assert spans[0].name == SPAN_TURN
    attrs = dict(spans[0].attributes or {})
    assert attrs["turn_id"] == 42
    assert attrs["player_id"] == "alice"
    assert attrs["agent_name"] == "narrator"


def test_turn_span_accepts_extra_attrs() -> None:
    from sidequest.telemetry.spans import turn_span

    _EXPORTER.clear()
    with turn_span(
        turn_id=1,
        player_id="bob",
        agent_name="narrator",
        room_id="room-7",
        extraction_tier=1,
    ):
        pass

    attrs = dict(_EXPORTER.get_finished_spans()[0].attributes or {})
    assert attrs["room_id"] == "room-7"
    assert attrs["extraction_tier"] == 1
```

The `sidequest.telemetry.spans` imports stay function-local so test collection succeeds before the helper exists.

- [ ] **Step 2: Run test to verify it fails**

Run: `cd sidequest-server && uv run pytest tests/telemetry/test_spans.py::test_turn_span_opens_named_span_with_required_attrs -v`
Expected: FAIL — `turn_span` does not exist in `spans.py`.

- [ ] **Step 3: Add the helper to `spans.py`**

In `sidequest-server/sidequest/telemetry/spans.py`, locate the section starting `# Turn — sidequest-server/dispatch/mod.rs, dispatch/tropes.rs` and immediately after the `SPAN_TURN_*` constants, add:

```python
@contextmanager
def turn_span(
    *,
    turn_id: int,
    player_id: str,
    agent_name: str,
    **attrs: Any,
) -> Iterator[trace.Span]:
    """Open the root `turn` span for a dispatch.

    Every other span opened during this dispatch becomes a child of this
    span. Without it, traces are orphaned — the Timing tab cannot group by
    turn and the Subsystems tab cannot derive per-turn exercise summaries.

    Required attributes match ADR-031 §"Layer 2" turn-root contract:
    turn_id, player_id, agent_name. Extras are accepted via **attrs and
    set on the span verbatim.
    """
    with tracer().start_as_current_span(SPAN_TURN) as span:
        span.set_attribute("turn_id", turn_id)
        span.set_attribute("player_id", player_id)
        span.set_attribute("agent_name", agent_name)
        for k, v in attrs.items():
            span.set_attribute(k, v)
        yield span
```

Also add the routing decision below the existing `SPAN_TURN` constant (the validator owns `turn_complete`, so the root span itself is flat-only):

```python
FLAT_ONLY_SPANS.add(SPAN_TURN)
```

- [ ] **Step 4: Run tests to verify they pass**

Run: `cd sidequest-server && uv run pytest tests/telemetry/test_spans.py -v`
Expected: PASS.

Run: `cd sidequest-server && uv run pytest tests/telemetry/test_routing_completeness.py -v`
Expected: PASS — `SPAN_TURN` is now in `FLAT_ONLY_SPANS`.
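For intuition about why opening a single root span is enough to anchor everything underneath it, the parent-tracking that `start_as_current_span` performs can be mimicked with nothing but `contextvars`. This is an illustrative sketch, not project code; `_Span`, `span`, and `_current` are made-up names:

```python
import contextvars
from contextlib import contextmanager

_current = contextvars.ContextVar("current_span", default=None)


class _Span:
    def __init__(self, name, parent):
        self.name = name
        self.parent = parent
        self.attributes = {}


@contextmanager
def span(name, **attrs):
    # A new span's parent is whatever span is active in this context,
    # which is what start_as_current_span arranges for turn_span.
    s = _Span(name, parent=_current.get())
    s.attributes.update(attrs)
    token = _current.set(s)  # make this span the active one
    try:
        yield s
    finally:
        _current.reset(token)  # restore the previous active span


with span("turn", turn_id=42) as turn:
    with span("agent.narrator") as child:
        pass

assert child.parent is turn  # nested work hangs off the turn root
assert turn.parent is None   # the turn span is the trace root
```

Because the active span lives in a `ContextVar`, this nesting holds per-task even across `await` points, which is what Task 9 relies on.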
+ +- [ ] **Step 5: Commit** + +```bash +cd sidequest-server +git add sidequest/telemetry/spans.py tests/telemetry/test_spans.py +git commit -m "feat(telemetry): add turn_span() root context manager" +``` + +--- + +### Task 9: Open `turn_span()` at the dispatch entry + +**Files:** +- Modify: `sidequest-server/sidequest/server/session_handler.py` + +- [ ] **Step 1: Identify the dispatch entry point** + +Run: `cd sidequest-server && grep -n "async def \|def handle_action\|def dispatch_action\|class WebSocketSessionHandler" sidequest/server/session_handler.py | head -30` +Expected: locate the top-level coroutine that processes a player action — the function that fires once per turn from the WebSocket route. The spec calls it `session_handler.handle_action`; verify the actual name. + +- [ ] **Step 2: Write the wiring test** + +Create `sidequest-server/tests/server/test_turn_span_wiring.py`: + +```python +"""Wiring test: dispatching an action opens the turn root span.""" + +from __future__ import annotations + +import asyncio + +import pytest +from opentelemetry import trace +from opentelemetry.sdk.trace import TracerProvider +from opentelemetry.sdk.trace.export import SimpleSpanProcessor +from opentelemetry.sdk.trace.export.in_memory_span_exporter import ( + InMemorySpanExporter, +) + + +@pytest.fixture +def in_memory_exporter(): + exporter = InMemorySpanExporter() + provider = TracerProvider() + provider.add_span_processor(SimpleSpanProcessor(exporter)) + trace.set_tracer_provider(provider) + # Force re-init so the span helpers pick up the new provider. + from sidequest.telemetry import setup as setup_mod + + setup_mod._initialized = True + yield exporter + exporter.clear() + + +@pytest.mark.asyncio +async def test_dispatch_opens_turn_span(in_memory_exporter, server_fixture) -> None: + """A turn dispatch produces at least one span named 'turn'. + + Sister spans opened during the dispatch must appear as children — i.e. 
+ the parent_span_id of every non-turn span equals the span_id of a 'turn' + span. This is the load-bearing invariant for the Timing tab's + "subsystems exercised this turn" grouping. + """ + await server_fixture.dispatch_action( + player="alice", + text="I look around.", + ) + + spans = in_memory_exporter.get_finished_spans() + turn_spans = [s for s in spans if s.name == "turn"] + assert turn_spans, "No 'turn' span opened during dispatch" + + turn_span_ids = {s.context.span_id for s in turn_spans} + non_turn = [s for s in spans if s.name != "turn"] + if non_turn: + # Every non-turn span recorded during this dispatch should chain + # up to one of the turn span IDs (parent or ancestor). + roots = {s for s in non_turn if s.parent is None} + assert not roots, ( + f"Non-turn spans without a turn parent (orphans): " + f"{[r.name for r in roots]}" + ) +``` + +The `server_fixture` is the existing dispatch fixture; if a different fixture name is used in the codebase, swap it in. If no equivalent fixture exists, build the minimal one inline: + +```python +@pytest.fixture +async def server_fixture(): + from sidequest.server.app import create_app + # Minimal in-memory app for dispatch testing; adapt to the project's + # existing pattern in tests/server/conftest.py. + ... +``` + +Check `tests/server/conftest.py` for the canonical dispatch fixture and reuse it. + +- [ ] **Step 3: Run test to verify it fails** + +Run: `cd sidequest-server && uv run pytest tests/server/test_turn_span_wiring.py -v` +Expected: FAIL — no `turn` span is opened by dispatch yet. + +- [ ] **Step 4: Wrap the dispatch entry in `turn_span()`** + +In `sidequest-server/sidequest/server/session_handler.py`, identify the dispatch coroutine (located in Step 1). Wrap its body in `turn_span()`. Concretely: + +```python +# At the top of session_handler.py, ensure this import exists: +from sidequest.telemetry.spans import turn_span + +# ... inside the dispatch coroutine (e.g. handle_action) ... 
async def handle_action(self, player_id: str, text: str, ...) -> ...:
    turn_id = self._next_turn_id()  # whatever the existing counter is
    agent_name = self._classify_agent(text)  # whatever the existing routing is
    with turn_span(
        turn_id=turn_id,
        player_id=player_id,
        agent_name=agent_name,
    ):
        # All existing dispatch logic moves INSIDE this with-block,
        # unchanged.
        ...
```

For `turn_id`, use the existing turn counter that the protocol delta already references. For `agent_name`, use whatever the dispatch already knows about which subsystem is handling the action, falling back to `"unknown"` if routing happens deeper.

`turn_span` is a synchronous context manager, so the plain `with` form wraps the entire dispatch body; no `async with` is needed, even though the body awaits.

- [ ] **Step 5: Run tests to verify they pass**

Run: `cd sidequest-server && uv run pytest tests/server/test_turn_span_wiring.py -v`
Expected: PASS.

Run: `cd sidequest-server && uv run pytest tests/server/ tests/telemetry/ -v -x`
Expected: PASS — no regressions across the dispatch and telemetry test suites.
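A stdlib-only sketch confirms that a synchronous context manager can safely bracket an awaiting body: enter and exit both run in the same task, around any awaits in between. `sync_span` and `handle_action` here are illustrative stand-ins, not the real dispatch:

```python
import asyncio
from contextlib import contextmanager

events: list[str] = []


@contextmanager
def sync_span(name: str):
    # Enter/exit run synchronously in the task that owns the with-block,
    # bracketing any awaits inside the body.
    events.append(f"open:{name}")
    try:
        yield
    finally:
        events.append(f"close:{name}")


async def handle_action() -> None:
    with sync_span("turn"):
        await asyncio.sleep(0)  # stand-in for the real dispatch awaits
        events.append("dispatch")


asyncio.run(handle_action())
assert events == ["open:turn", "dispatch", "close:turn"]
```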
+ +Commit: + +```bash +cd sidequest-server +git add sidequest/server/session_handler.py tests/server/test_turn_span_wiring.py +git commit -m "feat(server): open turn_span at dispatch entry to anchor traces" +``` + +--- + +## Phase 3 — Validator Pipeline + +### Task 10: Create `TurnRecord` and `PatchSummary` dataclasses + +**Files:** +- Create: `sidequest-server/sidequest/telemetry/turn_record.py` + +- [ ] **Step 1: Write the failing test** + +Create `sidequest-server/tests/telemetry/test_turn_record.py`: + +```python +"""Tests for TurnRecord dataclass shape and immutability.""" + +from __future__ import annotations + +from dataclasses import FrozenInstanceError +from datetime import UTC, datetime + +import pytest + +from sidequest.telemetry.turn_record import PatchSummary, TurnRecord + + +def _stub_snapshot(): + """Minimal stand-in for GameSnapshot — TurnRecord just holds it.""" + return object() + + +def _stub_delta(): + return object() + + +def test_turn_record_is_frozen() -> None: + record = TurnRecord( + turn_id=1, + timestamp=datetime.now(UTC), + player_id="alice", + player_input="I look.", + classified_intent="look", + agent_name="narrator", + narration="The room is dark.", + patches_applied=[], + snapshot_before_hash="abc", + snapshot_after=_stub_snapshot(), + delta=_stub_delta(), + beats_fired=[], + extraction_tier=1, + token_count_in=10, + token_count_out=20, + agent_duration_ms=300, + is_degraded=False, + ) + with pytest.raises(FrozenInstanceError): + record.turn_id = 2 # type: ignore[misc] + + +def test_patch_summary_is_frozen() -> None: + p = PatchSummary(patch_type="world", fields_changed=["location"]) + with pytest.raises(FrozenInstanceError): + p.patch_type = "combat" # type: ignore[misc] + + +def test_turn_record_carries_all_fields() -> None: + """All fields per spec §5.1 are present and accessible.""" + record = TurnRecord( + turn_id=42, + timestamp=datetime.now(UTC), + player_id="alice", + player_input="I attack the troll.", + 
classified_intent="combat.attack", + agent_name="combat", + narration="You swing.", + patches_applied=[ + PatchSummary(patch_type="combat", fields_changed=["hp"]), + ], + snapshot_before_hash="hash1", + snapshot_after=_stub_snapshot(), + delta=_stub_delta(), + beats_fired=[("desperation", 0.7)], + extraction_tier=2, + token_count_in=120, + token_count_out=240, + agent_duration_ms=812, + is_degraded=False, + ) + assert record.turn_id == 42 + assert record.beats_fired == [("desperation", 0.7)] + assert record.patches_applied[0].patch_type == "combat" +``` + +- [ ] **Step 2: Run test to verify it fails** + +Run: `cd sidequest-server && uv run pytest tests/telemetry/test_turn_record.py -v` +Expected: FAIL — `turn_record` module doesn't exist. + +- [ ] **Step 3: Implement the dataclasses** + +Create `sidequest-server/sidequest/telemetry/turn_record.py`: + +```python +"""TurnRecord — immutable snapshot of a completed dispatch turn. + +Assembled at the end of session_handler.handle_action and put on the +validator queue. Frozen for immutability across the queue boundary +(asyncio doesn't enforce isolation; the dataclass does). + +Per ADR-089 §2.1 (deliberate departure from Rust ADR-031), Python stores +snapshot_before_hash + snapshot_after + delta rather than two full +GameSnapshot clones — same validation power without the double-clone +cost on every turn. +""" + +from __future__ import annotations + +from dataclasses import dataclass, field +from datetime import datetime +from typing import Any + + +@dataclass(frozen=True) +class PatchSummary: + """Compact record of one patch applied during a turn. + + The full patch lives in the snapshot_after via its delta; this + summary is what the validator's patch_legality_check inspects. 
+ """ + + patch_type: str # "world" | "combat" | "chase" | "scenario" + fields_changed: list[str] + + +@dataclass(frozen=True) +class TurnRecord: + """One completed turn, ready for narrative validation.""" + + turn_id: int + timestamp: datetime + player_id: str + player_input: str + classified_intent: str + agent_name: str + narration: str + patches_applied: list[PatchSummary] + snapshot_before_hash: str + snapshot_after: Any # GameSnapshot — typed Any to avoid game-layer dep + delta: Any # StateDelta — same reason + beats_fired: list[tuple[str, float]] # (trope_name, threshold) + extraction_tier: int # 1, 2, or 3 + token_count_in: int + token_count_out: int + agent_duration_ms: int + is_degraded: bool +``` + +The `snapshot_after` and `delta` fields are typed `Any` rather than the concrete `GameSnapshot` / `StateDelta` types to keep `sidequest.telemetry` free of `sidequest.game` imports — same reasoning as `watcher_hub.py` keeping itself FastAPI-free. + +- [ ] **Step 4: Run tests to verify they pass** + +Run: `cd sidequest-server && uv run pytest tests/telemetry/test_turn_record.py -v` +Expected: PASS. 
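One caveat worth noting alongside the frozen design (an observation, not something the spec requires changing): `frozen=True` only blocks attribute rebinding, and it is shallow, so list-valued fields like `patches_applied` can still be mutated in place after the record crosses the queue boundary. A stdlib sketch with a hypothetical `Rec`:

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class Rec:
    turn_id: int
    patches: list[str] = field(default_factory=list)


r = Rec(turn_id=1)

blocked = False
try:
    r.turn_id = 2  # type: ignore[misc]  # rebinding is refused
except FrozenInstanceError:
    blocked = True
assert blocked

# ...but frozen=True is shallow: mutable field values can still be
# mutated in place after construction.
r.patches.append("world")
assert r.patches == ["world"]
```

If strict isolation across the queue ever matters, tuple-typed fields (`tuple[PatchSummary, ...]`) close this gap.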
+ +- [ ] **Step 5: Commit** + +```bash +cd sidequest-server +git add sidequest/telemetry/turn_record.py tests/telemetry/test_turn_record.py +git commit -m "feat(telemetry): TurnRecord and PatchSummary dataclasses" +``` + +--- + +### Task 11: Validator skeleton with bounded asyncio.Queue + +**Files:** +- Create: `sidequest-server/sidequest/telemetry/validator.py` + +- [ ] **Step 1: Write the failing test** + +Create `sidequest-server/tests/telemetry/test_validator_pipeline.py`: + +```python +"""Tests for the Layer-3 narrative validator pipeline.""" + +from __future__ import annotations + +import asyncio +from datetime import UTC, datetime +from unittest.mock import patch + +import pytest + +from sidequest.telemetry.turn_record import PatchSummary, TurnRecord +from sidequest.telemetry.validator import Validator + + +def _make_record(turn_id: int = 1) -> TurnRecord: + return TurnRecord( + turn_id=turn_id, + timestamp=datetime.now(UTC), + player_id="alice", + player_input="I look.", + classified_intent="look", + agent_name="narrator", + narration="The room is dark.", + patches_applied=[], + snapshot_before_hash="h0", + snapshot_after=object(), + delta=object(), + beats_fired=[], + extraction_tier=1, + token_count_in=10, + token_count_out=20, + agent_duration_ms=100, + is_degraded=False, + ) + + +@pytest.mark.asyncio +async def test_validator_starts_and_drains_on_shutdown() -> None: + v = Validator() + await v.start() + assert v.is_running() + + await v.submit(_make_record(turn_id=1)) + await v.shutdown(grace_seconds=2.0) + + assert not v.is_running() + + +@pytest.mark.asyncio +async def test_submit_drops_oldest_under_backpressure() -> None: + v = Validator(queue_maxsize=2) + # Don't start the consumer — let the queue fill. 
+ for i in range(5): + await v.submit(_make_record(turn_id=i)) + + assert v.dropped_records >= 3 +``` + +- [ ] **Step 2: Run test to verify it fails** + +Run: `cd sidequest-server && uv run pytest tests/telemetry/test_validator_pipeline.py::test_validator_starts_and_drains_on_shutdown -v` +Expected: FAIL — `validator` module doesn't exist. + +- [ ] **Step 3: Implement the Validator skeleton** + +Create `sidequest-server/sidequest/telemetry/validator.py`: + +```python +"""Layer-3 narrative validator — consumes TurnRecord, emits typed events. + +Lifecycle: started by FastAPI's startup event, drained on shutdown. +A single asyncio.Task processes one TurnRecord at a time; the queue is +bounded and oldest-record-drops on QueueFull (faithful to ADR-031's +"lossy by design" intent). + +The validator never raises into the dispatch hot path. Each check is +wrapped in try/except — a check exception fires a validation_warning +with severity=error rather than crashing the task. If the task itself +dies, app.py's startup hook restarts it on next request (best-effort). +""" + +from __future__ import annotations + +import asyncio +import logging +import time +from collections import deque +from typing import Awaitable, Callable + +from sidequest.telemetry.turn_record import TurnRecord +from sidequest.telemetry.watcher_hub import publish_event + +logger = logging.getLogger(__name__) + +CheckFn = Callable[[TurnRecord], Awaitable[None]] + + +class Validator: + """Single-consumer narrative validator pipeline.""" + + def __init__(self, queue_maxsize: int = 32) -> None: + self._queue: asyncio.Queue[TurnRecord] = asyncio.Queue( + maxsize=queue_maxsize + ) + self._task: asyncio.Task[None] | None = None + self._stopping = asyncio.Event() + self._checks: list[CheckFn] = [] + # Health counters + self.dropped_records: int = 0 + self._check_durations_ms: deque[tuple[str, float]] = deque(maxlen=200) + + def register_check(self, fn: CheckFn) -> None: + """Register a check coroutine. 
Called once per TurnRecord.""" + self._checks.append(fn) + + def is_running(self) -> bool: + return self._task is not None and not self._task.done() + + async def submit(self, record: TurnRecord) -> None: + """Enqueue a record. On QueueFull, drop the oldest record.""" + try: + self._queue.put_nowait(record) + except asyncio.QueueFull: + try: + self._queue.get_nowait() + self._queue.task_done() + self.dropped_records += 1 + publish_event( + "validation_warning", + { + "check": "validator.queue", + "reason": "queue_full", + "dropped_total": self.dropped_records, + }, + component="validator", + severity="warning", + ) + except asyncio.QueueEmpty: + pass + try: + self._queue.put_nowait(record) + except asyncio.QueueFull: + self.dropped_records += 1 + + async def start(self) -> None: + if self.is_running(): + return + self._stopping.clear() + self._task = asyncio.create_task( + self._run(), name="sidequest.validator" + ) + logger.info("validator.started") + + async def shutdown(self, grace_seconds: float = 2.0) -> None: + self._stopping.set() + if self._task is None: + return + # Drain remaining records up to the grace window. 
+ try: + await asyncio.wait_for( + self._queue.join(), timeout=grace_seconds + ) + except asyncio.TimeoutError: + logger.warning( + "validator.shutdown_grace_exceeded queued=%d", + self._queue.qsize(), + ) + self._task.cancel() + try: + await self._task + except asyncio.CancelledError: + pass + self._task = None + logger.info("validator.stopped") + + async def _run(self) -> None: + while not self._stopping.is_set(): + try: + record = await asyncio.wait_for( + self._queue.get(), timeout=0.5 + ) + except asyncio.TimeoutError: + continue + try: + await self._validate(record) + finally: + self._queue.task_done() + + async def _validate(self, record: TurnRecord) -> None: + for check in self._checks: + t0 = time.perf_counter() + try: + await check(record) + except Exception as exc: # noqa: BLE001 + logger.exception( + "validator.check_failed check=%s", check.__name__ + ) + publish_event( + "validation_warning", + { + "check": check.__name__, + "error": str(exc), + "turn_id": record.turn_id, + }, + component="validator", + severity="error", + ) + elapsed_ms = (time.perf_counter() - t0) * 1000.0 + self._check_durations_ms.append( + (check.__name__, elapsed_ms) + ) +``` + +- [ ] **Step 4: Run tests to verify they pass** + +Run: `cd sidequest-server && uv run pytest tests/telemetry/test_validator_pipeline.py -v` +Expected: PASS — both lifecycle tests green. + +- [ ] **Step 5: Commit** + +```bash +cd sidequest-server +git add sidequest/telemetry/validator.py tests/telemetry/test_validator_pipeline.py +git commit -m "feat(telemetry): Validator skeleton with bounded queue and lifecycle" +``` + +--- + +### Task 12: Implement `entity_check` + +Each of Tasks 12–17 implements one of the five checks. They share a fixture pattern. 
**Files:**
- Modify: `sidequest-server/sidequest/telemetry/validator.py`
- Modify: `sidequest-server/tests/telemetry/test_validator_pipeline.py`

- [ ] **Step 1: Write the failing test**

Add to `sidequest-server/tests/telemetry/test_validator_pipeline.py`:

```python
from sidequest.telemetry.validator import entity_check


class _CapturedEvents(list):
    pass


@pytest.fixture
def captured_events(monkeypatch):
    captured = _CapturedEvents()

    def fake_publish(event_type, fields, *, component="sidequest-server", severity="info"):
        captured.append({
            "event_type": event_type,
            "fields": fields,
            "component": component,
            "severity": severity,
        })

    monkeypatch.setattr(
        "sidequest.telemetry.validator.publish_event",
        fake_publish,
    )
    return captured


@pytest.mark.asyncio
async def test_entity_check_warns_on_unknown_npc(captured_events) -> None:
    """Narration mentioning an NPC not in the registry produces a
    validation_warning."""
    snapshot_after = type(
        "Snap",
        (),
        {
            "npc_registry": {},  # empty
            "discovered_regions": [],
            "inventory": type("Inv", (), {"items": []})(),
        },
    )()

    record = _make_record()
    record_dict = record.__dict__.copy()
    record_dict["narration"] = "Sir Reginald nods grimly."
    record_dict["snapshot_after"] = snapshot_after
    new_record = TurnRecord(**record_dict)

    await entity_check(new_record)

    warnings = [e for e in captured_events if e["event_type"] == "validation_warning"]
    assert warnings, "entity_check should warn on unknown NPC"
    assert "Sir Reginald" in str(warnings[0]["fields"])
```

- [ ] **Step 2: Run test to verify it fails**

Run: `cd sidequest-server && uv run pytest tests/telemetry/test_validator_pipeline.py::test_entity_check_warns_on_unknown_npc -v`
Expected: FAIL — `entity_check` does not exist.
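Before implementing, it helps to see what a capitalized-noun-phrase heuristic catches and misses. The pattern below has the same shape as the one the implementation step introduces, but is exercised standalone here for illustration:

```python
import re

# Two or more consecutive capitalized words — the "named entity in
# narration" heuristic. Deliberately loose.
pattern = re.compile(r"\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)+)\b")

assert pattern.findall("Sir Reginald nods grimly.") == ["Sir Reginald"]
assert pattern.findall("You walk forward.") == []  # lone capital: no hit
assert pattern.findall("They say The Ironwood is haunted.") == ["The Ironwood"]
# Sentence-initial pairs are false positives; the check tolerates them
# because it is a hint, not an oracle.
assert pattern.findall("Dark Clouds gather overhead.") == ["Dark Clouds"]
```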
+ +- [ ] **Step 3: Implement `entity_check`** + +Add to `sidequest-server/sidequest/telemetry/validator.py`: + +```python +import re + +# Capitalized two-word noun phrases — heuristic for "named entity in +# narration." Matches "Sir Reginald", "The Ironwood", "Lady Ashes" etc. +# False positives are fine — entity_check is a hint, not an oracle. +_NAMED_ENTITY_RE = re.compile(r"\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)+)\b") + + +async def entity_check(record: TurnRecord) -> None: + """Warn when narration names an NPC / region / item absent from the + snapshot. + + Reads: + - narration + - snapshot_after.npc_registry (mapping name -> NpcRegistryEntry) + - snapshot_after.discovered_regions (iterable of region names) + - snapshot_after.inventory.items (iterable of item names) + """ + snap = record.snapshot_after + known_names: set[str] = set() + npc_registry = getattr(snap, "npc_registry", None) or {} + if isinstance(npc_registry, dict): + known_names.update(npc_registry.keys()) + regions = getattr(snap, "discovered_regions", None) or () + known_names.update(str(r) for r in regions) + inventory = getattr(snap, "inventory", None) + if inventory is not None: + items = getattr(inventory, "items", None) or () + for it in items: + name = getattr(it, "name", None) or str(it) + known_names.add(name) + + if not record.narration: + return + + for match in _NAMED_ENTITY_RE.finditer(record.narration): + candidate = match.group(1) + if candidate not in known_names: + publish_event( + "validation_warning", + { + "check": "entity", + "turn_id": record.turn_id, + "candidate": candidate, + "rationale": "narration names an entity not in snapshot", + }, + component="validator", + severity="warning", + ) + # One warning per turn is sufficient; don't spam. + return +``` + +Then register the check in `Validator.__init__` after `self._check_durations_ms` is initialized: + +```python + # Default check registration. Test code can register additional + # or replacement checks before start(). 
+ self.register_check(entity_check) +``` + +- [ ] **Step 4: Run tests to verify they pass** + +Run: `cd sidequest-server && uv run pytest tests/telemetry/test_validator_pipeline.py -v` +Expected: PASS — entity_check warns; existing lifecycle tests still pass. + +- [ ] **Step 5: Commit** + +```bash +cd sidequest-server +git add sidequest/telemetry/validator.py tests/telemetry/test_validator_pipeline.py +git commit -m "feat(validator): entity_check warns on narration of unknown entities" +``` + +--- + +### Task 13: Implement `inventory_check` + +**Files:** +- Modify: `sidequest-server/sidequest/telemetry/validator.py` +- Modify: `sidequest-server/tests/telemetry/test_validator_pipeline.py` + +- [ ] **Step 1: Write the failing test** + +Add to the test file: + +```python +from sidequest.telemetry.validator import inventory_check + + +@pytest.mark.asyncio +async def test_inventory_check_warns_on_narration_grab_with_no_patch( + captured_events, +) -> None: + """Narration says 'you grab the lantern' but no patch added 'lantern'.""" + record_dict = _make_record().__dict__.copy() + record_dict["narration"] = "You grab the lantern from the shelf." + record_dict["patches_applied"] = [] # nothing added + record_dict["delta"] = type("Delta", (), {"inventory_changes": []})() + record = TurnRecord(**record_dict) + + await inventory_check(record) + warnings = [e for e in captured_events if e["event_type"] == "validation_warning"] + assert any("inventory" in str(w["fields"]) for w in warnings) + + +@pytest.mark.asyncio +async def test_inventory_check_warns_on_silent_patch(captured_events) -> None: + """Patch added 'rope' but narration is silent on it.""" + record_dict = _make_record().__dict__.copy() + record_dict["narration"] = "You walk forward." 
+ record_dict["patches_applied"] = [ + PatchSummary(patch_type="world", fields_changed=["inventory.rope"]), + ] + record_dict["delta"] = type( + "Delta", + (), + {"inventory_changes": [{"item": "rope", "delta": 1}]}, + )() + record = TurnRecord(**record_dict) + + await inventory_check(record) + warnings = [e for e in captured_events if e["event_type"] == "validation_warning"] + assert any("rope" in str(w["fields"]) for w in warnings) +``` + +- [ ] **Step 2: Run test to verify it fails** + +Run: `cd sidequest-server && uv run pytest tests/telemetry/test_validator_pipeline.py::test_inventory_check_warns_on_narration_grab_with_no_patch -v` +Expected: FAIL. + +- [ ] **Step 3: Implement `inventory_check`** + +Add to `validator.py`: + +```python +_GRAB_VERBS = ( + "grab", "take", "pick up", "pocket", "stash", "loot", + "snatch", "scoop", "lift", "claim", +) + + +async def inventory_check(record: TurnRecord) -> None: + """Cross-check narration against inventory deltas. + + Two failure modes: + 1. Narration uses a grab-verb but no patch added inventory. + 2. A patch added an item but narration doesn't mention it. + """ + narration = (record.narration or "").lower() + delta = record.delta + inv_changes = getattr(delta, "inventory_changes", None) or [] + has_inventory_patch = bool(inv_changes) or any( + any("inventory" in f for f in p.fields_changed) + for p in record.patches_applied + ) + + grabbed_in_narration = any(v in narration for v in _GRAB_VERBS) + + if grabbed_in_narration and not has_inventory_patch: + publish_event( + "validation_warning", + { + "check": "inventory", + "turn_id": record.turn_id, + "rationale": "narration describes a grab but no inventory patch", + }, + component="validator", + severity="warning", + ) + + # Patch added an item, narration silent on it. 
+ for change in inv_changes: + item = change.get("item") if isinstance(change, dict) else getattr(change, "item", None) + if not item: + continue + if str(item).lower() not in narration: + publish_event( + "validation_warning", + { + "check": "inventory", + "turn_id": record.turn_id, + "item": item, + "rationale": "patch added item but narration is silent", + }, + component="validator", + severity="warning", + ) +``` + +Register the check in `Validator.__init__` alongside `entity_check`: + +```python + self.register_check(entity_check) + self.register_check(inventory_check) +``` + +- [ ] **Step 4: Run tests to verify they pass** + +Run: `cd sidequest-server && uv run pytest tests/telemetry/test_validator_pipeline.py -v` +Expected: PASS. + +- [ ] **Step 5: Commit** + +```bash +cd sidequest-server +git add sidequest/telemetry/validator.py tests/telemetry/test_validator_pipeline.py +git commit -m "feat(validator): inventory_check warns on narration/patch mismatches" +``` + +--- + +### Task 14: Implement `patch_legality_check` + +**Files:** +- Modify: `sidequest-server/sidequest/telemetry/validator.py` +- Modify: `sidequest-server/tests/telemetry/test_validator_pipeline.py` + +- [ ] **Step 1: Write the failing test** + +```python +from sidequest.telemetry.validator import patch_legality_check + + +@pytest.mark.asyncio +async def test_patch_legality_warns_on_hp_over_max(captured_events) -> None: + """HP > max in snapshot_after is an illegal patch outcome.""" + + class _Char: + def __init__(self, hp: int, hp_max: int) -> None: + self.hp = hp + self.hp_max = hp_max + + snapshot_after = type( + "Snap", + (), + { + "characters": {"alice": _Char(hp=120, hp_max=100)}, + "npc_registry": {}, + }, + )() + record_dict = _make_record().__dict__.copy() + record_dict["snapshot_after"] = snapshot_after + record_dict["patches_applied"] = [ + PatchSummary(patch_type="combat", fields_changed=["characters.alice.hp"]), + ] + record = TurnRecord(**record_dict) + + await 
patch_legality_check(record) + errors = [ + e for e in captured_events + if e["event_type"] == "validation_warning" and e["severity"] == "error" + ] + assert errors, "HP-over-max should produce an error-severity warning" +``` + +- [ ] **Step 2: Run test to verify it fails** + +Run: `cd sidequest-server && uv run pytest tests/telemetry/test_validator_pipeline.py::test_patch_legality_warns_on_hp_over_max -v` +Expected: FAIL. + +- [ ] **Step 3: Implement `patch_legality_check`** + +Add to `validator.py`: + +```python +async def patch_legality_check(record: TurnRecord) -> None: + """Detect illegal post-patch state. + + Checks (per ADR-031 §"Patch legality"): + - HP > max for any character or NPC + - Dead NPC (hp <= 0) appears in patches_applied as an actor + - (Cartography graph adjacency check is deferred until the + cartography graph is ported — see ADR-019.) + """ + snap = record.snapshot_after + characters = getattr(snap, "characters", None) or {} + npc_registry = getattr(snap, "npc_registry", None) or {} + + def _check_hp(label: str, owner: str, ch: object) -> None: + hp = getattr(ch, "hp", None) + hp_max = getattr(ch, "hp_max", None) + if isinstance(hp, int) and isinstance(hp_max, int) and hp > hp_max: + publish_event( + "validation_warning", + { + "check": "patch_legality", + "turn_id": record.turn_id, + "subject": owner, + "subject_kind": label, + "hp": hp, + "hp_max": hp_max, + "rationale": "HP exceeds maximum", + }, + component="validator", + severity="error", + ) + + for owner, ch in characters.items(): + _check_hp("character", str(owner), ch) + if isinstance(npc_registry, dict): + for owner, npc in npc_registry.items(): + _check_hp("npc", str(owner), npc) + + # Dead-actor check: if a patch_type=combat patch references an NPC + # that the snapshot now reports as dead (hp <= 0), surface a warning. 
+ dead_npcs = { + name + for name, npc in (npc_registry.items() if isinstance(npc_registry, dict) else ()) + if isinstance(getattr(npc, "hp", None), int) + and getattr(npc, "hp", 0) <= 0 + } + for patch in record.patches_applied: + if patch.patch_type != "combat": + continue + for field in patch.fields_changed: + for dead in dead_npcs: + if dead in field and "hp" not in field: + publish_event( + "validation_warning", + { + "check": "patch_legality", + "turn_id": record.turn_id, + "actor": dead, + "rationale": "dead NPC referenced as actor in combat patch", + }, + component="validator", + severity="error", + ) + return +``` + +Register in `Validator.__init__`: + +```python + self.register_check(patch_legality_check) +``` + +- [ ] **Step 4: Run tests to verify they pass** + +Run: `cd sidequest-server && uv run pytest tests/telemetry/test_validator_pipeline.py -v` +Expected: PASS. + +- [ ] **Step 5: Commit** + +```bash +cd sidequest-server +git add sidequest/telemetry/validator.py tests/telemetry/test_validator_pipeline.py +git commit -m "feat(validator): patch_legality_check flags HP overflow and dead-actor patches" +``` + +--- + +### Task 15: Implement `trope_alignment_check` + +**Files:** +- Modify: `sidequest-server/sidequest/telemetry/validator.py` +- Modify: `sidequest-server/tests/telemetry/test_validator_pipeline.py` + +- [ ] **Step 1: Write the failing test** + +```python +from sidequest.telemetry.validator import ( + trope_alignment_check, + TROPE_KEYWORDS_SOURCE, +) + + +@pytest.mark.asyncio +async def test_trope_alignment_warns_when_keywords_absent( + captured_events, monkeypatch, +) -> None: + """Beat 'desperation' fired but narration lacks any of its keywords.""" + monkeypatch.setitem( + TROPE_KEYWORDS_SOURCE, + "desperation", + ["frantic", "shaking", "ragged", "trembling"], + ) + + record_dict = _make_record().__dict__.copy() + record_dict["beats_fired"] = [("desperation", 0.7)] + record_dict["narration"] = "You walk down the hallway calmly." 
+ record = TurnRecord(**record_dict) + + await trope_alignment_check(record) + warnings = [e for e in captured_events if e["event_type"] == "validation_warning"] + assert any( + "trope_alignment" in str(w["fields"]) for w in warnings + ) + + +@pytest.mark.asyncio +async def test_trope_alignment_silent_when_keywords_present( + captured_events, monkeypatch, +) -> None: + monkeypatch.setitem( + TROPE_KEYWORDS_SOURCE, + "desperation", + ["frantic", "shaking", "ragged"], + ) + + record_dict = _make_record().__dict__.copy() + record_dict["beats_fired"] = [("desperation", 0.7)] + record_dict["narration"] = "Your hands are shaking as you reach for the door." + record = TurnRecord(**record_dict) + + await trope_alignment_check(record) + warnings = [e for e in captured_events if e["event_type"] == "validation_warning"] + assert not any("trope_alignment" in str(w["fields"]) for w in warnings) +``` + +- [ ] **Step 2: Run test to verify it fails** + +Run: `cd sidequest-server && uv run pytest tests/telemetry/test_validator_pipeline.py::test_trope_alignment_warns_when_keywords_absent -v` +Expected: FAIL — `trope_alignment_check` not defined. + +- [ ] **Step 3: Implement `trope_alignment_check`** + +Add to `validator.py`: + +```python +# Per-trope keyword sources — populated lazily from the genre packs the +# first time the check fires. Tests can monkeypatch this dict directly. +# Each value is the list of keywords that "should" appear in narration +# when that trope's beat fires. ADR-031 §"Trope alignment" specifies that +# this is read off each trope's `keywords` list — no second LLM call. +TROPE_KEYWORDS_SOURCE: dict[str, list[str]] = {} + + +def _trope_keywords(trope: str) -> list[str]: + if trope in TROPE_KEYWORDS_SOURCE: + return TROPE_KEYWORDS_SOURCE[trope] + # Lazy load — the trope catalog is in sidequest.game.trope; importing + # here at module level would create a cycle. 
Import on first use, + # cache the result, and tolerate the absence if the genre layer + # isn't loaded (e.g. unit tests). + try: + from sidequest.game import trope as trope_mod # noqa: PLC0415 + + keywords = getattr(trope_mod, "keywords_for", lambda _t: [])(trope) + TROPE_KEYWORDS_SOURCE[trope] = list(keywords) + return TROPE_KEYWORDS_SOURCE[trope] + except Exception: # noqa: BLE001 + return [] + + +async def trope_alignment_check(record: TurnRecord) -> None: + """For each beat that fired this turn, warn if none of the trope's + keywords appear in narration.""" + if not record.beats_fired: + return + narration_lower = (record.narration or "").lower() + for trope, _threshold in record.beats_fired: + keywords = _trope_keywords(trope) + if not keywords: + continue + if not any(kw.lower() in narration_lower for kw in keywords): + publish_event( + "validation_warning", + { + "check": "trope_alignment", + "turn_id": record.turn_id, + "trope": trope, + "expected_any_of": keywords, + "rationale": "trope beat fired but no keywords in narration", + }, + component="validator", + severity="warning", + ) +``` + +Register in `Validator.__init__`: + +```python + self.register_check(trope_alignment_check) +``` + +- [ ] **Step 4: Run tests to verify they pass** + +Run: `cd sidequest-server && uv run pytest tests/telemetry/test_validator_pipeline.py -v` +Expected: PASS. 
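+
+Before committing, note one property of the predicate above: it is plain
+substring matching, so a keyword can match inside a longer word. A
+standalone sketch of the same predicate (illustrative names, no sidequest
+imports):
+
+```python
+def keywords_align(narration: str, keywords: list[str]) -> bool:
+    """Mirror of trope_alignment_check's test: any keyword substring present?"""
+    narration_lower = narration.lower()
+    return any(kw.lower() in narration_lower for kw in keywords)
+
+
+# Matches as intended on whole words:
+assert keywords_align("Your hands are shaking.", ["frantic", "shaking"])
+# Substring caveat: "rag" also matches inside "dragon".
+assert keywords_align("A dragon lands.", ["rag"])
+```
+
+The keyword lists in this plan are full words, so the looseness is
+acceptable for a warning-severity check; if false positives ever surface,
+swapping in a `\b`-anchored regex is a contained change.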
+ +- [ ] **Step 5: Commit** + +```bash +cd sidequest-server +git add sidequest/telemetry/validator.py tests/telemetry/test_validator_pipeline.py +git commit -m "feat(validator): trope_alignment_check warns on beats fired without keywords" +``` + +--- + +### Task 16: Implement `subsystem_exercise_check` and `coverage_gap` + +**Files:** +- Modify: `sidequest-server/sidequest/telemetry/validator.py` +- Modify: `sidequest-server/tests/telemetry/test_validator_pipeline.py` + +- [ ] **Step 1: Write the failing test** + +```python +from sidequest.telemetry.validator import subsystem_exercise_check + + +@pytest.mark.asyncio +async def test_subsystem_exercise_emits_per_turn_summary(captured_events) -> None: + """Every turn produces a subsystem_exercise_summary event.""" + record = _make_record() + await subsystem_exercise_check(record) + summaries = [ + e for e in captured_events + if e["event_type"] == "subsystem_exercise_summary" + ] + assert summaries, "subsystem_exercise_check should emit a per-turn summary" + + +@pytest.mark.asyncio +async def test_subsystem_exercise_emits_coverage_gap_after_silence( + captured_events, +) -> None: + """When a subsystem hasn't fired in N turns, emit coverage_gap.""" + # Reset the sliding window; simulate 11 narrator-only turns. + from sidequest.telemetry.validator import _reset_subsystem_window + + _reset_subsystem_window() + for i in range(11): + record_dict = _make_record(turn_id=i).__dict__.copy() + record_dict["agent_name"] = "narrator" + await subsystem_exercise_check(TurnRecord(**record_dict)) + + gaps = [e for e in captured_events if e["event_type"] == "coverage_gap"] + assert gaps, "Expected a coverage_gap after a long subsystem silence" +``` + +- [ ] **Step 2: Run test to verify it fails** + +Run: `cd sidequest-server && uv run pytest tests/telemetry/test_validator_pipeline.py::test_subsystem_exercise_emits_per_turn_summary -v` +Expected: FAIL. 
+
+- [ ] **Step 3: Implement the check**
+
+Add to `validator.py`:
+
+```python
+from collections import deque  # at module top, if not already imported
+
+# Sliding window of (turn_id, agent_name) tuples.
+_SUBSYSTEM_WINDOW: deque[tuple[int, str]] = deque(maxlen=50)
+_KNOWN_SUBSYSTEMS = {
+    "narrator", "combat", "merchant", "world_builder",
+    "scenario", "encounter", "chargen", "trope", "barrier",
+}
+_COVERAGE_GAP_THRESHOLD_TURNS = 10
+
+
+def _reset_subsystem_window() -> None:
+    """Test helper — clears the sliding window."""
+    _SUBSYSTEM_WINDOW.clear()
+
+
+async def subsystem_exercise_check(record: TurnRecord) -> None:
+    """Per-turn rollup of which subsystem ran, plus periodic coverage_gap
+    when a subsystem hasn't been exercised in N turns."""
+    _SUBSYSTEM_WINDOW.append((record.turn_id, record.agent_name))
+
+    publish_event(
+        "subsystem_exercise_summary",
+        {
+            "turn_id": record.turn_id,
+            "agent_name": record.agent_name,
+            "window_depth": len(_SUBSYSTEM_WINDOW),
+        },
+        component="validator",
+        severity="info",
+    )
+
+    if len(_SUBSYSTEM_WINDOW) < _COVERAGE_GAP_THRESHOLD_TURNS:
+        return
+
+    recent_agents = {
+        agent for _t, agent in list(_SUBSYSTEM_WINDOW)[-_COVERAGE_GAP_THRESHOLD_TURNS:]
+    }
+    silent = _KNOWN_SUBSYSTEMS - recent_agents
+    for sub in silent:
+        publish_event(
+            "coverage_gap",
+            {
+                "turn_id": record.turn_id,
+                "subsystem": sub,
+                "silent_turns": _COVERAGE_GAP_THRESHOLD_TURNS,
+                "rationale": "no agent invocation in sliding window",
+            },
+            component="validator",
+            severity="info",
+        )
+```
+
+Register in `Validator.__init__`:
+
+```python
+        self.register_check(subsystem_exercise_check)
+```
+
+- [ ] **Step 4: Run tests to verify they pass**
+
+Run: `cd sidequest-server && uv run pytest tests/telemetry/test_validator_pipeline.py -v`
+Expected: PASS.
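+
+The window arithmetic above is easy to sanity-check in isolation. A
+standalone sketch of the same deque-plus-set-difference logic (illustrative
+names and a smaller subsystem set, no sidequest imports):
+
+```python
+from collections import deque
+
+KNOWN = {"narrator", "combat", "merchant"}
+THRESHOLD = 10
+window: deque[tuple[int, str]] = deque(maxlen=50)
+
+
+def silent_subsystems() -> set[str]:
+    """Subsystems with no invocation in the last THRESHOLD turns."""
+    if len(window) < THRESHOLD:
+        return set()  # not enough history to call anything a gap
+    recent = {agent for _turn, agent in list(window)[-THRESHOLD:]}
+    return KNOWN - recent
+
+
+for turn in range(11):
+    window.append((turn, "narrator"))
+
+# Eleven narrator-only turns: combat and merchant are flagged silent.
+assert silent_subsystems() == {"combat", "merchant"}
+```
+
+Note that `maxlen=50` caps memory while the gap check only ever inspects
+the trailing `THRESHOLD` entries.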
+ +- [ ] **Step 5: Commit** + +```bash +cd sidequest-server +git add sidequest/telemetry/validator.py tests/telemetry/test_validator_pipeline.py +git commit -m "feat(validator): subsystem exercise summary + coverage_gap from sliding window" +``` + +--- + +### Task 17: Validator emits `turn_complete` from each TurnRecord + +**Files:** +- Modify: `sidequest-server/sidequest/telemetry/validator.py` +- Modify: `sidequest-server/tests/telemetry/test_validator_pipeline.py` + +The translator owns most typed events, but `turn_complete` is owned by the validator because it has the full `TurnRecord` (per spec §6.7). + +- [ ] **Step 1: Write the failing test** + +```python +@pytest.mark.asyncio +async def test_validator_emits_turn_complete_first(captured_events) -> None: + """turn_complete is emitted before the five checks run, and carries + fields populated from the TurnRecord.""" + v = Validator() + await v.start() + try: + record = _make_record(turn_id=99) + await v.submit(record) + # Allow the consumer to process. + await asyncio.sleep(0.1) + finally: + await v.shutdown() + + completes = [e for e in captured_events if e["event_type"] == "turn_complete"] + assert completes, "validator must emit turn_complete per TurnRecord" + assert completes[0]["fields"]["turn_id"] == 99 + assert completes[0]["fields"]["agent_name"] == "narrator" +``` + +- [ ] **Step 2: Run test to verify it fails** + +Run: `cd sidequest-server && uv run pytest tests/telemetry/test_validator_pipeline.py::test_validator_emits_turn_complete_first -v` +Expected: FAIL. 
+ +- [ ] **Step 3: Emit `turn_complete` at the start of `_validate`** + +In `validator.py`, modify `_validate` to publish `turn_complete` before iterating checks: + +```python + async def _validate(self, record: TurnRecord) -> None: + publish_event( + "turn_complete", + { + "turn_id": record.turn_id, + "player_id": record.player_id, + "agent_name": record.agent_name, + "extraction_tier": record.extraction_tier, + "token_count_in": record.token_count_in, + "token_count_out": record.token_count_out, + "agent_duration_ms": record.agent_duration_ms, + "is_degraded": record.is_degraded, + "patches_applied": [p.patch_type for p in record.patches_applied], + "beats_fired": [t for t, _ in record.beats_fired], + }, + component="validator", + severity="info", + ) + for check in self._checks: + t0 = time.perf_counter() + try: + await check(record) + except Exception as exc: # noqa: BLE001 + logger.exception( + "validator.check_failed check=%s", check.__name__ + ) + publish_event( + "validation_warning", + { + "check": check.__name__, + "error": str(exc), + "turn_id": record.turn_id, + }, + component="validator", + severity="error", + ) + elapsed_ms = (time.perf_counter() - t0) * 1000.0 + self._check_durations_ms.append( + (check.__name__, elapsed_ms) + ) +``` + +- [ ] **Step 4: Run tests to verify they pass** + +Run: `cd sidequest-server && uv run pytest tests/telemetry/test_validator_pipeline.py -v` +Expected: PASS. 
+ +- [ ] **Step 5: Commit** + +```bash +cd sidequest-server +git add sidequest/telemetry/validator.py tests/telemetry/test_validator_pipeline.py +git commit -m "feat(validator): emit turn_complete from each TurnRecord" +``` + +--- + +### Task 18: Validator health emissions + +**Files:** +- Modify: `sidequest-server/sidequest/telemetry/validator.py` +- Modify: `sidequest-server/tests/telemetry/test_validator_pipeline.py` + +- [ ] **Step 1: Write the failing test** + +```python +@pytest.mark.asyncio +async def test_validator_emits_periodic_queue_depth(captured_events) -> None: + """Validator surfaces queue_depth as state_transition events.""" + v = Validator() + v._heartbeat_interval = 0.1 # speed up for the test + await v.start() + try: + await v.submit(_make_record()) + await asyncio.sleep(0.3) # let heartbeat fire + finally: + await v.shutdown() + + health = [ + e for e in captured_events + if e["event_type"] == "state_transition" + and e["component"] == "validator" + and "queue_depth" in str(e["fields"]) + ] + assert health, "expected validator queue_depth heartbeat" +``` + +- [ ] **Step 2: Run test to verify it fails** + +Run: `cd sidequest-server && uv run pytest tests/telemetry/test_validator_pipeline.py::test_validator_emits_periodic_queue_depth -v` +Expected: FAIL. 
+ +- [ ] **Step 3: Add the heartbeat task** + +In `validator.py`, add to `Validator.__init__`: + +```python + self._heartbeat_interval: float = 30.0 + self._heartbeat_task: asyncio.Task[None] | None = None +``` + +Modify `start` and `shutdown`: + +```python + async def start(self) -> None: + if self.is_running(): + return + self._stopping.clear() + self._task = asyncio.create_task( + self._run(), name="sidequest.validator" + ) + self._heartbeat_task = asyncio.create_task( + self._heartbeat(), name="sidequest.validator.heartbeat" + ) + logger.info("validator.started") + + async def shutdown(self, grace_seconds: float = 2.0) -> None: + self._stopping.set() + if self._heartbeat_task is not None: + self._heartbeat_task.cancel() + try: + await self._heartbeat_task + except asyncio.CancelledError: + pass + self._heartbeat_task = None + # ... existing _task shutdown logic ... + if self._task is None: + return + try: + await asyncio.wait_for( + self._queue.join(), timeout=grace_seconds + ) + except asyncio.TimeoutError: + logger.warning( + "validator.shutdown_grace_exceeded queued=%d", + self._queue.qsize(), + ) + self._task.cancel() + try: + await self._task + except asyncio.CancelledError: + pass + self._task = None + logger.info("validator.stopped") +``` + +Add the `_heartbeat` method: + +```python + async def _heartbeat(self) -> None: + while not self._stopping.is_set(): + try: + await asyncio.sleep(self._heartbeat_interval) + except asyncio.CancelledError: + return + durations = list(self._check_durations_ms) + p50 = _percentile([d for _, d in durations], 50) + p99 = _percentile([d for _, d in durations], 99) + publish_event( + "state_transition", + { + "field": "validator.heartbeat", + "queue_depth": self._queue.qsize(), + "queue_max": self._queue.maxsize, + "dropped_records": self.dropped_records, + "check_p50_ms": p50, + "check_p99_ms": p99, + }, + component="validator", + severity="info", + ) + + +def _percentile(values: list[float], pct: int) -> float: + if not 
values:
+        return 0.0
+    s = sorted(values)
+    idx = max(0, min(len(s) - 1, int(len(s) * pct / 100)))
+    return round(s[idx], 2)
+```
+
+- [ ] **Step 4: Run tests to verify they pass**
+
+Run: `cd sidequest-server && uv run pytest tests/telemetry/test_validator_pipeline.py -v`
+Expected: PASS.
+
+- [ ] **Step 5: Commit**
+
+```bash
+cd sidequest-server
+git add sidequest/telemetry/validator.py tests/telemetry/test_validator_pipeline.py
+git commit -m "feat(validator): periodic heartbeat surfaces queue depth and check timing"
+```
+
+---
+
+### Task 19: Validator backpressure & crash containment tests
+
+**Files:**
+- Modify: `sidequest-server/tests/telemetry/test_validator_pipeline.py`
+
+- [ ] **Step 1: Add backpressure and crash-containment tests**
+
+```python
+@pytest.mark.asyncio
+async def test_validator_survives_crashing_check(captured_events) -> None:
+    """A check that raises must not kill the task; other checks still run."""
+    v = Validator()
+
+    async def boom(_record: TurnRecord) -> None:
+        raise RuntimeError("intentional")
+
+    async def benign(_record: TurnRecord) -> None:
+        publish_event(
+            "validation_warning",
+            {"check": "benign", "noted": True},
+            component="validator",
+        )
+
+    # Replace registered checks with our two-test set.
+ v._checks = [boom, benign] + await v.start() + try: + await v.submit(_make_record()) + await asyncio.sleep(0.2) + finally: + await v.shutdown() + + crash_events = [ + e for e in captured_events + if e["event_type"] == "validation_warning" and "intentional" in str(e["fields"]) + ] + benign_events = [ + e for e in captured_events + if e["event_type"] == "validation_warning" and e["fields"].get("check") == "benign" + ] + assert crash_events, "crash should be reported as validation_warning" + assert benign_events, "benign check must still run after the crashing one" + + +@pytest.mark.asyncio +async def test_backpressure_drops_oldest_and_emits_warning(captured_events) -> None: + v = Validator(queue_maxsize=2) + # No consumer running: every submit beyond capacity drops oldest. + for i in range(5): + await v.submit(_make_record(turn_id=i)) + + drops = [ + e for e in captured_events + if e["event_type"] == "validation_warning" + and e["fields"].get("reason") == "queue_full" + ] + assert drops, "queue-full should publish a warning" + assert v.dropped_records >= 3 +``` + +- [ ] **Step 2: Run tests to verify they pass** + +Run: `cd sidequest-server && uv run pytest tests/telemetry/test_validator_pipeline.py -v` +Expected: PASS — Tasks 11–18 already produced the behavior these tests assert. + +- [ ] **Step 3: No-op (tests only)** + +If a test reveals a gap (e.g. backpressure warning doesn't fire), patch `Validator.submit` to ensure the warning emits. The intended behavior is already in Task 11's implementation; if a test fails here, fix the regression in `validator.py`. + +- [ ] **Step 4: Run the full validator suite** + +Run: `cd sidequest-server && uv run pytest tests/telemetry/ -v` +Expected: PASS. 
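+
+Task 11's `submit` is not reproduced in this plan section; the drop-oldest
+policy these tests assert can be sketched as follows (standalone,
+illustrative names; the real method also publishes the `queue_full`
+validation_warning and increments `dropped_records`):
+
+```python
+import asyncio
+
+
+async def submit_drop_oldest(queue: asyncio.Queue, record: object) -> int:
+    """Enqueue record; when full, evict the oldest entry first.
+
+    Returns how many records were dropped (0 or 1)."""
+    dropped = 0
+    if queue.full():
+        queue.get_nowait()  # evict oldest
+        queue.task_done()   # keep join() accounting balanced
+        dropped = 1
+    queue.put_nowait(record)
+    return dropped
+
+
+async def demo() -> tuple[int, object]:
+    q: asyncio.Queue = asyncio.Queue(maxsize=2)
+    total_dropped = 0
+    for i in range(5):
+        total_dropped += await submit_drop_oldest(q, i)
+    return total_dropped, q.get_nowait()
+
+
+# Five submits into a 2-slot queue: three dropped, oldest survivor is 3.
+assert asyncio.run(demo()) == (3, 3)
+```
+
+Dropping the oldest record (rather than rejecting the newest) keeps the
+freshest turns observable, which matches ADR-031's "lossy by design"
+intent.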
+
+- [ ] **Step 5: Commit**
+
+```bash
+cd sidequest-server
+git add tests/telemetry/test_validator_pipeline.py
+git commit -m "test(validator): crash-containment and backpressure coverage"
+```
+
+---
+
+### Task 20: Wire validator lifecycle into `app.py`
+
+**Files:**
+- Modify: `sidequest-server/sidequest/server/app.py`
+- Modify: `sidequest-server/tests/server/test_app.py`
+
+- [ ] **Step 1: Write the failing test**
+
+Add to `sidequest-server/tests/server/test_app.py` (extend the existing file):
+
+```python
+@pytest.mark.asyncio
+async def test_validator_starts_with_app() -> None:
+    """create_app() registers a startup hook that boots the validator."""
+    from fastapi.testclient import TestClient
+
+    from sidequest.server.app import create_app
+
+    app = create_app()
+    with TestClient(app):
+        validator = getattr(app.state, "validator", None)
+        assert validator is not None, (
+            "app.state.validator should be populated at startup"
+        )
+        assert validator.is_running()
+    # On exit, the TestClient's shutdown lifespan triggers shutdown.
+    assert not validator.is_running()
+```
+
+- [ ] **Step 2: Run test to verify it fails**
+
+Run: `cd sidequest-server && uv run pytest tests/server/test_app.py::test_validator_starts_with_app -v`
+Expected: FAIL — `app.state.validator` does not exist.
+ +- [ ] **Step 3: Wire the validator in `create_app`** + +In `sidequest-server/sidequest/server/app.py`, add to the imports: + +```python +from sidequest.telemetry.validator import Validator +``` + +Inside `create_app`, after the `app.state.watcher_hub = watcher_hub` line, add: + +```python + app.state.validator = Validator() +``` + +Add a startup handler (alongside `_wire_watcher`): + +```python + @app.on_event("startup") + async def _start_validator() -> None: + await app.state.validator.start() + logger.info("validator.startup_wired") + + @app.on_event("shutdown") + async def _stop_validator() -> None: + v = getattr(app.state, "validator", None) + if v is not None: + await v.shutdown(grace_seconds=2.0) + logger.info("validator.shutdown_wired") +``` + +- [ ] **Step 4: Run tests to verify they pass** + +Run: `cd sidequest-server && uv run pytest tests/server/test_app.py -v` +Expected: PASS. + +Run: `cd sidequest-server && uv run pytest tests/server/ tests/telemetry/ -v -x` +Expected: PASS — full server + telemetry suite green. + +- [ ] **Step 5: Commit** + +```bash +cd sidequest-server +git add sidequest/server/app.py tests/server/test_app.py +git commit -m "feat(server): wire Validator lifecycle into FastAPI startup/shutdown" +``` + +--- + +### Task 21: Assemble TurnRecord at dispatch and submit to validator + +**Files:** +- Modify: `sidequest-server/sidequest/server/session_handler.py` + +- [ ] **Step 1: Identify TurnRecord-source data in dispatch** + +Run: `cd sidequest-server && grep -n "turn_id\|narration\|patches\|beats_fired\|extraction_tier\|token_count\|duration_ms\|is_degraded" sidequest/server/session_handler.py | head -40` +Expected: locations in dispatch where each field's source value already exists. Some fields (e.g. `extraction_tier`, `token_count_in`, `is_degraded`) may already be tracked by the agent client; others may need to be threaded back from `orchestrator.process_action`. 
+ +If a field has no obvious source: default to a sentinel (`extraction_tier=1`, `token_count_in=0`, `is_degraded=False`) — this plan does not block on perfect field population. Follow-up plans for the agent emission family will tighten these. + +- [ ] **Step 2: Write the wiring test** + +Add to `sidequest-server/tests/server/test_turn_span_wiring.py`: + +```python +@pytest.mark.asyncio +async def test_dispatch_submits_turn_record_to_validator(server_fixture) -> None: + """At dispatch end, a TurnRecord lands on app.state.validator's queue.""" + submitted: list = [] + + async def fake_submit(record): + submitted.append(record) + + # Patch the validator's submit so we observe what the dispatch sends. + with patch.object( + server_fixture.app.state.validator, "submit", new=fake_submit + ): + await server_fixture.dispatch_action( + player="alice", + text="I look around.", + ) + + assert submitted, "dispatch must submit a TurnRecord at end of turn" + record = submitted[0] + assert record.player_id == "alice" + assert record.player_input == "I look around." +``` + +- [ ] **Step 3: Run test to verify it fails** + +Run: `cd sidequest-server && uv run pytest tests/server/test_turn_span_wiring.py::test_dispatch_submits_turn_record_to_validator -v` +Expected: FAIL — no TurnRecord assembly in dispatch yet. + +- [ ] **Step 4: Add TurnRecord assembly inside `turn_span()`** + +In `session_handler.py`, modify the dispatch coroutine wrapped by `turn_span()` (Task 9). Inside the `with turn_span(...)` block, at the end (after narration/patches are computed), assemble and submit: + +```python +from datetime import UTC, datetime +from hashlib import blake2b + +from sidequest.telemetry.turn_record import PatchSummary, TurnRecord + + +def _hash_snapshot(snap: object) -> str: + """Cheap stable hash of a snapshot for replay-keying / change-detection. 
+ + blake2b over the snapshot's repr — repr is not formally stable, but + snapshots are dataclasses/pydantic models with deterministic field + order so it's stable in practice. If a future test asserts identity + across processes, swap to a JSON-serialized canonical form. + """ + return blake2b(repr(snap).encode(), digest_size=16).hexdigest() + + +# ... inside the dispatch coroutine, after patches and narration are computed: +record = TurnRecord( + turn_id=turn_id, + timestamp=datetime.now(UTC), + player_id=player_id, + player_input=text, + classified_intent=classified_intent, + agent_name=agent_name, + narration=narration_text, + patches_applied=[ + PatchSummary( + patch_type=p.kind if hasattr(p, "kind") else "world", + fields_changed=list(p.fields_changed) if hasattr(p, "fields_changed") else [], + ) + for p in patches_applied + ], + snapshot_before_hash=_hash_snapshot(snapshot_before), + snapshot_after=snapshot_after, + delta=delta, + beats_fired=list(beats_fired) if beats_fired else [], + extraction_tier=getattr(agent_result, "extraction_tier", 1), + token_count_in=getattr(agent_result, "token_count_in", 0), + token_count_out=getattr(agent_result, "token_count_out", 0), + agent_duration_ms=int(getattr(agent_result, "duration_ms", 0)), + is_degraded=getattr(agent_result, "is_degraded", False), +) + +validator = self._room_registry.app_state.validator if hasattr(self, "_room_registry") else None +# Plumb app.state.validator to the session handler — see Step 5. +if validator is None: + validator = getattr(self, "_validator", None) +if validator is not None: + await validator.submit(record) +``` + +The variable names (`patches_applied`, `narration_text`, `agent_result`, etc.) come from whatever the existing dispatch already binds; align with the actual identifiers in `session_handler.py`. + +- [ ] **Step 5: Plumb the validator reference** + +The session handler needs access to `app.state.validator`. Two options: + +1. 
**Constructor injection.** If `WebSocketSessionHandler.__init__` already takes app state, add `validator: Validator` to it; thread through from `app.py` where the handler is constructed. +2. **Late import.** If construction is hidden behind a factory, import lazily: `from sidequest.server.app import _resolve_validator` (a module-level helper that reaches into the active app instance). + +Prefer option 1 (constructor injection) — explicit, testable, and follows the existing DI pattern (`claude_client_factory`, `genre_pack_search_paths`, etc.). Modify `app.py` where `WebSocketSessionHandler` is constructed to pass `validator=app.state.validator`. + +Concretely in `app.py`, locate the `WebSocketSessionHandler(...)` call (likely in the `/ws` endpoint) and add the `validator=app.state.validator` kwarg. Then add the parameter to `WebSocketSessionHandler.__init__` and store it as `self._validator`. + +- [ ] **Step 6: Run tests to verify they pass** + +Run: `cd sidequest-server && uv run pytest tests/server/test_turn_span_wiring.py -v` +Expected: PASS. + +Run: `cd sidequest-server && uv run pytest tests/server/ tests/telemetry/ -v -x` +Expected: PASS — full suite green. + +- [ ] **Step 7: Commit** + +```bash +cd sidequest-server +git add sidequest/server/session_handler.py sidequest/server/app.py tests/server/test_turn_span_wiring.py +git commit -m "feat(server): assemble TurnRecord at dispatch end and submit to validator" +``` + +--- + +## Phase 4 — Sweep & Cleanup + +### Task 22: Update the `watcher.ts` source comment + +**Files:** +- Modify: `sidequest-ui/src/types/watcher.ts` + +- [ ] **Step 1: Inspect the current comment** + +Run: `head -20 sidequest-ui/src/types/watcher.ts` +Expected: a header comment mentioning `sidequest-server/src/lib.rs` (Rust-era reference). 
+ +- [ ] **Step 2: Replace the source pointer** + +Replace the Rust-pointing comment with: + +```typescript +// Mirrors the Python WatcherEvent contract emitted by: +// - sidequest-server/sidequest/telemetry/spans.py (SPAN_ROUTES) +// - sidequest-server/sidequest/server/watcher.py (WatcherSpanProcessor) +// - sidequest-server/sidequest/telemetry/validator.py (Layer-3 events) +// See ADR-031 (Game Watcher) and ADR-089 (Dashboard Restoration). +``` + +- [ ] **Step 3: No-op (docs only)** + +Run: `cd sidequest-ui && npx tsc --noEmit` (or `just client-lint`) +Expected: PASS — comment changes don't affect types. + +- [ ] **Step 4: No tests needed** + +Pure docstring change. + +- [ ] **Step 5: Commit** + +```bash +cd sidequest-ui +git add src/types/watcher.ts +git commit -m "docs(watcher): point header at Python source modules" +``` + +--- + +### Task 23: Author ADR-089 + +**Files:** +- Create: `docs/adr/089-otel-dashboard-restoration.md` + +- [ ] **Step 1: Inspect the ADR frontmatter schema** + +Run: `head -30 docs/adr/088-adr-frontmatter-schema-and-auto-generated-indexes.md` +Expected: the canonical frontmatter block. Use it as a template. + +Also: `cat docs/adr/README.md | head -60` for the load-bearing-flag and category conventions. + +- [ ] **Step 2: Write the ADR** + +Create `docs/adr/089-otel-dashboard-restoration.md`: + +```markdown +--- +id: ADR-089 +title: OTEL Dashboard Restoration after Python Port +status: accepted +date: 2026-04-25 +authors: [architect] +related: [031, 058, 082] +supersedes: [] +implementation-status: live +load-bearing: true +categories: [Telemetry, Project Lifecycle] +--- + +# ADR-089: OTEL Dashboard Restoration after Python Port + +## Status + +**Accepted** — 2026-04-25. + +## Context + +After the Rust → Python port (ADR-082), the OTEL dashboard at `/ws/watcher` +and the React `Dashboard/` panes degraded materially. 
The CLAUDE.md +"OTEL Observability Principle" was no longer enforced: the GM panel — the +"lie detector" Sebastien-the-mechanics-first-player and Keith-the-builder +both depend on — surfaced almost no live signal. + +A forensic audit found four failures: + +1. The `just otel` recipe pointed at a deleted `playtest.py`. +2. Most `WatcherEventType` values declared in `watcher.ts` had zero or one + emission sites in production code. +3. `~80%` of `SPAN_*` constants in `telemetry/spans.py` were transcribed + from Rust but never re-implanted into Python dispatch — the catalog + was aspirational. +4. The translator (`WatcherSpanProcessor.on_end`) flattened every span + to `agent_span_close` with no semantic typed-event routing. + +The Python port copied the **vocabulary** and **transport** but not the +**emission discipline** or the **Layer-3 narrative validator**. + +## Decision + +Restore the dashboard to ADR-031's three-layer semantic-telemetry contract, +faithfully ported to Python, with three deliberate departures from the +Rust ADR: + +1. **`TurnRecord` shape.** Store `snapshot_before_hash + snapshot_after + + StateDelta` rather than two full `GameSnapshot` clones. Same + validation power, no double-clone cost. +2. **Validator transport.** `asyncio.Queue(maxsize=32)` with oldest-record + drop on backpressure (faithful to ADR-031's "lossy by design" intent). +3. **Console exporter gating.** `ConsoleSpanExporter` defaults off; gated + behind `SIDEQUEST_OTEL_CONSOLE=1` for debug. + +The translator gains a routing table (`SPAN_ROUTES`) colocated with span +constants in `spans.py` so renaming a constant breaks the route at import +and a new constant without a routing decision trips the +`test_routing_completeness.py` lint. + +A new `Validator` task consumes `TurnRecord`s and runs five deterministic +checks: entity, inventory, patch-legality, trope-alignment, +subsystem-exercise. The validator owns `turn_complete`, `coverage_gap`, +and `validation_warning`. 
+ +## Consequences + +### Positive + +- Every `WatcherEventType` declared in `watcher.ts` has a clear owner; + no orphans, no double-emission. +- Adding a new span constant requires an explicit routing decision — + catches the regression that caused this work. +- The "lie detector" property is restored: subsystem activity surfaces + on the dashboard whether or not the LLM mentions it. +- `just otel` is CI-protected against future script renames. + +### Negative + +- `~80` emission sites still need re-implanting (Phase 2 follow-up plans). + The infrastructure now in place makes each rollout a small, repeatable + change rather than a system-wide redesign. +- Validator runs on the same event loop as dispatch. Bounded queue + lossy + drop policy keeps it from impacting hot-path latency, but heavy check + overhead would still serialize behind dispatch. Acceptable for current + playtest scale (≤5 watchers, ≤1 turn/sec). + +### Out of scope + +- No `TurnRecord` persistence / replay — ADR-031 mentions it as a future + possibility; not building now. +- No second-LLM validation. ADR-031's "God lifting rocks" prohibition + stands. +- No Pennyfarthing-style HTTP OTLP receiver. In-process span processor + remains. + +## Implementation + +See `docs/superpowers/specs/2026-04-25-otel-dashboard-restoration-design.md` +for the design and `docs/superpowers/plans/2026-04-25-otel-dashboard-restoration.md` +for the task plan. + +## Related + +- ADR-031: Game Watcher — Semantic Telemetry (this ADR ports it to Python) +- ADR-058: Claude subprocess OTEL passthrough (unchanged) +- ADR-082: Port `sidequest-api` from Rust back to Python (this ADR closes one of its drift items) +``` + +- [ ] **Step 3: Validate the ADR file** + +Run: `python3 scripts/regenerate_adr_indexes.py --check 2>&1 || python3 scripts/regenerate_adr_indexes.py` +Expected: regenerates indexes and includes ADR-089. If `--check` is unsupported, just regenerate. 
+ +- [ ] **Step 4: Inspect the regenerated index** + +Run: `grep -n "089" docs/adr/README.md CLAUDE.md 2>&1 | head` +Expected: ADR-089 appears in both. + +- [ ] **Step 5: Commit** + +```bash +git add docs/adr/089-otel-dashboard-restoration.md docs/adr/README.md CLAUDE.md +git commit -m "docs(adr): ADR-089 OTEL Dashboard Restoration after Python Port" +``` + +--- + +### Task 24: Amend ADR-031 with the Python-port section + +**Files:** +- Modify: `docs/adr/031-game-watcher-semantic-telemetry.md` + +- [ ] **Step 1: Inspect the current ADR-031** + +Run: `grep -n "## " docs/adr/031-game-watcher-semantic-telemetry.md` +Expected: list of section headers. Identify where to insert a "Python-port" subsection. + +- [ ] **Step 2: Append the port section and update status** + +At the end of `docs/adr/031-game-watcher-semantic-telemetry.md`, before the trailing line if any, append: + +```markdown +--- + +## Python-port note (2026-04-25) + +After ADR-082 ported the backend from Rust to Python, the canonical +implementation lives in: + +- `sidequest-server/sidequest/telemetry/spans.py` — span name catalog, + `SpanRoute` mechanism, `SPAN_ROUTES`, `FLAT_ONLY_SPANS`, helper + context managers. +- `sidequest-server/sidequest/server/watcher.py` — `WatcherSpanProcessor` + translator (Layer 1 + typed-event routing). +- `sidequest-server/sidequest/telemetry/validator.py` — Layer-3 narrative + validator (`Validator` class, five checks). +- `sidequest-server/sidequest/telemetry/turn_record.py` — `TurnRecord` + dataclass. + +Code references in this ADR pre-2026-04-19 point at the Rust tree archived +at https://github.com/slabgorb/sidequest-api. The Rust phasing table is +preserved as historical context but the active phase descriptions are +superseded by ADR-089. + +`implementation-status: live` is re-affirmed for the Python port as of +ADR-089's completion. 
+``` + +If the existing ADR-031 frontmatter has `implementation-status: drift` or `partial`, flip it back to `live`: + +```yaml +implementation-status: live +``` + +- [ ] **Step 3: Regenerate the ADR index again** + +Run: `python3 scripts/regenerate_adr_indexes.py` +Expected: re-emits indexes; ADR-031 should no longer carry the drift marker. + +- [ ] **Step 4: Verify CLAUDE.md ADR Index** + +Run: `grep -A2 "031.*Game Watcher" CLAUDE.md` +Expected: line no longer ends in `*(drift)*`. + +- [ ] **Step 5: Commit** + +```bash +git add docs/adr/031-game-watcher-semantic-telemetry.md docs/adr/README.md CLAUDE.md docs/adr/DRIFT.md +git commit -m "docs(adr): amend ADR-031 with Python-port section, flip status to live" +``` + +--- + +### Task 25: Final aggregate gate + +**Files:** +- (no source changes — final verification step) + +- [ ] **Step 1: Run the full check-all gate** + +Run: `just check-all` +Expected: PASS — server lint, server tests, client lint, client tests, daemon lint all green. + +- [ ] **Step 2: Run the routing-completeness lint explicitly** + +Run: `cd sidequest-server && uv run pytest tests/telemetry/test_routing_completeness.py -v` +Expected: PASS. + +- [ ] **Step 3: Boot the dashboard end-to-end** + +Run: `just up` in one terminal, then in another: `just otel` and confirm the browser-friendly viewer at `http://localhost:9765` loads without errors. Drive a single turn through the running game and confirm the dashboard shows: + +- `agent_span_open` (handshake) +- `agent_span_close` flow (Timeline tab) +- `turn_complete` (validator emission) +- `state_transition` events for currently-routed spans +- `subsystem_exercise_summary` per turn + +If any expected event is missing, work back through Tasks 5–21 to identify the broken link. + +- [ ] **Step 4: Tear down cleanly** + +Run: `just down` +Expected: services stop without errors; no orphaned validator tasks per the shutdown logs. 
+ +- [ ] **Step 5: Final commit (if any sweep changes were needed)** + +If Step 3 surfaced any small routing gaps, fix them and commit: + +```bash +cd sidequest-server +git add -p # stage only the routing fix +git commit -m "fix(telemetry): close routing gap surfaced by end-to-end smoke" +``` + +If no fixes needed, skip this step. + +--- + +## Self-Review + +**1. Spec coverage:** + +| Spec section | Tasks | +|---|---| +| §1.1 broken `just otel` | Task 1 | +| §1.2 dashboard contract stubs | Tasks 5, 6, 8, 11–17 | +| §1.3 ~80% dead spans (turn root) | Tasks 8, 9 | +| §1.4 impoverished translator | Tasks 4, 5, 6 | +| §2.1 dep-1: TurnRecord shape | Task 10 | +| §2.1 dep-2: validator queue | Task 11 | +| §2.1 dep-3: console exporter gating | Task 2 | +| §3 architecture diagram | Tasks 4–21 (all three layers) | +| §4 family table — turn_root | Tasks 8, 9 | +| §4 family table — non-turn families | **Out of scope — follow-up plans** | +| §4.2 implementation conventions | Tasks 5, 8 (helper-first; required attrs) | +| §5.1 TurnRecord dataclass | Task 10 | +| §5.2 pipeline | Tasks 11, 20, 21 | +| §5.3 five checks | Tasks 12–17 | +| §5.4 health & self-observation | Task 18 | +| §6.1–6.4 translator routing | Tasks 4, 5, 6 | +| §6.5 severity inference | Task 6 | +| §6.6 publish_event migration | **Deferred to follow-up plans (per-family)** | +| §6.7 ownership matrix | Tasks 4–21 (each event type owned somewhere) | +| §7.1 three test layers | Tasks 8, 9, 11–21 | +| §7.2 routing completeness lint | Task 7 | +| §7.3 validator pipeline tests | Tasks 11, 19 | +| §7.4 P0 smoke test | Task 3 | +| §8 sequencing | Phase 0/1/3/4 in scope; Phase 2 family rollouts deferred | +| §9 deliverables | Tasks 1, 2, 7, 10, 11, 20, 22, 23, 24 | +| §10 ADR linkage | Tasks 23, 24 | + +Phase 2's ~24 non-turn emission families are the only deliberate gap, called out explicitly in the plan header. + +**2. Placeholder scan:** No "TBD"/"TODO"/"fill in"/"similar to Task N" — every code step has actual code. 
Every command step has the literal command. Field names in `TurnRecord` (Task 10) are reused unchanged in Tasks 11–21.

**3. Type consistency:** `Validator.submit`/`is_running`/`shutdown` referenced consistently. `TurnRecord` field names match across producers (Task 21) and consumers (Tasks 12–17). `SpanRoute` `event_type`/`component`/`extract` fields stable across Tasks 4–6.

---

## Execution Handoff

Plan complete and saved to `docs/superpowers/plans/2026-04-25-otel-dashboard-restoration.md`. Two execution options:

**1. Subagent-Driven (recommended)** — I dispatch a fresh subagent per task and review between tasks, giving fast iteration. Best for a 25-task plan because every commit gets reviewer eyes and the main session stays uncluttered.

**2. Inline Execution** — Execute the tasks in this session following executing-plans, with batched execution and checkpoints for review.

Which approach would you like?