38 changes: 38 additions & 0 deletions .github/workflows/just-otel-smoke.yml
@@ -0,0 +1,38 @@
name: just otel smoke

on:
  pull_request:
    paths:
      - "justfile"
      - "scripts/playtest_dashboard.py"
      - ".github/workflows/just-otel-smoke.yml"
  push:
    branches: [main]

jobs:
  just-otel:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: recursive

      - name: Install just
        run: |
          curl --proto '=https' --tlsv1.2 -sSf https://just.systems/install.sh \
            | bash -s -- --to /usr/local/bin

      - name: Install uv
        run: curl -LsSf https://astral.sh/uv/install.sh | sh && echo "$HOME/.cargo/bin" >> $GITHUB_PATH

      - name: Install Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Sync orchestrator deps
        run: uv sync

      - name: Smoke-test `just otel`
        run: timeout 5 just otel || [ $? -eq 124 ]
        # exit 124 = timeout fired = recipe started and is listening
3 changes: 3 additions & 0 deletions CLAUDE.md
@@ -273,6 +273,9 @@ Rust code samples in pre-ADR-082 ADRs are historical; translation table in
**Code Generation / Tooling (059, 069)**
- **059 Monster Manual — Server-Side Pre-Generation via Game-State Injection** *(drift)* · 069 Scenario Fixtures — Pre-configured World States for Testing *(drift)*

**Observability (090)**
- 090 OTEL Dashboard Restoration after Python Port

**Codebase Decomposition (060, 061, 062, 063, 064, 068, 088)**
- 060 Genre Models Decomposition — Split models.rs by Domain · 061 Lore Module Decomposition — Split lore.rs by Responsibility · 062 Server lib.rs Extraction — Route Groups, State, and Watcher Events · 063 Dispatch Handler Splitting — By Pipeline Stage · 064 Game Crate Domain Modules — Organize 69 Flat Files · 068 Magic Literal Extraction — Domain-Scoped Constants · **088 ADR Frontmatter Schema and Auto-Generated Indexes**

32 changes: 30 additions & 2 deletions docs/adr/031-game-watcher-semantic-telemetry.md
@@ -7,14 +7,16 @@ deciders: [Keith Avery]
supersedes: []
superseded-by: null
related: []
tags: [genre-mechanics]
tags: [genre-mechanics, observability]
implementation-status: live
implementation-pointer: null
---

# ADR-031: Game Watcher — Semantic Telemetry for AI Agent Observability

> New for Rust port. No Python equivalent — sq-2 uses ad-hoc logging.
> Originally specified for the Rust backend. Python port (ADR-082) preserved
> the architecture; ADR-090 documents the post-port restoration. This ADR's
> prose remains the canonical statement of the three-layer model.

## Context
SideQuest has an LLM adjudicating an RPG. Unlike a deterministic game engine, Claude makes
@@ -185,3 +187,29 @@ game messages. Events are JSON-serialized `tracing` output filtered to game-rele
- ADR-018: Trope engine lifecycle
- ADR-026: Client state mirror
- ADR-027: Reactive state messaging

---

## Python-port note (2026-04-25)

After ADR-082 ported the backend from Rust to Python, the canonical
implementation lives in:

- `sidequest-server/sidequest/telemetry/spans.py` — span name catalog,
`SpanRoute` mechanism, `SPAN_ROUTES`, `FLAT_ONLY_SPANS`, helper
context managers.
- `sidequest-server/sidequest/server/watcher.py` — `WatcherSpanProcessor`
translator (Layer 1 + typed-event routing).
- `sidequest-server/sidequest/telemetry/validator.py` — Layer-3 narrative
validator (`Validator` class, five checks: entity, inventory,
patch-legality, trope-alignment, subsystem-exercise).
- `sidequest-server/sidequest/telemetry/turn_record.py` — `TurnRecord`
dataclass (per-turn audit record submitted to the validator queue).

Code references in this ADR pre-2026-04-19 point at the Rust tree archived
at https://github.com/slabgorb/sidequest-api. The Rust phasing table is
preserved as historical context but the active phase descriptions are
superseded by ADR-090.

`implementation-status: live` is re-affirmed for the Python port as of
ADR-090's completion.
104 changes: 104 additions & 0 deletions docs/adr/090-otel-dashboard-restoration.md
@@ -0,0 +1,104 @@
---
id: 90
title: "OTEL Dashboard Restoration after Python Port"
status: accepted
date: 2026-04-25
deciders: ["Keith Avery"]
supersedes: []
superseded-by: null
related: [31, 58, 82]
tags: [observability, project-lifecycle]
implementation-status: live
implementation-pointer: null
---

# ADR-090: OTEL Dashboard Restoration after Python Port

## Status

**Accepted** — 2026-04-25.

## Context

After the Rust → Python port (ADR-082), the OTEL dashboard at `/ws/watcher`
and the React `Dashboard/` panes degraded materially. The CLAUDE.md
"OTEL Observability Principle" was no longer enforced: the GM panel — the
"lie detector" Sebastien-the-mechanics-first-player and Keith-the-builder
both depend on — surfaced almost no live signal.

A forensic audit found four failures:

1. The `just otel` recipe pointed at a deleted `playtest.py`.
2. Most `WatcherEventType` values declared in `watcher.ts` had zero or one
emission sites in production code.
3. ~80% of `SPAN_*` constants in `telemetry/spans.py` were transcribed from
Rust but never re-implanted into Python dispatch — the catalog was
aspirational.
4. The translator (`WatcherSpanProcessor.on_end`) flattened every span to
`agent_span_close` with no semantic typed-event routing.

The Python port copied the **vocabulary** and **transport** but not the
**emission discipline** or the **Layer-3 narrative validator**.

## Decision

Restore the dashboard to ADR-031's three-layer semantic-telemetry contract,
faithfully ported to Python, with three deliberate departures:

1. **`TurnRecord` shape.** Store `snapshot_before_hash + snapshot_after +
StateDelta` rather than two full `GameSnapshot` clones. Same validation
power, no double-clone cost.
2. **Validator transport.** `asyncio.Queue(maxsize=32)` with oldest-record
drop on backpressure (faithful to ADR-031's "lossy by design" intent).
3. **Console exporter gating.** `ConsoleSpanExporter` defaults off; gated
behind `SIDEQUEST_OTEL_CONSOLE=1` for debug.
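
The three departures above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `turn_record.py`: the `turn_id` field, the plain-dict snapshot type, and the helper names `snapshot_hash`, `submit_lossy`, and `console_exporter_enabled` are hypothetical. Only `snapshot_before_hash`, `snapshot_after`, `StateDelta`, `asyncio.Queue(maxsize=32)`, and `SIDEQUEST_OTEL_CONSOLE` come from this ADR.

```python
import asyncio
import hashlib
import json
import os
from dataclasses import dataclass, field
from typing import Mapping


def snapshot_hash(snapshot: dict) -> str:
    """Digest a JSON-serializable snapshot; sort_keys makes key order irrelevant."""
    payload = json.dumps(snapshot, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class TurnRecord:
    # Hypothetical layout: only the snapshot/delta field names come from the ADR.
    turn_id: int
    snapshot_before_hash: str  # hash of the "before" state instead of a full clone
    snapshot_after: dict       # the single full snapshot that is kept
    state_delta: dict = field(default_factory=dict)


def submit_lossy(queue: asyncio.Queue, record: TurnRecord) -> bool:
    """Enqueue without blocking; when the queue is full, drop the oldest record.

    Returns True if an older record was evicted to make room.
    """
    dropped = False
    if queue.full():
        queue.get_nowait()   # evict the oldest pending record
        queue.task_done()    # keep queue.join() accounting balanced
        dropped = True
    queue.put_nowait(record)
    return dropped


def console_exporter_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Console span output stays off unless SIDEQUEST_OTEL_CONSOLE is exactly "1"."""
    return env.get("SIDEQUEST_OTEL_CONSOLE") == "1"
```

Because `submit_lossy` never awaits, the dispatch path cannot block on a slow validator; the cost is that the oldest unvalidated turn is silently lost under backpressure.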

The translator gains a routing table (`SPAN_ROUTES`) colocated with span
constants in `spans.py` so renaming a constant breaks the route at import
and a new constant without a routing decision trips the
`test_routing_completeness.py` lint.
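
A rough sketch of how a colocated routing table and completeness check could fit together. The span names, the event-type strings other than `agent_span_close`, the `SpanRoute` fields, and the functions `translate` and `unrouted_spans` are all assumptions made for illustration; the real `spans.py` may be shaped differently.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

# Span-name constants; these values are invented for the example.
SPAN_COMBAT_RESOLVE = "combat.resolve"
SPAN_INVENTORY_ADD = "inventory.add"
SPAN_PROMPT_BUILD = "prompt.build"


@dataclass(frozen=True)
class SpanRoute:
    event_type: str                                # typed event to emit
    extract: Callable[[Mapping[str, object]], dict]  # span attrs -> event fields


# Keys reference the constants directly, so renaming or deleting a constant
# raises NameError when this module is imported.
SPAN_ROUTES = {
    SPAN_COMBAT_RESOLVE: SpanRoute(
        "combat_event", lambda a: {"outcome": a.get("outcome")}
    ),
    SPAN_INVENTORY_ADD: SpanRoute(
        "inventory_event", lambda a: {"item": a.get("item")}
    ),
}

# Spans deliberately left on the flat agent_span_close path.
FLAT_ONLY_SPANS = frozenset({SPAN_PROMPT_BUILD})


def translate(name: str, attrs: Mapping[str, object]) -> dict:
    """Route a finished span to a typed event, else fall back to the flat event."""
    route = SPAN_ROUTES.get(name)
    if route is None:
        return {"type": "agent_span_close", "span": name}
    return {"type": route.event_type, "span": name, **route.extract(attrs)}


def unrouted_spans(namespace: Mapping[str, object]) -> set:
    """Return SPAN_* string constants that have no routing decision at all."""
    declared = {
        value
        for key, value in namespace.items()
        if key.startswith("SPAN_") and isinstance(value, str)
    }
    return declared - set(SPAN_ROUTES) - set(FLAT_ONLY_SPANS)
```

A completeness test in this style would assert that `unrouted_spans(vars(spans_module))` is empty, which forces every new constant into either `SPAN_ROUTES` or `FLAT_ONLY_SPANS`.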

A new `Validator` task consumes `TurnRecord`s and runs five deterministic
checks: entity, inventory, patch-legality, trope-alignment,
subsystem-exercise. The validator owns `turn_complete`, `coverage_gap`,
and `validation_warning`.
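
The consumer side might look roughly like the sketch below. It implements only two of the five checks, uses plain dicts in place of `TurnRecord`, and assumes that subsystem-exercise findings surface as `coverage_gap` while other findings surface as `validation_warning`; that mapping, the record keys, and every function name here are hypothetical.

```python
import asyncio
from typing import Callable


def check_inventory(record: dict) -> list:
    """Flag items the turn used that the post-turn inventory does not hold."""
    held = set(record.get("inventory", []))
    return [
        f"item not held: {item}"
        for item in record.get("items_used", [])
        if item not in held
    ]


def check_subsystem_exercise(record: dict) -> list:
    """Flag subsystems expected to fire this turn that produced no span."""
    expected = set(record.get("expected_subsystems", []))
    seen = set(record.get("spans_seen", []))
    return [f"subsystem silent: {name}" for name in sorted(expected - seen)]


# Deterministic checks only; no model calls. Insertion order is the run order.
CHECKS = {
    "inventory": check_inventory,
    "subsystem-exercise": check_subsystem_exercise,
}


def validate(record: dict) -> list:
    """Run every check, then close the turn with a turn_complete event."""
    events = []
    for name, check in CHECKS.items():
        # Assumed mapping: coverage problems get their own event type.
        event_type = (
            "coverage_gap" if name == "subsystem-exercise" else "validation_warning"
        )
        for message in check(record):
            events.append({"type": event_type, "check": name, "message": message})
    events.append({"type": "turn_complete", "turn_id": record["turn_id"]})
    return events


async def run_validator(queue: asyncio.Queue, emit: Callable[[dict], None]) -> None:
    """Consume records forever; cancel the task to stop it."""
    while True:
        record = await queue.get()
        try:
            for event in validate(record):
                emit(event)
        finally:
            queue.task_done()
```

Emitting `turn_complete` from the validator rather than from dispatch keeps a single owner for that event, which matches the ownership statement above.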

## Consequences

### Positive

- Every `WatcherEventType` declared in `watcher.ts` has a clear owner;
no orphans, no double-emission.
- Adding a new span constant requires an explicit routing decision —
catches the regression that caused this work.
- The "lie detector" property is restored: subsystem activity surfaces
on the dashboard whether or not the LLM mentions it.
- `just otel` is CI-protected against future script renames.

### Negative

- ~24 emission families still need re-implanting (Phase 2 follow-up
plans, one per family). The infrastructure now in place makes each
rollout a small, repeatable change.
- Validator runs on the same event loop as dispatch. Bounded queue +
lossy drop policy keeps it from impacting hot-path latency, but heavy
check overhead would still serialize behind dispatch. Acceptable for
current playtest scale (≤5 watchers, ≤1 turn/sec).

### Out of scope

- No `TurnRecord` persistence / replay.
- No second-LLM validation (ADR-031's "God lifting rocks" prohibition).
- No HTTP OTLP receiver. In-process span processor remains.

## Implementation

See `docs/superpowers/specs/2026-04-25-otel-dashboard-restoration-design.md`
for the design and `docs/superpowers/plans/2026-04-25-otel-dashboard-restoration.md`
for the task plan.

## Related

- ADR-031: Game Watcher — Semantic Telemetry (this ADR ports it to Python)
- ADR-058: Claude subprocess OTEL passthrough (unchanged)
- ADR-082: Port `sidequest-api` from Rust back to Python (this ADR closes one of its drift items)
1 change: 1 addition & 0 deletions docs/adr/README.md
@@ -193,6 +193,7 @@ Current backend reference documents: `docs/architecture.md`, `docs/tech-stack.md
| ADR | Status | Impl |
|-----|--------|------|
| [ADR-058: Claude Subprocess OTEL Passthrough](058-claude-subprocess-otel-passthrough.md) | ◇ proposed | deferred |
| [ADR-090: OTEL Dashboard Restoration after Python Port](090-otel-dashboard-restoration.md) | ✓ accepted | live |

## Codebase Decomposition
