Contract reset: fix state/session, retire router, align docs (v2.1.1)#25
Contract reset: fix state/session, retire router, align docs (v2.1.1)#25
Conversation
Resolves the Option A state-model contract reset from the audit.
- Add session_id parameter to state_write/state_read/state_clear.
- Legacy shapes preserved: mode-only reads still hit the default empty-
session row; no-mode reads still return the legacy {modes: [...]} list
of empty-session rows. Discovery of per-session rows requires the new
additive list=true flag.
- Schema migration v5 normalizes NULL session_id rows to ''.
- Schema migration v6 rebuilds the state table with composite
PRIMARY KEY(mode, session_id), so two sessions can hold rows for the
same mode without colliding.
- Bump SERVER_VERSION to 2.1.1, SCHEMA_VERSION to 6.
- New tests cover: round-trip with/without session_id, session isolation,
scoped clear, listing, v4 → v6 migration with NULL normalization, and
concurrent-write correctness under the composite PK.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-1, P0-3) Commit 2 of contract reset: propagate the session-aware state contract through every in-repo writer and the authoritative state docs. - scripts/omni_team.py _mcp_write_best_effort: drop the broken `ON CONFLICT(mode, session_id)` + legacy fallback double-path. Use single UPSERT matching the v6 plain composite PK. - scripts/subagent.py _mcp_write_best_effort: same treatment — removes the fallback branch that tried `ON CONFLICT(mode)` on a table that now has a composite PK. - skills/ralplan/SKILL.md: the three inline SQLite recipes now UPSERT on (mode, session_id) instead of (mode). The try/except fallback is dropped because the schema is deterministic after migration v6. - tests/test_cli.py: fixture builds the state table with the v6 schema (composite PK, NOT NULL DEFAULT ''). The former "router" fixture row is renamed "ralph" in step with the router-as-product-claim removal (covered fully in the next commit). - tests/test_mcp_pool.py: concurrency test inserts through the new composite conflict target. - docs/STATE_CONTRACT.md: rewritten to document schema v6, the tool contract, the back-compat guarantees for external MCP consumers, the session-scoping discipline, and the cancel contract. - docs/STATE_MODES.md: adds a session_scope column, drops the router row (router scripts are already removed on main; docs follow). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The router has been gone from the runtime for a while — scripts/router.py and scripts/router_state.py are already absent on main, so flagship skills that still invoked router_state.py were calling dead paths. This commit finishes the retirement by removing the remaining docs, skill preambles, ADR references, and the hook-side enforcement dead code. Code: - hooks/pre_tool_use.py: drop the _router_enforce_deny branch, the _ROUTER_EXEMPT_TOOLS set, and the call site in main(). Nothing writes .omni/cache/router-last.json any more, so the branch was unreachable. Skills (router preambles removed, resume hook delegated to MCP state): - skills/autopilot/SKILL.md - skills/ralph/SKILL.md - skills/ralplan/SKILL.md - skills/ultrawork/SKILL.md - skills/ultraqa/SKILL.md Docs: - README.md: front-door router framing replaced with explicit /deep-interview usage; docs index row dropped. - AGENTS.md: router bullet scrubbed from skill catalog blurb, audit metrics note, and further-reading list. - docs/ARCHITECTURE.md: scripts/router.py section removed. - docs/QUICKSTART.md: router example replaced with /deep-interview + a skip-interview example that doesn't reference the deleted file; doctor expectation corrected. - docs/README.md: ROUTER.md + MODELS.md entries dropped (targets are absent). - docs/MIGRATION.md: router cleanup sentence rewritten; parallel(claude| codex|gemini) scoring removal language left out as it was tied to the now-missing router.py. - docs/TEST_STRATEGY.md: router test invocation replaced with state/MCP tests; integration-marker blurb updated. - docs/ENV.md: OMNI_ROUTER_ENFORCE + OMNI_ROUTER_TTL_S rows deleted. - docs/ADR/README.md: ADR-0005 marked Archived; summary rewritten to explain the retirement; ADR-0011 no longer cites docs/ROUTER.md. - docs/ADR/ADR-0005-router-scoring-rubric.md → docs/ADR/archive/... (git mv so history follows). Tests: - tests/test_run_mutation.py: comment about the v2.1.0 removal now cites the contract-reset PR rather than implying Claude-Code-only context. CHANGELOG: v2.1.1 Unreleased section added covering the removal. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tate (P0-3)
Closes every remaining reference to a non-shipped script or a "planned"
follow-on skill that the flagship flows still pointed at.
agents/planner.md: drops three `/copilot-omni:start-work` handoffs (the
skill never existed). Plans now hand off to the executor agent via
scripts/subagent.py, or to the team skill for parallel work. The
"planned / not yet implemented" hedge is removed.
skills/plan/SKILL.md: the ralplan deactivation step no longer shells out
to scripts/state_write.py (missing). Uses the shipped MCP state_write
tool with session_id.
skills/deep-interview/SKILL.md: the resume path no longer claims state is
stored at .omni/state/deep-interview-state.json (no script produces that
file). Resume now reads state via state_read(mode="deep-interview",
session_id=$OMNI_SESSION_ID), matching the state_write the skill already
makes. The init call is also annotated with session_id to match the
shipped contract.
skills/cancel/SKILL.md: the big one. Every reference to the orphan
.omni/state/sessions/{sessionId}/... tree and individual *-state.json
files is replaced with the two layers the runtime actually ships:
1. Process signal — scripts/cancel_signal.py writes
.omni/runs/<run-id>/cancel.signal (preserved).
2. State invalidation — state_clear(mode=..., session_id=...) via the
MCP tool, with state_read(list=true) for discovery.
Impacted sections: force-clear sequence, auto-detect logic, deferred-
tool bash fallback block, and the artifact-preservation note. Net ~55
lines removed. Frontmatter, flow chart, and messages reference untouched.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ression test (P0-4)
The pre-contract-reset audit caught docs/HOOK_CONTRACT.md and AGENTS.md
claiming "preToolUse was removed in v2.1.0 — not emitted by Copilot CLI"
while hooks/hooks.json still registered preToolUse and the repo's own
tests asserted it fires. This commit makes the docs truthful and adds a
regression test so it can't drift again.
docs/HOOK_CONTRACT.md: rewritten.
- sessionStart and preToolUse documented as LIVE with the fail-open +
version-gated stance made explicit. If a Copilot CLI version stops
emitting an event, the hook simply never runs; the plugin never
requires the event to fire.
- postToolUse / userPromptSubmitted / errorOccurred / sessionEnd marked
"no handler shipped" — not "not emitted".
- sessionStart output block matches what _compute_banner actually emits:
`v{ver} | N skills | N agents | pool=N` (no phantom "commands" or
"router=on/off" segments).
- Kill-switch table adds OMNI_SKIP_PRE_TOOL_USE (confirmed honored at
hooks/pre_tool_use.py:54).
- Sections 4-5 (audit/metrics) trimmed to what the live hooks actually
emit; router_decision / skill_trigger_matched metrics dropped because
no code writes them.
AGENTS.md: hook-contract section rewritten to describe both live hooks
and the "no handler shipped" posture for the unused events. Kill-switch
list updated.
docs/ARCHITECTURE.md:
- Diagram shows both hook scripts.
- Skill count corrected (27 → 28), tool count corrected (28 → 30).
- State table description matches schema v6 composite PK.
- Hook section rewritten.
tests/test_hook_contract_alignment.py (new):
- Parses hooks/hooks.json and HOOK_CONTRACT.md; asserts every shipped
event is documented LIVE and vice versa.
- Asserts the banner template shape in the doc matches _compute_banner.
- Asserts every shipped hook script exists on disk.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The integration_local harness installs @github/copilot globally via npm and expects a live Copilot CLI login on the machine. That is exactly the wrong default for contributors and CI — pytest -q on a clean box would try to mutate global npm state and fail on hosts without auth. - pytest.ini: add `-m "not integration_local"` to addopts. The marker is still registered, so `pytest -m integration_local` still works for the author running the local smoke deliberately. - docs/TEST_STRATEGY.md: document the marker row + the opt-in command. Confirmed: `pytest --collect-only -q` reports "384/385 tests collected (1 deselected)" on the current tree. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audit flagged drift between plugin.json (2.1.0), marketplace.json (2.1.0), mcp/server.py (1.0.0), and scripts/omni.py (1.0.0), plus inconsistent skill/tool counts between README, AGENTS, CHANGELOG, INSTALL, QUICKSTART, and marketplace descriptions. Canonical numbers of record (verified from repo state): version: 2.1.1 skills: 27 (find skills -name SKILL.md | wc -l) agents: 19 (ls agents/*.md | wc -l) mcp tools: 30 (grep '^def _tool_' mcp/server.py | wc -l) Files updated to those numbers: - scripts/omni.py VERSION bumped 1.0.0 -> 2.1.1. - plugin.json, marketplace.json: 2.1.0 -> 2.1.1; marketplace.json description corrected to "27 skills, 19 agents, 30 MCP tools". - .github/copilot-instructions.md: "28 MCP tools" -> "30 MCP tools". - CHANGELOG.md entry in the v2.1.0 Copilot-CLI cleanup section: "28 MCP tools across ... ast-grep." -> "30 MCP tools across ..., codebase." (the repo gained the codebase family and the list fell behind.) - docs/INSTALL.md: corrects the stale "commands: 10, mcp tools: 20" doctor expectation — doctor never printed commands, and the tool count is 30, not 20. - docs/QUICKSTART.md: same correction, plus fixes my prior accidental "28 skills" edit back to the real 27. - docs/MIGRATION.md: migration-matrix row now lists 30 MCP tools next to the 19 agents / 27 skills pair. doctor now advertises the real MCP tool count via a small _count_mcp_tools helper (static scan of mcp/server.py for `def _tool_<name>(`). That is the signal QUICKSTART + INSTALL now promise. This closes the "doctor output vs docs" sub-part of the audit gap early. tests/test_cli.py: version assertion follows the bumped VERSION. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audit §6 flagged README calling the codebase graph "a real repository
knowledge graph" when the implementation is a stdlib adjacency walk
over Python defs, import statements, and Markdown links. Audit §7
flagged LSP tools advertised alongside the full MCP surface while
returning {"status": "stub"} on invocation.
README.md:
- "Inspect the repository knowledge graph" -> "Inspect the repository's
lightweight code graph". Copy explains what the graph IS (stdlib walk,
adjacency only), what it is NOT (persistent, semantic, multi-language),
and what the deferred surface looks like.
- Architecture diagram shows both hook scripts and points the banner
line at the real "banner + memory context" behaviour.
mcp/server.py:
- lsp_hover / lsp_goto_definition / lsp_find_references descriptions now
start with "EXPERIMENTAL / STUB" and explicitly state the stub return
shape so any skill or agent can see the status without running the
call. Tests unchanged (handlers already return the stub shape).
CHANGELOG.md:
- v2.1.1 Unreleased section gains a "Changed" note covering the memory +
graph + LSP repositioning, the feature-count realignment, and the
doctor MCP-tool-count addition. A "Removed" note lists the stale
docs/ROUTER.md and docs/MODELS.md link cleanup.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…form mode regex (P1-8)
Closes the audit gap that the old contract validator passed while the
repo still had 18 files referencing missing router_state.py. The old
regex only caught positional state_write("foo", ...) calls; the keyword
form state_write(mode="foo") slipped through and hid every audit-era
skill that wrote mode="deep-interview" outside the registered set.
scripts/verify_plugin_contract.py:
- _MODE_LITERAL_RE now catches state_write / state_read / state_clear /
_mcp_write_best_effort in both positional and keyword forms. Covers
both ' and " quotes.
- _DYNAMIC_MODE_PREFIX now also treats angle-bracketed placeholders
like `team.<slug>` as template strings (same as the `{var}` rule).
- check_mode_key_registry matches concrete literals against registry
placeholder patterns — so `team.worker-1` resolves to `team.<worker-slug>`
and is considered registered without duplicating every concrete row.
- Two new checks:
broken-script-refs — scan skills/ and agents/ for scripts/<file>.py
tokens; FAIL when the target is missing.
missing-doc-links — scan README / AGENTS / docs index + quickstart
+ install + architecture for docs/*.md links;
FAIL on dangling targets. Fragment anchors are
stripped before resolution.
Both new checks are wired into the CHECKS dict + argparse flags.
scripts/omni.py: matching runtime checks added to `omni doctor` so a
contributor sees missing scripts / broken doc links immediately, not
only when they run the full contract validator. Helpers are pure-stdlib
static scans over the file tree.
docs/STATE_MODES.md: registers the `skill-active` mode (surfaced by
cancel's stop-hook-silencing step) now that the broader regex can see it.
Validator state after this commit: 21 checks, all green. Mode scan now
catches keyword-form call sites that slipped through before.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four-reviewer pass (security + architect + code-reviewer + codex
adversarial) found a mix of HIGH issues that blocked merge and
MEDIUM/NIT items. Must-fix items addressed here:
mcp/server.py — atomic v6 migration + ordering
- Wrap the v6 state-table rebuild (DROP INDEX, CREATE state_new,
INSERT...SELECT, DROP TABLE state, ALTER RENAME, CREATE INDEX) in
BEGIN/COMMIT so a crash between DROP TABLE and RENAME cannot lose
the state rows. code-reviewer HIGH.
- Update the "additive only" migration comment to document that v6 is
an intentional one-time structural rebuild (NITPICK).
- Explicit comment on the migration runner explaining how
executescript + schema_version updates compose: step SQL runs first,
the version bump only happens after the SQL succeeded.
scripts/omni_team.py + scripts/subagent.py — schema bootstrap
- Restore a pre-v6 fallback in _mcp_write_best_effort. Codex CRITICAL:
without this, a user upgrading from a pre-v6 DB sees the best-effort
writer fail silently until the MCP server happens to open the DB.
The fallback tries ON CONFLICT(mode, session_id) first (v6 shape),
then ON CONFLICT(mode) with session_id column (v2-v5), then the
columnless legacy shape (v1). Write path stays idempotent.
- Add explicit conn.commit() in subagent.py to match the omni_team.py
pattern (security MEDIUM).
scripts/omni.py — session-aware CLI
- `omni state list` now prints a Session column and uses
COALESCE(session_id,'') against the post-v6 schema, falling back on
a "no session_id column" branch for forward compat.
- `omni state show` accepts a new `--session-id` flag; unset = read the
empty-session row, matching MCP tool back-compat. codex HIGH.
- New _state_has_session_column helper so both commands work on
arbitrary schema ages.
skills/team/SKILL.md — contract alignment
- Rename the misleading "Router preamble" section to "Resume
preamble" and scrub the single remaining router-framed header
(codex MEDIUM).
- Fix four state_write / state_read examples that used the wrong
argument shape (`active=true, state={...}` instead of the real
`body={...}`) and omitted session_id. All examples now pass
session_id="$OMNI_SESSION_ID" and wrap the payload in body={} to
match the shipped MCP schema (codex HIGH).
tests/test_mcp_state_session.py + docs/STATE_CONTRACT.md — back-compat
- Extend test_legacy_modes_listing_shape_preserved to make the
deliberate per-session exclusion + list=true discovery path
explicit, so future edits cannot silently regress the back-compat
promise (architect required).
- Update the STATE_CONTRACT.md tool table to call out that the
legacy no-mode aggregate deliberately excludes per-session rows and
that list=true is the documented discovery path.
Deferred as low-risk polish (listed in review but not blocking):
- Regex double-substitution cleanup in verify_plugin_contract.py
check_mode_key_registry (currently works correctly via two no-ops
cancelling out — code-reviewer MEDIUM).
- Concurrent-writes test body-content validation (code-reviewer
MEDIUM; the current assertion validates COUNT(*) == 1 which is the
serialization invariant under test).
Verification: 383 pytest passed / 1 skipped; 21/21 validator checks
green; doctor prints all 9 status lines OK; `omni state list` now
renders the Session column cleanly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex final-pass review caught that my earlier comment claimed the schema_version bump and the v6 rebuild SQL run in the same transaction. They do not — executescript commits between statements, and the INSERT INTO schema_version is a separate autocommit statement right after. Fix the comment to describe what actually happens: the rebuild itself is atomic (wrapped in BEGIN/COMMIT), and a crash between the rebuild commit and the version bump simply re-runs the (idempotent) rebuild next startup. Pure comment change; no behaviour. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 55162d3e52
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| elif mode is not None: | ||
| cur = conn.execute("DELETE FROM state WHERE mode=?", (mode,)) |
There was a problem hiding this comment.
Scope mode-only state clears to default session
state_clear now supports per-session rows, but the mode-only branch deletes by mode without filtering session_id, so a legacy call like state_clear({"mode": "ralph"}) will remove every session’s row for that mode. This breaks session isolation in mixed clients (older callers + new session-scoped writers) and can silently wipe another active run’s state; mode-only should target the default session row (session_id='') unless the caller explicitly asks for broader deletion.
Useful? React with 👍 / 👎.
… P1) Follow-up to #25. Codex review flagged that `state_clear({"mode": "ralph"})` was deleting every session's row for that mode — a regression for back-compat because a legacy pre-v6 caller could accidentally wipe the state of an unrelated active session. Change: - Mode-only clear now targets the default empty-session row only (`session_id = ''`). - New opt-in combo: `state_clear({"mode": ..., "all": true})` purges every session's row for that mode. This is the deliberate broad-purge path; callers must ask for it explicitly. - Other combos unchanged: (mode + session_id) = one row, (session_id only) = all modes for that session, (all=true) = nuke all. Docs + schema description updated to match. Two new tests cover the back-compat scope (legacy call preserves per-session rows) and the opt-in broad purge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Resolves the P0 correctness defects and P1 honesty defects from the 2026-04-18 product-contract audit. Eleven atomic commits; 384 tests / 21 validator checks green; codex + sonnet code-reviewer + sonnet architect + sonnet security-reviewer all passed the final diff.
Version bump: 2.1.0 → 2.1.1.
P0 — correctness + contract reset
state_write,state_read, andstate_clearMCP tools now acceptsession_id. The state table is rebuilt with a compositePRIMARY KEY(mode, session_id)so two sessions can hold rows for the same mode without colliding. Back-compat preserved: callers that never passsession_idsee the exact pre-v6 shape. Schema migrated v4 → v6; the destructive rebuild is wrapped inBEGIN/COMMITand runs under_MIGRATE_LOCKso a crash cannot lose rows.scripts/omni_team.pyandscripts/subagent.pynow carry a pre-v6 → v2 → v1 fallback ladder so best-effort writers work across upgrade states.scripts/router.pyandscripts/router_state.pywere already missing on main. This PR finishes the retirement: removes the preamble from 5 flagship skills (autopilot,ralph,ralplan,ultrawork,ultraqa), deletes the_router_enforce_denydead-code branch fromhooks/pre_tool_use.py, dropsOMNI_ROUTER_ENFORCE/OMNI_ROUTER_TTL_Senv vars, archives ADR-0005.agents/planner.mdno longer hands off to the non-existent/copilot-omni:start-work.skills/plan/SKILL.mdreplacesscripts/state_write.pywith the MCPstate_writetool.skills/deep-interview/SKILL.mdresume path now uses MCPstate_read.skills/cancel/SKILL.mdunified onto the two shipped layers:scripts/cancel_signal.py+ MCPstate_clear(session_id=...).docs/HOOK_CONTRACT.md,AGENTS.md, anddocs/ARCHITECTURE.mdnow document both live hooks (sessionStart,preToolUse) with fail-open + version-gated semantics. Newtests/test_hook_contract_alignment.pyasserts the shippedhooks/hooks.jsonand the doc cannot drift apart.pytest.iniexcludes the local-only Copilot CLI installer smoke by default (-m "not integration_local"); the marker is still invocable explicitly.P1 — product hardening
plugin.json,marketplace.json,mcp/server.py:SERVER_VERSION,scripts/omni.py:VERSION: v2.1.1. Feature counts realigned everywhere to the actual tree: 27 skills, 19 agents, 30 MCP tools.omni doctornow prints the MCP tool count alongside skills/agents.EXPERIMENTAL / STUBsince they return{"status": "stub"}.omni doctornow detects (a) missingscripts/<file>.pyreferences inside skills/agents, (b) brokendocs/<file>.mdlinks in headline docs. Contract validator gainsbroken-script-refs+missing-doc-linkschecks and extends the mode-literal regex to catch keyword-form calls.docs/ROUTER.mdanddocs/MODELS.mdreferences removed from README / AGENTS / docs index.Review trail
Adversarial review with codex CLI (
--dangerously-bypass-approvals-and-sandbox) before planning, after execution, and after remediation. Sonnetcode-reviewer,architect, andsecurity-revieweragents ran in parallel on the completed diff. All blocking findings addressed inda369ae:_migrateon upgrade (codex CRITICAL)omni state list/state shownot session-aware (codex HIGH)skills/team/SKILL.mdstale state_write examples + router preamble (codex HIGH)state_read({})now tested + documented (architect required)conn.commit()explicit insubagent.py(security MEDIUM)Test plan
pytest -q— 383 passed, 1 skipped (Windows-only), 1 deselected (integration_local, by design)python3 scripts/verify_plugin_contract.py --all— 21/21 checks OKpython3 scripts/omni.py doctor— all 9 status lines OK including newbroken refs: 0 OKandmissing docs: 0 OKWhat is NOT in this PR (documented follow-ups)
🤖 Generated with Claude Code