Contract reset: fix state/session, retire router, align docs (v2.1.1) by Jurel89 · Pull Request #25 · Jurel89/copilot-omni

Jurel89 · 2026-04-18T16:46:21Z

Summary

Resolves the P0 correctness defects and P1 honesty defects from the 2026-04-18 product-contract audit. Eleven atomic commits; 384 tests / 21 validator checks green; codex + sonnet code-reviewer + sonnet architect + sonnet security-reviewer all passed the final diff.

Version bump: 2.1.0 → 2.1.1.

P0 — correctness + contract reset

State / session API (P0-1) — state_write, state_read, and state_clear MCP tools now accept session_id. The state table is rebuilt with a composite PRIMARY KEY(mode, session_id) so two sessions can hold rows for the same mode without colliding. Back-compat preserved: callers that never pass session_id see the exact pre-v6 shape. Schema migrated v4 → v6; the destructive rebuild is wrapped in BEGIN/COMMIT and runs under _MIGRATE_LOCK so a crash cannot lose rows. scripts/omni_team.py and scripts/subagent.py now carry a pre-v6 → v2 → v1 fallback ladder so best-effort writers work across upgrade states.
Router retirement (P0-2) — scripts/router.py and scripts/router_state.py were already missing on main. This PR finishes the retirement: removes the preamble from 5 flagship skills (autopilot, ralph, ralplan, ultrawork, ultraqa), deletes the _router_enforce_deny dead-code branch from hooks/pre_tool_use.py, drops OMNI_ROUTER_ENFORCE / OMNI_ROUTER_TTL_S env vars, archives ADR-0005.
Missing-script references (P0-3) — agents/planner.md no longer hands off to the non-existent /copilot-omni:start-work. skills/plan/SKILL.md replaces scripts/state_write.py with the MCP state_write tool. skills/deep-interview/SKILL.md resume path now uses MCP state_read. skills/cancel/SKILL.md unified onto the two shipped layers: scripts/cancel_signal.py + MCP state_clear(session_id=...).
Hook contract truthful (P0-4) — docs/HOOK_CONTRACT.md, AGENTS.md, and docs/ARCHITECTURE.md now document both live hooks (sessionStart, preToolUse) with fail-open + version-gated semantics. New tests/test_hook_contract_alignment.py asserts the shipped hooks/hooks.json and the doc cannot drift apart.
Default pytest path (P0-5) — pytest.ini excludes the local-only Copilot CLI installer smoke by default (-m "not integration_local"); the marker is still invocable explicitly.

P1 — product hardening

Versions + counts aligned (P1-6) — one source of truth across plugin.json, marketplace.json, mcp/server.py:SERVER_VERSION, scripts/omni.py:VERSION: v2.1.1. Feature counts realigned everywhere to the actual tree: 27 skills, 19 agents, 30 MCP tools. omni doctor now prints the MCP tool count alongside skills/agents.
Honest positioning (P1-7) — README calls the codebase graph a "lightweight local code graph" instead of a "real knowledge graph". LSP tool descriptions marked EXPERIMENTAL / STUB since they return {"status": "stub"}.
Doctor + validator diagnostics (P1-8) — omni doctor now detects (a) missing scripts/<file>.py references inside skills/agents, (b) broken docs/<file>.md links in headline docs. Contract validator gains broken-script-refs + missing-doc-links checks and extends the mode-literal regex to catch keyword-form calls.
Stale doc links dropped (P1-9) — docs/ROUTER.md and docs/MODELS.md references removed from README / AGENTS / docs index.

Review trail

Adversarial review with codex CLI (--dangerously-bypass-approvals-and-sandbox) before planning, after execution, and after remediation. Sonnet code-reviewer, architect, and security-reviewer agents ran in parallel on the completed diff. All blocking findings addressed in da369ae:

v6 migration atomicity (code-reviewer HIGH)
Best-effort writers bypassing _migrate on upgrade (codex CRITICAL)
omni state list / state show not session-aware (codex HIGH)
skills/team/SKILL.md stale state_write examples + router preamble (codex HIGH)
Back-compat promise for state_read({}) now tested + documented (architect required)
conn.commit() explicit in subagent.py (security MEDIUM)

Test plan

pytest -q — 383 passed, 1 skipped (Windows-only), 1 deselected (integration_local, by design)
python3 scripts/verify_plugin_contract.py --all — 21/21 checks OK
python3 scripts/omni.py doctor — all 9 status lines OK including new broken refs: 0 OK and missing docs: 0 OK
Migration test covers v4 → v6 NULL normalization + idempotent re-run safety
Concurrency test asserts composite-PK upsert produces exactly one final row under 4-thread x 20-write load
Hook contract alignment test keeps docs + shipped hooks.json in sync

What is NOT in this PR (documented follow-ups)

Deep memory adaptation (ranking / preference learning)
Persistent cached code graph + multi-language support
Real LSP session lifecycle
README strategic rewrite around current Copilot CLI native capabilities (autopilot, /fleet, /plan, memory, session data)

🤖 Generated with Claude Code

Resolves the Option A state-model contract reset from the audit. - Add session_id parameter to state_write/state_read/state_clear. - Legacy shapes preserved: mode-only reads still hit the default empty- session row; no-mode reads still return the legacy {modes: [...]} list of empty-session rows. Discovery of per-session rows requires the new additive list=true flag. - Schema migration v5 normalizes NULL session_id rows to ''. - Schema migration v6 rebuilds the state table with composite PRIMARY KEY(mode, session_id), so two sessions can hold rows for the same mode without colliding. - Bump SERVER_VERSION to 2.1.1, SCHEMA_VERSION to 6. - New tests cover: round-trip with/without session_id, session isolation, scoped clear, listing, v4 → v6 migration with NULL normalization, and concurrent-write correctness under the composite PK. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…-1, P0-3) Commit 2 of contract reset: propagate the session-aware state contract through every in-repo writer and the authoritative state docs. - scripts/omni_team.py _mcp_write_best_effort: drop the broken `ON CONFLICT(mode, session_id)` + legacy fallback double-path. Use single UPSERT matching the v6 plain composite PK. - scripts/subagent.py _mcp_write_best_effort: same treatment — removes the fallback branch that tried `ON CONFLICT(mode)` on a table that now has a composite PK. - skills/ralplan/SKILL.md: the three inline SQLite recipes now UPSERT on (mode, session_id) instead of (mode). The try/except fallback is dropped because the schema is deterministic after migration v6. - tests/test_cli.py: fixture builds the state table with the v6 schema (composite PK, NOT NULL DEFAULT ''). The former "router" fixture row is renamed "ralph" in step with the router-as-product-claim removal (covered fully in the next commit). - tests/test_mcp_pool.py: concurrency test inserts through the new composite conflict target. - docs/STATE_CONTRACT.md: rewritten to document schema v6, the tool contract, the back-compat guarantees for external MCP consumers, the session-scoping discipline, and the cancel contract. - docs/STATE_MODES.md: adds a session_scope column, drops the router row (router scripts are already removed on main; docs follow). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The router has been gone from the runtime for a while — scripts/router.py and scripts/router_state.py are already absent on main, so flagship skills that still invoked router_state.py were calling dead paths. This commit finishes the retirement by removing the remaining docs, skill preambles, ADR references, and the hook-side enforcement dead code. Code: - hooks/pre_tool_use.py: drop the _router_enforce_deny branch, the _ROUTER_EXEMPT_TOOLS set, and the call site in main(). Nothing writes .omni/cache/router-last.json any more, so the branch was unreachable. Skills (router preambles removed, resume hook delegated to MCP state): - skills/autopilot/SKILL.md - skills/ralph/SKILL.md - skills/ralplan/SKILL.md - skills/ultrawork/SKILL.md - skills/ultraqa/SKILL.md Docs: - README.md: front-door router framing replaced with explicit /deep-interview usage; docs index row dropped. - AGENTS.md: router bullet scrubbed from skill catalog blurb, audit metrics note, and further-reading list. - docs/ARCHITECTURE.md: scripts/router.py section removed. - docs/QUICKSTART.md: router example replaced with /deep-interview + a skip-interview example that doesn't reference the deleted file; doctor expectation corrected. - docs/README.md: ROUTER.md + MODELS.md entries dropped (targets are absent). - docs/MIGRATION.md: router cleanup sentence rewritten; parallel(claude| codex|gemini) scoring removal language left out as it was tied to the now-missing router.py. - docs/TEST_STRATEGY.md: router test invocation replaced with state/MCP tests; integration-marker blurb updated. - docs/ENV.md: OMNI_ROUTER_ENFORCE + OMNI_ROUTER_TTL_S rows deleted. - docs/ADR/README.md: ADR-0005 marked Archived; summary rewritten to explain the retirement; ADR-0011 no longer cites docs/ROUTER.md. - docs/ADR/ADR-0005-router-scoring-rubric.md → docs/ADR/archive/... (git mv so history follows). Tests: - tests/test_run_mutation.py: comment about the v2.1.0 removal now cites the contract-reset PR rather than implying Claude-Code-only context. CHANGELOG: v2.1.1 Unreleased section added covering the removal. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…tate (P0-3) Closes every remaining reference to a non-shipped script or a "planned" follow-on skill that the flagship flows still pointed at. agents/planner.md: drops three `/copilot-omni:start-work` handoffs (the skill never existed). Plans now hand off to the executor agent via scripts/subagent.py, or to the team skill for parallel work. The "planned / not yet implemented" hedge is removed. skills/plan/SKILL.md: the ralplan deactivation step no longer shells out to scripts/state_write.py (missing). Uses the shipped MCP state_write tool with session_id. skills/deep-interview/SKILL.md: the resume path no longer claims state is stored at .omni/state/deep-interview-state.json (no script produces that file). Resume now reads state via state_read(mode="deep-interview", session_id=$OMNI_SESSION_ID), matching the state_write the skill already makes. The init call is also annotated with session_id to match the shipped contract. skills/cancel/SKILL.md: the big one. Every reference to the orphan .omni/state/sessions/{sessionId}/... tree and individual *-state.json files is replaced with the two layers the runtime actually ships: 1. Process signal — scripts/cancel_signal.py writes .omni/runs/<run-id>/cancel.signal (preserved). 2. State invalidation — state_clear(mode=..., session_id=...) via the MCP tool, with state_read(list=true) for discovery. Impacted sections: force-clear sequence, auto-detect logic, deferred- tool bash fallback block, and the artifact-preservation note. Net ~55 lines removed. Frontmatter, flow chart, and messages reference untouched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ression test (P0-4) The pre-contract-reset audit caught docs/HOOK_CONTRACT.md and AGENTS.md claiming "preToolUse was removed in v2.1.0 — not emitted by Copilot CLI" while hooks/hooks.json still registered preToolUse and the repo's own tests asserted it fires. This commit makes the docs truthful and adds a regression test so it can't drift again. docs/HOOK_CONTRACT.md: rewritten. - sessionStart and preToolUse documented as LIVE with the fail-open + version-gated stance made explicit. If a Copilot CLI version stops emitting an event, the hook simply never runs; the plugin never requires the event to fire. - postToolUse / userPromptSubmitted / errorOccurred / sessionEnd marked "no handler shipped" — not "not emitted". - sessionStart output block matches what _compute_banner actually emits: `v{ver} | N skills | N agents | pool=N` (no phantom "commands" or "router=on/off" segments). - Kill-switch table adds OMNI_SKIP_PRE_TOOL_USE (confirmed honored at hooks/pre_tool_use.py:54). - Sections 4-5 (audit/metrics) trimmed to what the live hooks actually emit; router_decision / skill_trigger_matched metrics dropped because no code writes them. AGENTS.md: hook-contract section rewritten to describe both live hooks and the "no handler shipped" posture for the unused events. Kill-switch list updated. docs/ARCHITECTURE.md: - Diagram shows both hook scripts. - Skill count corrected (27 → 28), tool count corrected (28 → 30). - State table description matches schema v6 composite PK. - Hook section rewritten. tests/test_hook_contract_alignment.py (new): - Parses hooks/hooks.json and HOOK_CONTRACT.md; asserts every shipped event is documented LIVE and vice versa. - Asserts the banner template shape in the doc matches _compute_banner. - Asserts every shipped hook script exists on disk. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The integration_local harness installs @github/copilot globally via npm and expects a live Copilot CLI login on the machine. That is exactly the wrong default for contributors and CI — pytest -q on a clean box would try to mutate global npm state and fail on hosts without auth. - pytest.ini: add `-m "not integration_local"` to addopts. The marker is still registered, so `pytest -m integration_local` still works for the author running the local smoke deliberately. - docs/TEST_STRATEGY.md: document the marker row + the opt-in command. Confirmed: `pytest --collect-only -q` reports "384/385 tests collected (1 deselected)" on the current tree. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Audit flagged drift between plugin.json (2.1.0), marketplace.json (2.1.0), mcp/server.py (1.0.0), and scripts/omni.py (1.0.0), plus inconsistent skill/tool counts between README, AGENTS, CHANGELOG, INSTALL, QUICKSTART, and marketplace descriptions. Canonical numbers of record (verified from repo state): version: 2.1.1 skills: 27 (find skills -name SKILL.md | wc -l) agents: 19 (ls agents/*.md | wc -l) mcp tools: 30 (grep '^def _tool_' mcp/server.py | wc -l) Files updated to those numbers: - scripts/omni.py VERSION bumped 1.0.0 -> 2.1.1. - plugin.json, marketplace.json: 2.1.0 -> 2.1.1; marketplace.json description corrected to "27 skills, 19 agents, 30 MCP tools". - .github/copilot-instructions.md: "28 MCP tools" -> "30 MCP tools". - CHANGELOG.md entry in the v2.1.0 Copilot-CLI cleanup section: "28 MCP tools across ... ast-grep." -> "30 MCP tools across ..., codebase." (the repo gained the codebase family and the list fell behind.) - docs/INSTALL.md: corrects the stale "commands: 10, mcp tools: 20" doctor expectation — doctor never printed commands, and the tool count is 30, not 20. - docs/QUICKSTART.md: same correction, plus fixes my prior accidental "28 skills" edit back to the real 27. - docs/MIGRATION.md: migration-matrix row now lists 30 MCP tools next to the 19 agents / 27 skills pair. doctor now advertises the real MCP tool count via a small _count_mcp_tools helper (static scan of mcp/server.py for `def _tool_<name>(`). That is the signal QUICKSTART + INSTALL now promise. This closes the "doctor output vs docs" sub-part of the audit gap early. tests/test_cli.py: version assertion follows the bumped VERSION. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Audit §6 flagged README calling the codebase graph "a real repository knowledge graph" when the implementation is a stdlib adjacency walk over Python defs, import statements, and Markdown links. Audit §7 flagged LSP tools advertised alongside the full MCP surface while returning {"status": "stub"} on invocation. README.md: - "Inspect the repository knowledge graph" -> "Inspect the repository's lightweight code graph". Copy explains what the graph IS (stdlib walk, adjacency only), what it is NOT (persistent, semantic, multi-language), and what the deferred surface looks like. - Architecture diagram shows both hook scripts and points the banner line at the real "banner + memory context" behaviour. mcp/server.py: - lsp_hover / lsp_goto_definition / lsp_find_references descriptions now start with "EXPERIMENTAL / STUB" and explicitly state the stub return shape so any skill or agent can see the status without running the call. Tests unchanged (handlers already return the stub shape). CHANGELOG.md: - v2.1.1 Unreleased section gains a "Changed" note covering the memory + graph + LSP repositioning, the feature-count realignment, and the doctor MCP-tool-count addition. A "Removed" note lists the stale docs/ROUTER.md and docs/MODELS.md link cleanup. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…form mode regex (P1-8) Closes the audit gap that the old contract validator passed while the repo still had 18 files referencing missing router_state.py. The old regex only caught positional state_write("foo", ...) calls; the keyword form state_write(mode="foo") slipped through and hid every audit-era skill that wrote mode="deep-interview" outside the registered set. scripts/verify_plugin_contract.py: - _MODE_LITERAL_RE now catches state_write / state_read / state_clear / _mcp_write_best_effort in both positional and keyword forms. Covers both ' and " quotes. - _DYNAMIC_MODE_PREFIX now also treats angle-bracketed placeholders like `team.<slug>` as template strings (same as the `{var}` rule). - check_mode_key_registry matches concrete literals against registry placeholder patterns — so `team.worker-1` resolves to `team.<worker-slug>` and is considered registered without duplicating every concrete row. - Two new checks: broken-script-refs — scan skills/ and agents/ for scripts/<file>.py tokens; FAIL when the target is missing. missing-doc-links — scan README / AGENTS / docs index + quickstart + install + architecture for docs/*.md links; FAIL on dangling targets. Fragment anchors are stripped before resolution. Both new checks are wired into the CHECKS dict + argparse flags. scripts/omni.py: matching runtime checks added to `omni doctor` so a contributor sees missing scripts / broken doc links immediately, not only when they run the full contract validator. Helpers are pure-stdlib static scans over the file tree. docs/STATE_MODES.md: registers the `skill-active` mode (surfaced by cancel's stop-hook-silencing step) now that the broader regex can see it. Validator state after this commit: 21 checks, all green. Mode scan now catches keyword-form call sites that slipped through before. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Four-reviewer pass (security + architect + code-reviewer + codex adversarial) found a mix of HIGH issues that blocked merge and MEDIUM/NIT items. Must-fix items addressed here: mcp/server.py — atomic v6 migration + ordering - Wrap the v6 state-table rebuild (DROP INDEX, CREATE state_new, INSERT...SELECT, DROP TABLE state, ALTER RENAME, CREATE INDEX) in BEGIN/COMMIT so a crash between DROP TABLE and RENAME cannot lose the state rows. code-reviewer HIGH. - Update the "additive only" migration comment to document that v6 is an intentional one-time structural rebuild (NITPICK). - Explicit comment on the migration runner explaining how executescript + schema_version updates compose: step SQL runs first, the version bump only happens after the SQL succeeded. scripts/omni_team.py + scripts/subagent.py — schema bootstrap - Restore a pre-v6 fallback in _mcp_write_best_effort. Codex CRITICAL: without this, a user upgrading from a pre-v6 DB sees the best-effort writer fail silently until the MCP server happens to open the DB. The fallback tries ON CONFLICT(mode, session_id) first (v6 shape), then ON CONFLICT(mode) with session_id column (v2-v5), then the columnless legacy shape (v1). Write path stays idempotent. - Add explicit conn.commit() in subagent.py to match the omni_team.py pattern (security MEDIUM). scripts/omni.py — session-aware CLI - `omni state list` now prints a Session column and uses COALESCE(session_id,'') against the post-v6 schema, falling back on a "no session_id column" branch for forward compat. - `omni state show` accepts a new `--session-id` flag; unset = read the empty-session row, matching MCP tool back-compat. codex HIGH. - New _state_has_session_column helper so both commands work on arbitrary schema ages. skills/team/SKILL.md — contract alignment - Rename the misleading "Router preamble" section to "Resume preamble" and scrub the single remaining router-framed header (codex MEDIUM). - Fix four state_write / state_read examples that used the wrong argument shape (`active=true, state={...}` instead of the real `body={...}`) and omitted session_id. All examples now pass session_id="$OMNI_SESSION_ID" and wrap the payload in body={} to match the shipped MCP schema (codex HIGH). tests/test_mcp_state_session.py + docs/STATE_CONTRACT.md — back-compat - Extend test_legacy_modes_listing_shape_preserved to make the deliberate per-session exclusion + list=true discovery path explicit, so future edits cannot silently regress the back-compat promise (architect required). - Update the STATE_CONTRACT.md tool table to call out that the legacy no-mode aggregate deliberately excludes per-session rows and that list=true is the documented discovery path. Deferred as low-risk polish (listed in review but not blocking): - Regex double-substitution cleanup in verify_plugin_contract.py check_mode_key_registry (currently works correctly via two no-ops cancelling out — code-reviewer MEDIUM). - Concurrent-writes test body-content validation (code-reviewer MEDIUM; the current assertion validates COUNT(*) == 1 which is the serialization invariant under test). Verification: 383 pytest passed / 1 skipped; 21/21 validator checks green; doctor prints all 9 status lines OK; `omni state list` now renders the Session column cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Codex final-pass review caught that my earlier comment claimed the schema_version bump and the v6 rebuild SQL run in the same transaction. They do not — executescript commits between statements, and the INSERT INTO schema_version is a separate autocommit statement right after. Fix the comment to describe what actually happens: the rebuild itself is atomic (wrapped in BEGIN/COMMIT), and a crash between the rebuild commit and the version bump simply re-runs the (idempotent) rebuild next startup. Pure comment change; no behaviour. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 55162d3e52

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-18T16:51:41Z

+        elif mode is not None:
            cur = conn.execute("DELETE FROM state WHERE mode=?", (mode,))


Scope mode-only state clears to default session

state_clear now supports per-session rows, but the mode-only branch deletes by mode without filtering session_id, so a legacy call like state_clear({"mode": "ralph"}) will remove every session’s row for that mode. This breaks session isolation in mixed clients (older callers + new session-scoped writers) and can silently wipe another active run’s state; mode-only should target the default session row (session_id='') unless the caller explicitly asks for broader deletion.

Useful? React with 👍 / 👎.

… P1) Follow-up to #25. Codex review flagged that `state_clear({"mode": "ralph"})` was deleting every session's row for that mode — a regression for back-compat because a legacy pre-v6 caller could accidentally wipe the state of an unrelated active session. Change: - Mode-only clear now targets the default empty-session row only (`session_id = ''`). - New opt-in combo: `state_clear({"mode": ..., "all": true})` purges every session's row for that mode. This is the deliberate broad-purge path; callers must ask for it explicitly. - Other combos unchanged: (mode + session_id) = one row, (session_id only) = all modes for that session, (all=true) = nuke all. Docs + schema description updated to match. Two new tests cover the back-compat scope (legacy call preserves per-session rows) and the opt-in broad purge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Jurel89 and others added 11 commits April 18, 2026 18:05

chatgpt-codex-connector Bot reviewed Apr 18, 2026

View reviewed changes

Jurel89 merged commit ddaf11a into main Apr 18, 2026
10 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Contract reset: fix state/session, retire router, align docs (v2.1.1)#25

Contract reset: fix state/session, retire router, align docs (v2.1.1)#25
Jurel89 merged 11 commits intomainfrom
fix/contract-reset

Jurel89 commented Apr 18, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Apr 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		elif mode is not None:
		cur = conn.execute("DELETE FROM state WHERE mode=?", (mode,))

Conversation

Jurel89 commented Apr 18, 2026

Summary

P0 — correctness + contract reset

P1 — product hardening

Review trail

Test plan

What is NOT in this PR (documented follow-ups)

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant