From 64fef85bd8336cd9f5d48c012ee0e3bf31ca151b Mon Sep 17 00:00:00 2001 From: Bingran You Date: Thu, 9 Apr 2026 10:23:59 -0700 Subject: [PATCH] docs: add reconstruction verification strategy --- reconstruction-guardrails/NODE.md | 1 + .../verification-and-acceptance-strategy.md | 155 ++++++++++++++++++ 2 files changed, 156 insertions(+) create mode 100644 reconstruction-guardrails/verification-and-acceptance-strategy.md diff --git a/reconstruction-guardrails/NODE.md b/reconstruction-guardrails/NODE.md index 8e58c9b..db35f2d 100644 --- a/reconstruction-guardrails/NODE.md +++ b/reconstruction-guardrails/NODE.md @@ -14,4 +14,5 @@ Use these leaves before adding any new knowledge: - **[rebuild-standard.md](rebuild-standard.md)** — The bar a node must clear to be useful for a clean-room implementation. - **[knowledge-lifecycle.md](knowledge-lifecycle.md)** — How extracted knowledge moves from observation to durable tree state, and how bad knowledge is detected. - **[rebuild-phasing.md](rebuild-phasing.md)** — Implementation order for a clean-room rebuild, including what must ship together. +- **[verification-and-acceptance-strategy.md](verification-and-acceptance-strategy.md)** — How to prove a rewrite actually matches Claude Code using tests, state round-trips, and direct CLI comparison. - **[tree-expansion-strategy.md](tree-expansion-strategy.md)** — How this repository should deepen from high-level coverage into reconstruction-ready subsystem contracts over repeated passes. diff --git a/reconstruction-guardrails/verification-and-acceptance-strategy.md b/reconstruction-guardrails/verification-and-acceptance-strategy.md new file mode 100644 index 0000000..b162f51 --- /dev/null +++ b/reconstruction-guardrails/verification-and-acceptance-strategy.md @@ -0,0 +1,155 @@ +--- +title: "Verification and Acceptance Strategy" +owners: [] +soft_links: [/reconstruction-guardrails/rebuild-standard.md, /reconstruction-guardrails/rebuild-phasing.md, /product-surface/command-surface.md, /platform-services/auth-login-logout-and-token-lifecycle.md, /integrations/mcp/config-layering-policy-and-dedup.md, /integrations/plugins/plugin-management-and-marketplace-flows.md] +--- + +# Verification and Acceptance Strategy + +This tree already explains what a faithful Claude Code rebuild must contain, but a clean-room rewrite still needs an explicit answer to a different question: **how do we prove the rebuild now behaves like Claude Code instead of merely resembling it on paper?** + +A reconstruction effort is not complete when every domain has prose coverage. It is complete only when the implementation clears a repeatable acceptance contract that combines documentation completeness, automated evidence, and direct behavior comparison against a runnable Claude Code client. + +## Scope boundary + +This leaf covers: + +- the acceptance bar for a rebuild that claims end-to-end Claude Code parity +- what evidence must exist before a milestone can be called complete +- the minimum automated and manual comparison matrix for CLI, print-mode, auth, MCP, plugin, session, and agent surfaces +- how to treat known divergence during an incremental reconstruction + +It intentionally does not re-document: + +- the domain-specific product contracts already captured in the rest of the tree +- implementation-specific test runner wiring for any one rewrite repository +- private fixture contents, recorded transcripts, or provider credentials + +## Completion rule + +Equivalent behavior should preserve this definition of done: + +- **not complete** means a feature exists in the source product shape but the rebuild is missing the surface entirely, cannot execute the flow end to end, or still requires unexplained manual intervention +- **conditionally complete** means the rebuild implements the flow and has evidence, but known divergence remains and is recorded explicitly with impact and follow-up scope +- **complete enough to claim parity** means the rebuild passes automated coverage for the surface, survives manual comparison against a real Claude Code client, and has no untracked user-visible differences for the accepted environment + +No milestone should be reported as "fully working like Claude Code" while a known divergence list still exists. + +## Evidence stack + +Every serious reconstruction milestone should carry evidence from all three layers: + +1. **tree evidence** — the relevant behavior contract exists in this Context Tree and is detailed enough that another engineer could rebuild the surface without privileged source access +2. **automated evidence** — unit, integration, scenario, and end-to-end tests cover the surface with deterministic pass/fail outcomes +3. **runtime comparison evidence** — the rebuild is exercised side by side with a runnable Claude Code CLI using the same machine, workspace shape, and provider posture when possible + +If any one of those layers is missing, the milestone is still provisional. + +## Acceptance matrix for a CLI rebuild + +Equivalent behavior should preserve an explicit acceptance matrix rather than relying on ad hoc spot checks. + +At minimum, the matrix should cover: + +- root entry behavior: default interactive startup, prompt-as-argument startup, `--help`, `--version`, `--resume`, `--continue`, and top-level command dispatch +- print mode: `-p/--print`, plain text output, JSON output, session-persistence toggles, schema validation, and prompt-plus-tail-option parsing +- provider and auth routing: API-key auth, Foundry or third-party provider auth, login status surfaces, logout teardown, and limited-scope token setup behavior +- session storage and restoration: session creation, latest-session reuse, naming, branching, transcript durability, and project identity matching +- MCP management and runtime effects: add, add-json, list, get, remove, project-choice reset, scope layering, and resulting live server visibility +- plugin management and runtime effects: marketplace listing, install, enable, disable, uninstall, update, validation, settings round-trip, and cache/index behavior +- agent surfaces: top-level `agents`, built-in agent visibility, agent selection defaults, and delegated worker flows +- update and install flows: launcher installation, version reporting, updater diagnostics, and safe behavior when a real Claude installation already exists +- doctor and diagnostics: health summaries, provider mode reporting, install-path reporting, and recovery guidance + +The exact command list can evolve, but the acceptance matrix must be concrete enough that two different people would run materially the same comparison. + +## Real-client comparison contract + +Equivalent behavior should preserve direct comparison against a runnable Claude Code binary whenever the question is about user-visible behavior. + +That comparison should favor: + +- the exact locally runnable `claude` binary or launcher the user actually uses +- absolute commands with explicit flags rather than paraphrased expectations +- the same workspace root and config posture for both clients when possible +- the same provider mode and model defaults when the flow depends on remote inference + +Useful comparison commands usually include: + +- `claude --help` +- `claude --version` +- `claude -p 'Reply with exactly OK' --output-format json --no-session-persistence` +- `claude auth status --text` +- `claude agents` +- `claude mcp --help` +- `claude mcp list` +- `claude plugin --help` +- `claude plugin list` +- `claude plugin marketplace list --json` +- `claude update` +- `claude install stable --force` + +The point is not that this exact list is sacred. The point is that each milestone should leave behind a concrete comparison bundle instead of a vague statement that the rewrite was "tested manually." + +## State-path round-trip contract + +A CLI rebuild is not faithful if it only matches stdout while silently drifting on-disk state. + +Equivalent behavior should preserve verification of the user-visible state layer, especially for: + +- user home state under `~/.claude/` +- global private state in `~/.claude.json` +- project-local private settings such as `.claude/settings.local.json` +- checked-in project settings such as `.claude/settings.json` or `.mcp.json` +- plugin marketplace metadata, plugin install indexes, and plugin cache paths +- MCP configuration persistence and later reload behavior + +For each mutable surface, verification should prove both: + +- the command prints the expected result +- the expected state file changes, and the same client can later read that change back correctly + +## Fixture and transcript policy + +Equivalent behavior should preserve a deterministic replay strategy for surfaces that would otherwise depend on provider variability. + +That means: + +- stable unit and integration tests should prefer fixtures or replay when remote inference is not the thing being verified +- real-provider parity tests should be reserved for cases where output shape or provider wiring is the subject of the test +- golden transcripts should capture command-level envelopes and state transitions, not private source code or secret-bearing payloads +- fixture refresh must be explicit and reviewable rather than happening silently in ordinary test runs + +## Divergence logging rule + +Equivalent behavior should preserve a written divergence log whenever the rebuild is still behind the source client. + +Each divergence entry should include: + +- the exact command or workflow that differs +- whether the difference is cosmetic, operational, or blocking +- whether the issue lives in docs coverage, command routing, runtime logic, provider wiring, persistence, or UI +- what evidence proved the mismatch +- what milestone is expected to remove it + +Unlogged divergence is reconstruction debt that will be rediscovered repeatedly. + +## Milestone acceptance bundle + +Before squashing a parity milestone into `main`, the bundle should include at least: + +- a green automated test run for the affected rewrite repository +- manual comparison notes for the changed command surfaces +- any updated Context Tree leaves required to explain the new behavior +- an explicit list of remaining parity gaps, or an explicit statement that no known user-visible gaps remain for the accepted environment + +If a milestone changes only internal structure and not user-visible behavior, the comparison bundle may be smaller, but the author should still say why no broader parity rerun was needed. + +## Failure modes + +- **paper parity**: the tree says a surface exists, but there is no automated or runtime evidence that the rebuild actually behaves that way +- **stdout-only verification**: command text matches, but config files, session state, plugin indexes, or MCP scopes drift from source behavior +- **partial rerun blindness**: a new milestone re-tests only the changed command and misses regressions in adjacent root-entry or persistence behavior +- **provider ambiguity**: parity claims are made without saying which auth/provider posture was used, so others cannot reproduce the result +- **hidden divergence**: known mismatches are left out of milestone notes, making later agents assume the rebuild is farther along than it is +- **non-deterministic acceptance**: tests depend on live inference or mutable remote state without replay or controlled comparison, so parity status changes between runs