From 490b222c9b5953f3558146c1f8d50d33ee97b3df Mon Sep 17 00:00:00 2001
From: Bingran You
Date: Tue, 14 Apr 2026 14:38:51 -0700
Subject: [PATCH] Clarify which Claude Code line a parity claim targets

The tree now separates leaked-source snapshot evidence from later released-binary observations so parity work does not silently blend version-sensitive behavior from incompatible build lines.

Constraint: Source snapshot and released binaries describe different evidence lines
Rejected: Treat released CLI behavior as the implicit default target | would mix public drift with snapshot-only internal contracts
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Future parity assets should name their target line and evidence family before claiming Claude Code equivalence
Tested: npx -p first-tree first-tree verify
Not-tested: No source-repo runtime changes
---
 .../NODE.md | 1 +
 ...nstruction-target-and-evidence-boundary.md | 115 ++++++++++++++++++
 2 files changed, 116 insertions(+)
 create mode 100644 reconstruction-guardrails/verification-and-native-test-oracles/reconstruction-target-and-evidence-boundary.md

diff --git a/reconstruction-guardrails/verification-and-native-test-oracles/NODE.md b/reconstruction-guardrails/verification-and-native-test-oracles/NODE.md
index a83711c..5edb184 100644
--- a/reconstruction-guardrails/verification-and-native-test-oracles/NODE.md
+++ b/reconstruction-guardrails/verification-and-native-test-oracles/NODE.md
@@ -15,6 +15,7 @@ This subdomain captures cross-cutting knowledge about how the observed Claude Co
 
 Relevant leaves:
 
+- **[reconstruction-target-and-evidence-boundary.md](reconstruction-target-and-evidence-boundary.md)** — How source-snapshot evidence and later released-binary evidence can both inform the tree without collapsing into one false versionless parity claim.
 - **[test-framework-overview.md](test-framework-overview.md)** — The layered shape of the current test system, including the visible tier model and the boundary between confirmed and inferred runner details.
 - **[real-cli-e2e-scenario-corpus.md](real-cli-e2e-scenario-corpus.md)** — A live-observed black-box scenario set for validating whether a rebuild behaves like a real Claude Code CLI across startup, headless runs, session continuity, structured I/O, and diagnostics.
 - **[test-runtime-mode-and-determinism.md](test-runtime-mode-and-determinism.md)** — How `NODE_ENV=test` behaves as a supported runtime posture, including in-memory config behavior, reduced side effects, and deterministic test-only branches.
diff --git a/reconstruction-guardrails/verification-and-native-test-oracles/reconstruction-target-and-evidence-boundary.md b/reconstruction-guardrails/verification-and-native-test-oracles/reconstruction-target-and-evidence-boundary.md
new file mode 100644
index 0000000..8cac38c
--- /dev/null
+++ b/reconstruction-guardrails/verification-and-native-test-oracles/reconstruction-target-and-evidence-boundary.md
@@ -0,0 +1,115 @@
+---
+title: "Reconstruction Target and Evidence Boundary"
+owners: [bingran-you]
+soft_links:
+ - /reconstruction-guardrails/verification-and-acceptance-strategy.md
+ - /reconstruction-guardrails/verification-and-native-test-oracles/evidence-levels-and-missing-artifacts.md
+ - /reconstruction-guardrails/verification-and-native-test-oracles/real-cli-e2e-scenario-corpus.md
+ - /reconstruction-guardrails/verification-and-native-test-oracles/released-cli-e2e-test-set.md
+---
+
+# Reconstruction Target and Evidence Boundary
+
+This tree currently draws on two different evidence families:
+
+- a partial local source snapshot captured from the March 31, 2026 leak window
+- black-box runs against released Claude Code binaries observed later on local machines
+
+Both evidence families are useful. They are not interchangeable. A faithful rebuild needs an explicit rule for how to use them without accidentally mixing incompatible version-sensitive details into one false "100% parity" claim.
+
+## Scope boundary
+
+This leaf covers:
+
+- how to choose the reconstruction target before claiming parity
+- which evidence family is authoritative for which kind of question
+- what must be stated when a rebuild claims Claude Code equivalence
+
+It intentionally does not re-document:
+
+- the source-packaging limitations already captured in [evidence-levels-and-missing-artifacts.md](evidence-levels-and-missing-artifacts.md)
+- the scenario corpus itself, already captured in [real-cli-e2e-scenario-corpus.md](real-cli-e2e-scenario-corpus.md)
+- the current released-binary observations already captured in [released-cli-e2e-test-set.md](released-cli-e2e-test-set.md)
+
+## One parity claim needs one explicit target
+
+Equivalent behavior should preserve one declared target at a time.
+
+That target may be:
+
+- the leaked-source snapshot line, when the goal is to reconstruct the internal product shape visible in that snapshot
+- a specific released CLI line, when the goal is to match what end users actually saw from a shipped binary
+- a deliberately versioned hybrid milestone, but only if the milestone says exactly which behaviors come from which source of truth
+
+What must not happen is an unqualified "Claude Code parity" claim that silently mixes source-snapshot internals with later released-binary behavior when those two lines visibly drift.
+
+## Source snapshot answers shape questions
+
+The March 31, 2026 snapshot is authoritative for questions such as:
+
+- which subsystems exist
+- which contracts, seams, fixtures, and state machines are visible in code
+- which feature-gated surfaces or hidden verification hooks are proven by source evidence
+- which behaviors were important enough upstream to defend with tests or dedicated helper seams
+
+The snapshot is not authoritative for:
+
+- the exact behavior of later released binaries after subsequent updates
+- the full hidden runner manifest or CI layout when those artifacts were not included in the snapshot packaging
+- exact claims about public surfaces that visibly changed after the leak window
+
+## Released binaries answer public-behavior questions
+
+Released CLI observations are authoritative for questions such as:
+
+- what a real user-facing command or flag did on a real machine
+- how startup, onboarding, doctor, install, update, print mode, resume, plugin, and MCP flows behaved from outside the binary
+- which behaviors are version-sensitive enough that a rebuild must avoid overfitting to the leaked snapshot alone
+
+Released-binary evidence is not authoritative for:
+
+- hidden subsystem boundaries that are only visible in source
+- internal feature-gated flows that were compiled out or otherwise unreachable from the tested build
+- source-era contracts that the public build no longer exposed directly
+
+## Public parity must name the target and environment
+
+Any serious parity claim should explicitly name:
+
+- the target line being matched
+- the evidence family used for that claim
+- the environment posture when that matters, such as provider mode, auth shape, OS, or interactive versus headless surface
+
+For example, a useful claim looks like:
+
+- "Matches the March 31, 2026 source snapshot for test posture, fixture policy, and visible tool/runtime contracts."
+- "Matches the April 9, 2026 observed native CLI line for provider-backed local print-mode, startup, and doctor behavior on macOS."
+
+A weak claim is simply "matches Claude Code" with no target, no date, and no runtime posture.
+
+## Version-sensitive observations must stay labeled
+
+Equivalent behavior should preserve explicit labeling for observations that are known to drift between builds, including:
+
+- install and update reporting
+- `--continue` and `--no-session-persistence` behavior
+- plugin update restart semantics
+- onboarding and trust-flow details that changed across released binaries
+
+Those observations are valuable, but they must stay attached to the version line that produced them. Otherwise the tree invites rebuilds to combine incompatible facts into one impossible target.
+
+## Decision rule for future tree work
+
+When adding a new verification or parity asset:
+
+- use source evidence when the question is about hidden shape, subsystem design, or native test intent
+- use released-binary evidence when the question is about externally visible runtime behavior
+- if both evidence families are needed, state the split directly instead of flattening them into one undifferentiated conclusion
+- if the two families disagree, record that disagreement as version drift unless there is direct evidence they still refer to the same build line
+
+## Failure modes
+
+- **version blur**: one leaf silently mixes source-era and later released-binary behavior as if both described one immutable product
+- **false 100% claim**: a rewrite is declared complete without naming the target line or evidence family
+- **public-over-internal inversion**: a later binary observation is used to override a source-proven hidden contract with no evidence that the underlying design changed
+- **snapshot-over-public inversion**: a leaked-source detail is treated as the final word on a user-visible command even after later released binaries show drift
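
The naming requirements for a parity claim (target line, evidence family, environment posture) could be checked mechanically. The following is a minimal Python sketch of that idea and is not part of the patch: `EvidenceFamily`, `ParityClaim`, and `claim_problems` are hypothetical names, and requiring an environment for every released-binary claim is an assumption that is stricter than the leaf's "when that matters" wording.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class EvidenceFamily(Enum):
    """The two evidence families named in the leaf (hypothetical encoding)."""
    SOURCE_SNAPSHOT = "source-snapshot"
    RELEASED_BINARY = "released-binary"


@dataclass(frozen=True)
class ParityClaim:
    """One parity claim against one declared target (hypothetical record)."""
    target_line: str                           # e.g. "March 31, 2026 source snapshot"
    evidence_family: Optional[EvidenceFamily]  # None models an unlabeled claim
    environment: str = ""                      # e.g. "macOS, headless print mode"


def claim_problems(claim: ParityClaim) -> List[str]:
    """Return the reasons a claim is too weak; an empty list means it is labeled."""
    problems: List[str] = []
    if not claim.target_line.strip():
        problems.append("missing target line")
    if claim.evidence_family is None:
        problems.append("missing evidence family")
    # Assumption: externally observed behavior always needs an environment note.
    if (claim.evidence_family is EvidenceFamily.RELEASED_BINARY
            and not claim.environment.strip()):
        problems.append("missing environment posture")
    return problems


if __name__ == "__main__":
    weak = ParityClaim(target_line="", evidence_family=None)
    print(claim_problems(weak))
```

A bare "matches Claude Code" claim maps to the `weak` record above and is flagged for both missing fields, while a claim that names its line, family, and environment passes with no problems.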