From 490b222c9b5953f3558146c1f8d50d33ee97b3df Mon Sep 17 00:00:00 2001
From: Bingran You <bingran.you@berkeley.edu>
Date: Tue, 14 Apr 2026 14:38:51 -0700
Subject: [PATCH] Clarify which Claude Code line a parity claim targets

The tree now separates leaked-source snapshot evidence from later
released-binary observations so parity work does not silently blend
version-sensitive behavior from incompatible build lines.

Constraint: Source snapshot and released binaries describe different evidence lines
Rejected: Treat released CLI behavior as the implicit default target | would mix public drift with snapshot-only internal contracts
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Future parity assets should name their target line and evidence family before claiming Claude Code equivalence
Tested: npx -p first-tree first-tree verify
Not-tested: No source-repo runtime changes
---
 .../NODE.md                                   |   1 +
 ...nstruction-target-and-evidence-boundary.md | 115 ++++++++++++++++++
 2 files changed, 116 insertions(+)
 create mode 100644 reconstruction-guardrails/verification-and-native-test-oracles/reconstruction-target-and-evidence-boundary.md

diff --git a/reconstruction-guardrails/verification-and-native-test-oracles/NODE.md b/reconstruction-guardrails/verification-and-native-test-oracles/NODE.md
index a83711c..5edb184 100644
--- a/reconstruction-guardrails/verification-and-native-test-oracles/NODE.md
+++ b/reconstruction-guardrails/verification-and-native-test-oracles/NODE.md
@@ -15,6 +15,7 @@ This subdomain captures cross-cutting knowledge about how the observed Claude Co
 
 Relevant leaves:
 
+- **[reconstruction-target-and-evidence-boundary.md](reconstruction-target-and-evidence-boundary.md)** — How source-snapshot evidence and later released-binary evidence can both inform the tree without collapsing into one false versionless parity claim.
 - **[test-framework-overview.md](test-framework-overview.md)** — The layered shape of the current test system, including the visible tier model and the boundary between confirmed and inferred runner details.
 - **[real-cli-e2e-scenario-corpus.md](real-cli-e2e-scenario-corpus.md)** — A live-observed black-box scenario set for validating whether a rebuild behaves like a real Claude Code CLI across startup, headless runs, session continuity, structured I/O, and diagnostics.
 - **[test-runtime-mode-and-determinism.md](test-runtime-mode-and-determinism.md)** — How `NODE_ENV=test` behaves as a supported runtime posture, including in-memory config behavior, reduced side effects, and deterministic test-only branches.
diff --git a/reconstruction-guardrails/verification-and-native-test-oracles/reconstruction-target-and-evidence-boundary.md b/reconstruction-guardrails/verification-and-native-test-oracles/reconstruction-target-and-evidence-boundary.md
new file mode 100644
index 0000000..8cac38c
--- /dev/null
+++ b/reconstruction-guardrails/verification-and-native-test-oracles/reconstruction-target-and-evidence-boundary.md
@@ -0,0 +1,115 @@
+---
+title: "Reconstruction Target and Evidence Boundary"
+owners: [bingran-you]
+soft_links:
+  - /reconstruction-guardrails/verification-and-acceptance-strategy.md
+  - /reconstruction-guardrails/verification-and-native-test-oracles/evidence-levels-and-missing-artifacts.md
+  - /reconstruction-guardrails/verification-and-native-test-oracles/real-cli-e2e-scenario-corpus.md
+  - /reconstruction-guardrails/verification-and-native-test-oracles/released-cli-e2e-test-set.md
+---
+
+# Reconstruction Target and Evidence Boundary
+
+This tree currently draws on two different evidence families:
+
+- a partial local source snapshot captured from the March 31, 2026 leak window
+- black-box runs against released Claude Code binaries observed later on local machines
+
+Both evidence families are useful. They are not interchangeable. A faithful rebuild needs an explicit rule for how to use them without accidentally mixing incompatible version-sensitive details into one false "100% parity" claim.
+
+## Scope boundary
+
+This leaf covers:
+
+- how to choose the reconstruction target before claiming parity
+- which evidence family is authoritative for which kind of question
+- what must be stated when a rebuild claims Claude Code equivalence
+
+It intentionally does not re-document:
+
+- the source-packaging limitations already captured in [evidence-levels-and-missing-artifacts.md](evidence-levels-and-missing-artifacts.md)
+- the scenario corpus itself, already captured in [real-cli-e2e-scenario-corpus.md](real-cli-e2e-scenario-corpus.md)
+- the current released-binary observations already captured in [released-cli-e2e-test-set.md](released-cli-e2e-test-set.md)
+
+## One parity claim needs one explicit target
+
+Each parity claim should hold to one declared target at a time.
+
+That target may be:
+
+- the leaked-source snapshot line, when the goal is to reconstruct the internal product shape visible in that snapshot
+- a specific released CLI line, when the goal is to match what end users actually saw from a shipped binary
+- a deliberately versioned hybrid milestone, but only if the milestone says exactly which behaviors come from which source of truth
+
+What must not happen is an unqualified "Claude Code parity" claim that silently mixes source-snapshot internals with later released-binary behavior when those two lines visibly drift.
+
+## Source snapshot answers shape questions
+
+The March 31, 2026 snapshot is authoritative for questions such as:
+
+- which subsystems exist
+- which contracts, seams, fixtures, and state machines are visible in code
+- which feature-gated surfaces or hidden verification hooks are proven by source evidence
+- which behaviors were important enough upstream to defend with tests or dedicated helper seams
+
+The snapshot is not authoritative for:
+
+- the exact behavior of later released binaries after subsequent updates
+- the full hidden runner manifest or CI layout when those artifacts were not included in the snapshot packaging
+- public surfaces that visibly changed after the leak window
+
+## Released binaries answer public-behavior questions
+
+Released CLI observations are authoritative for questions such as:
+
+- what a real user-facing command or flag did on a real machine
+- how startup, onboarding, doctor, install, update, print mode, resume, plugin, and MCP flows behaved from outside the binary
+- which behaviors are version-sensitive enough that a rebuild must avoid overfitting to the leaked snapshot alone
+
+Released-binary evidence is not authoritative for:
+
+- hidden subsystem boundaries that are only visible in source
+- internal feature-gated flows that were compiled out or otherwise unreachable from the tested build
+- source-era contracts that the public build no longer exposed directly
+
+## Public parity must name the target and environment
+
+Any serious parity claim should explicitly name:
+
+- the target line being matched
+- the evidence family used for that claim
+- the environment posture when that matters, such as provider mode, auth shape, OS, or interactive versus headless surface
+
+For example, useful claims look like:
+
+- "Matches the March 31, 2026 source snapshot for test posture, fixture policy, and visible tool/runtime contracts."
+- "Matches the April 9, 2026 observed native CLI line for provider-backed local print-mode, startup, and doctor behavior on macOS."
+
+A weak claim is simply "matches Claude Code" with no target, no date, and no runtime posture.
+
+## Version-sensitive observations must stay labeled
+
+Equivalent behavior should preserve explicit labeling for observations that are known to drift between builds, including:
+
+- install and update reporting
+- `--continue` and `--no-session-persistence` behavior
+- plugin update restart semantics
+- onboarding and trust-flow details that changed across released binaries
+
+Those observations are valuable, but they must stay attached to the version line that produced them. Otherwise the tree invites rebuilds to combine incompatible facts into one impossible target.
+
+## Decision rule for future tree work
+
+When adding a new verification or parity asset:
+
+- use source evidence when the question is about hidden shape, subsystem design, or native test intent
+- use released-binary evidence when the question is about externally visible runtime behavior
+- if both evidence families are needed, state the split directly instead of flattening them into one undifferentiated conclusion
+- if the two families disagree, record that disagreement as version drift unless there is direct evidence they still refer to the same build line
+
+## Failure modes
+
+- **version blur**: one leaf silently mixes source-era and later released-binary behavior as if both described one immutable product
+- **false 100% claim**: a rebuild is declared complete without naming the target line or evidence family
+- **public-over-internal inversion**: a later binary observation is used to override a source-proven hidden contract with no evidence that the underlying design changed
+- **snapshot-over-public inversion**: a leaked-source detail is treated as the final word on a user-visible command even after later released binaries show drift