From be8c40587645c20c4567f51d71321d8411ba3220 Mon Sep 17 00:00:00 2001
From: Bingran You
Date: Tue, 14 Apr 2026 14:56:22 -0700
Subject: [PATCH] Capture the shared preload contract behind shard-safe testing

The tree now records the shared test-preload and shard-isolation contract that ties reset hooks, cache clearing, and platform-sensitive performance guards into one native-test-derived verification asset.

Constraint: The source snapshot exposes preload behavior only through cross-file anchors, not a full visible test/preload.ts body
Rejected: Leave preload behavior implied by the generic seam doc | would hide the same-process shard contract that the source clearly treats as important
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Future verification assets that depend on same-process testing should state whether they assume preload-based isolation or a full process restart
Tested: npx -p first-tree first-tree verify
Not-tested: No source-repo runtime changes
---
 .../NODE.md | 1 +
 ...shared-test-preload-and-shard-isolation.md | 113 ++++++++++++++++++
 2 files changed, 114 insertions(+)
 create mode 100644 reconstruction-guardrails/verification-and-native-test-oracles/shared-test-preload-and-shard-isolation.md

diff --git a/reconstruction-guardrails/verification-and-native-test-oracles/NODE.md b/reconstruction-guardrails/verification-and-native-test-oracles/NODE.md
index e439bcf..74a9148 100644
--- a/reconstruction-guardrails/verification-and-native-test-oracles/NODE.md
+++ b/reconstruction-guardrails/verification-and-native-test-oracles/NODE.md
@@ -15,6 +15,7 @@ This subdomain captures cross-cutting knowledge about how the observed Claude Co
 
 Relevant leaves:
 
+- **[shared-test-preload-and-shard-isolation.md](shared-test-preload-and-shard-isolation.md)** — How the shared preload layer, reset hooks, and shard-sensitive performance guards keep same-process tests isolated without turning the framework into one-process-per-case only.
 - **[minimal-end-to-end-verification-chain.md](minimal-end-to-end-verification-chain.md)** — The shortest serious proof ladder a rewrite should clear before broader parity claims are considered credible.
 - **[parity-capability-matrix.md](parity-capability-matrix.md)** — Which capability families are blocking for parity, which are extension-level, and what evidence bar each family must clear before a rebuild can claim success.
 - **[reconstruction-target-and-evidence-boundary.md](reconstruction-target-and-evidence-boundary.md)** — How source-snapshot evidence and later released-binary evidence can both inform the tree without collapsing into one false versionless parity claim.
diff --git a/reconstruction-guardrails/verification-and-native-test-oracles/shared-test-preload-and-shard-isolation.md b/reconstruction-guardrails/verification-and-native-test-oracles/shared-test-preload-and-shard-isolation.md
new file mode 100644
index 0000000..e467249
--- /dev/null
+++ b/reconstruction-guardrails/verification-and-native-test-oracles/shared-test-preload-and-shard-isolation.md
@@ -0,0 +1,113 @@
+---
+title: "Shared Test Preload and Shard Isolation"
+owners: [bingran-you]
+soft_links:
+ - /reconstruction-guardrails/verification-and-native-test-oracles/test-framework-overview.md
+ - /reconstruction-guardrails/verification-and-native-test-oracles/test-seams-reset-hooks-and-injected-dependencies.md
+ - /reconstruction-guardrails/verification-and-native-test-oracles/test-runtime-mode-and-determinism.md
+ - /reconstruction-guardrails/verification-and-native-test-oracles/evidence-levels-and-missing-artifacts.md
+ - /platform-services/settings-change-detection-and-runtime-reload.md
+ - /tools-and-permissions/permissions/e2e-permission-testing-contracts.md
+native_source: test/preload.ts
+verification_status: native_test_derived
+---
+
+# Shared Test Preload and Shard Isolation
+
+The current Claude Code snapshot does not behave like every test gets a fresh process. Multiple comments and exported reset hooks show a different contract: a shared preload layer resets sticky runtime state between same-process tests and across shards, so the suite can stay fast without quietly letting one case poison the next.
+
+## Scope boundary
+
+This leaf covers:
+
+- the role of the shared `test/preload.ts` layer
+- what kinds of process-local state it must neutralize between tests
+- how shard-sensitive and Windows-sensitive failure modes shape the testing contract
+
+It intentionally does not re-document:
+
+- every resettable seam family already summarized in [test-seams-reset-hooks-and-injected-dependencies.md](test-seams-reset-hooks-and-injected-dependencies.md)
+- the full hidden contents of `test/preload.ts`, which the current snapshot still does not expose directly
+- runner manifests or CI workflow wiring beyond what the visible source anchors prove
+
+## One shared preload layer is part of the framework contract
+
+Equivalent behavior should preserve:
+
+- one shared preload or before-each reset layer for same-process test execution
+- that preload clearing sticky state through product-owned reset hooks instead of relying only on whole-process restarts
+- shard isolation being treated as a first-class requirement, not a lucky side effect
+
+The important product signal is that Claude Code expects multiple tests in one process to be normal, and therefore invests in explicit reset machinery.
+
+## The preload layer must clear product caches, not just mocks
+
+Visible source anchors show the preload contract reaching real product caches and registries, including:
+
+- bootstrap or app-wide state that exposes dedicated test-only reset entrypoints
+- plugin command, agent, hook, output-style, and prompt caches
+- registered hook state that would otherwise survive into later cases
+- memoized path- or working-directory-resolution helpers that are exported specifically for shard-isolation cache clearing
+- sticky attachment or skill-sending state that would otherwise make later cases depend on earlier history
+
+Equivalent behavior should preserve a preload that clears product reality, not just one mocking framework's local spies.
+
+## Reset hooks must stay test-gated
+
+Equivalent behavior should preserve:
+
+- reset hooks being callable only in test posture when they would be unsafe or misleading in production
+- clear separation between "public runtime API" and "test-only reset path"
+- explicit naming that signals testing intent when a helper exists only to repair process-local state between cases
+
+This matters because the observed source treats reset hooks as framework tools, not as public recovery commands.
+
+## Plugin and hook isolation has special rules
+
+The visible source does not treat plugin-hook reset as a naive wipe-everything path.
+
+Equivalent behavior should preserve:
+
+- cache invalidation staying distinct from the live registered-hook set when immediate hook loss would change runtime behavior incorrectly
+- prune-style cleanup for no-longer-enabled plugin hooks staying possible without prematurely erasing still-valid hooks
+- the shared preload starting from a truly empty or reset hook state before later test-specific plugin loading occurs
+
+The load-bearing rule is that test isolation must not accidentally change the production semantics the test is trying to verify.
+
+## Shard-sensitive heavy modules need defensive handling
+
+Visible source comments show that shard isolation is not only logical state cleanup. It also affects performance and timeout behavior.
+
+Equivalent behavior should preserve:
+
+- lazy loading of heavy modules when eager module evaluation would bloat the heap for every later test in the shard
+- test-aware tuning or env overrides for platform-sensitive slow paths, especially Windows CI cases where repeated spawns or large lazy modules can push a shard into timeout territory
+- platform-specific flakes being treated as framework issues, not only as one test's local problem
+
+The important point is not one exact timeout value. It is that same-shard performance pressure is part of the observed test architecture.
+
+## Windows and same-shard failures are part of the acceptance oracle
+
+The visible source specifically anchors failures such as:
+
+- later tests in the same Windows shard timing out after a heavy module was imported too early
+- repeated PowerShell parse spawns on Windows CI exceeding the interactive-default timeout unless tests can override that limit
+
+Equivalent behavior should preserve the idea that shard-local performance regressions are real correctness failures for the framework, not just CI noise to ignore.
+
+## Relationship to higher-level seams
+
+This leaf is narrower than the general seam docs:
+
+- [test-seams-reset-hooks-and-injected-dependencies.md](test-seams-reset-hooks-and-injected-dependencies.md) explains why reset hooks and narrow seams exist at all
+- this leaf explains the extra framework contract that one shared preload layer coordinates those resets to make same-process testing and sharding trustworthy
+
+Both are needed. Without the seam doc, the preload feels incidental. Without this leaf, the seam doc does not explain how the framework actually keeps tests isolated at scale.
+
+## Failure modes
+
+- **same-process bleed**: one test leaves hooks, caches, settings overlays, or sent-skill markers behind and later tests inherit them
+- **naive reset regression**: isolation wipes live hook state in a way that changes the behavior a test was meant to observe
+- **path-cache contamination**: memoized working-directory or path-resolution helpers survive across tests and make permission or filesystem checks order-dependent
+- **shard timeout spiral**: heavy modules load eagerly or slow-path defaults stay fixed, so later tests in the same shard start failing only under CI load
+- **test-gate leak**: reset helpers intended only for `NODE_ENV=test` become callable from ordinary runtime paths
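
Reviewer sketch (not part of the patch): the contract this leaf describes could look roughly like the TypeScript below. It is a minimal illustration under stated assumptions, not a reconstruction of the hidden `test/preload.ts`. Every name in it (`TestResetRegistry`, `resolveFromCwd`, `clearPathCacheForTesting`, `markSkillSent`, `resetSentSkillsForTesting`, `lazy`, `posture`) is hypothetical. It shows a test-gated reset registry, two product-side caches with test-only reset entrypoints, and a lazy holder that defers a heavy module until first use.

```typescript
// Sketch only: all names here are hypothetical, none come from test/preload.ts.

type ResetHook = () => void;

// Registry that a shared preload could drain before each same-process test.
class TestResetRegistry {
  private readonly hooks: Array<{ name: string; run: ResetHook }> = [];
  private readonly isTestPosture: () => boolean;

  constructor(isTestPosture: () => boolean) {
    this.isTestPosture = isTestPosture;
  }

  register(name: string, run: ResetHook): void {
    this.hooks.push({ name, run });
  }

  // Test-gated: throws instead of silently mutating state outside tests.
  resetAll(): string[] {
    if (!this.isTestPosture()) {
      throw new Error("reset hooks are only callable in test posture");
    }
    return this.hooks.map((hook) => {
      hook.run();
      return hook.name;
    });
  }
}

// Memoized path helper plus its test-only cache-clearing entrypoint.
let pathCache: Record<string, string> = {};
function resolveFromCwd(cwd: string, relative: string): string {
  const key = cwd + "::" + relative;
  if (!(key in pathCache)) {
    pathCache[key] = cwd + "/" + relative;
  }
  return pathCache[key];
}
function clearPathCacheForTesting(): void {
  pathCache = {};
}

// Sticky "already sent" markers plus their test-only reset.
let sentSkills: string[] = [];
function markSkillSent(name: string): void {
  if (sentSkills.indexOf(name) === -1) sentSkills.push(name);
}
function resetSentSkillsForTesting(): void {
  sentSkills = [];
}

// Lazy holder: the factory runs on first call, not at import time.
function lazy<T>(factory: () => T): () => T {
  let loaded = false;
  let value: T | undefined;
  return () => {
    if (!loaded) {
      value = factory();
      loaded = true;
    }
    return value as T;
  };
}

// Wiring that a preload-style module might perform once per process.
let posture = "test"; // stand-in for an environment check (assumption)
const registry = new TestResetRegistry(() => posture === "test");
registry.register("path-cache", clearPathCacheForTesting);
registry.register("sent-skills", resetSentSkillsForTesting);

let heavyLoads = 0;
const getHeavyModule = lazy(() => {
  heavyLoads += 1; // counts how often the "heavy" factory actually runs
  return { parse: (text: string) => text.length };
});
```

In this sketch a before-each hook would call `registry.resetAll()`, so same-process bleed and path-cache contamination are cleared through product-owned functions, while the posture check guards against a test-gate leak. The plugin-hook rules (prune versus wipe) are deliberately left out, since the visible anchors do not show enough detail to sketch them safely.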