agent-team-foundation · bingran-you · Apr 9, 2026 · Apr 9, 2026
@@ -1,7 +1,7 @@
 ---
 title: "SSH Remote Session and Auth Proxy"
 owners: []
-soft_links: [/integrations/clients/direct-connect-session-bootstrap-and-environment-selection.md, /integrations/clients/remote-session-message-adaptation-and-viewer-state.md, /product-surface/startup-entrypoint-routing-and-session-handoff.md, /platform-services/provider-specific-api-clients-and-auth-routing.md, /reconstruction-guardrails/verification-and-native-test-oracles/test-seams-reset-hooks-and-injected-dependencies.md]
+soft_links: [/integrations/clients/direct-connect-session-bootstrap-and-environment-selection.md, /integrations/clients/remote-session-message-adaptation-and-viewer-state.md, /product-surface/startup-entrypoint-routing-and-session-handoff.md, /platform-services/provider-specific-api-clients-and-auth-routing.md, /reconstruction-guardrails/verification-and-native-test-oracles/test-seams-reset-hooks-and-injected-dependencies.md, /reconstruction-guardrails/verification-and-native-test-oracles/e2e-harness-reality-boundaries.md]
 ---
 
 # SSH Remote Session and Auth Proxy

@@ -6,6 +6,7 @@ soft_links:
   - /integrations/mcp/server-contract.md
   - /platform-services/auth-config-and-policy.md
   - /reconstruction-guardrails/verification-and-native-test-oracles/test-seams-reset-hooks-and-injected-dependencies.md
+  - /reconstruction-guardrails/verification-and-native-test-oracles/e2e-harness-reality-boundaries.md
 ---
 
 # Federated Auth Conformance and IdP Test Seeding

@@ -16,7 +16,10 @@ This subdomain captures cross-cutting knowledge about how the observed Claude Co
 Relevant leaves:
 
 - **[test-framework-overview.md](test-framework-overview.md)** — The layered shape of the current test system, including the visible tier model and the boundary between confirmed and inferred runner details.
+- **[test-runtime-mode-and-determinism.md](test-runtime-mode-and-determinism.md)** — How `NODE_ENV=test` behaves as a supported runtime posture, including in-memory config behavior, reduced side effects, and deterministic test-only branches.
 - **[test-environment-fixtures-and-ci-fail-closed-policy.md](test-environment-fixtures-and-ci-fail-closed-policy.md)** — How test posture suppresses side effects, how fixture replay works, and why missing recordings fail closed in CI.
+- **[test-lane-coverage-map.md](test-lane-coverage-map.md)** — Which subsystem contracts are guarded by fast regression, integration, end-to-end, conformance, and compatibility lanes, without overclaiming the hidden runner layout.
+- **[e2e-harness-reality-boundaries.md](e2e-harness-reality-boundaries.md)** — Which end-to-end harnesses may shorten setup but still need to preserve real permission, transport, auth-proxy, and credential-cache paths.
 - **[test-seams-reset-hooks-and-injected-dependencies.md](test-seams-reset-hooks-and-injected-dependencies.md)** — The narrow seams the product uses to keep hard behaviors testable without turning the whole runtime into a debug harness.
 - **[native-test-derived-asset-provenance-and-acceptance-rules.md](native-test-derived-asset-provenance-and-acceptance-rules.md)** — How native test knowledge should be normalized into clean-room contract assets and how those assets should be linked back to their owning domains.
 - **[evidence-levels-and-missing-artifacts.md](evidence-levels-and-missing-artifacts.md)** — What this source snapshot proves, what it only strongly suggests, and which missing artifacts still block exact runner-level reproduction.
@@ -0,0 +1,61 @@
+---
+title: "E2E Harness Reality Boundaries"
+owners: [bingran-you]
+soft_links:
+  - /reconstruction-guardrails/verification-and-native-test-oracles/test-lane-coverage-map.md
+  - /reconstruction-guardrails/verification-and-native-test-oracles/test-seams-reset-hooks-and-injected-dependencies.md
+  - /tools-and-permissions/permissions/e2e-permission-testing-contracts.md
+  - /ui-and-experience/dialogs-and-approvals/permission-prompt-shell-and-worker-states.md
+  - /integrations/clients/ssh-remote-session-and-auth-proxy.md
+  - /integrations/mcp/federated-auth-conformance-and-idp-test-seeding.md
+  - /runtime-orchestration/sessions/session-artifacts-and-sharing.md
+---
+
+# E2E Harness Reality Boundaries
+
+The observed Claude Code snapshot uses several harnesses that cut setup cost without abandoning the real runtime path. That distinction matters: a clean-room rebuild should preserve the boundary between the parts of end-to-end verification that may be synthetic and the parts that must still exercise production-like orchestration.
+
+## Approval harnesses must still use the real permission path
+
+Equivalent behavior should preserve:
+
+- a narrow approval-oriented harness that can force the permission flow to appear on demand
+- that harness still entering through the normal tool catalog, permission-decision pipeline, and permission-prompt shell
+- grant, deny, cancel, queue-advance, and worker-forwarding behavior being validated through the same UI and callback machinery users actually see
+
+An end-to-end approval test that injects dialog state directly no longer tests the product contract that matters.
+
+## Remote transport harnesses may skip deployment, not orchestration
+
+Equivalent behavior should preserve:
+
+- a local harness mode that can avoid real SSH deployment when the test only needs to verify split local-UI and remote-execution plumbing
+- the auth proxy, transcript adaptation, permission relay, and session-lifecycle machinery still being exercised
+- failures and reconnect behavior still traveling through the real remote-session contract rather than a fake one-shot shell wrapper
+
+The shortcut is allowed to reduce environment setup. It is not allowed to erase the transport boundary being tested.
+
+## Federated-auth harnesses may skip browser setup, not credential semantics
+
+Equivalent behavior should preserve:
+
+- a deterministic way to seed federated credentials when a mock identity provider does not expose the full interactive browser surface
+- seeded credentials landing in the same secure cache slot the ordinary login and refresh paths later read
+- downstream exchange, refresh, and revocation behavior still using the normal federated auth path
+
+Otherwise the test stops proving interoperability and starts proving only that a bypass slot was written successfully.
+
+## Session-state harnesses may seed artifacts, not invent a separate resume model
+
+Equivalent behavior should preserve:
+
+- targeted setters or seed helpers for session artifact state when bootstrapping a full prior session would be too expensive
+- those helpers still feeding the same transcript, artifact, and resume semantics production uses
+- test convenience never becoming a second, incompatible persistence model
+
+## Failure modes
+
+- **fake dialog coverage**: permission tests manipulate UI state directly and stop covering the real approval pipeline
+- **transport collapse**: a local remote-session harness stops exercising proxying, relay, or transcript adaptation
+- **credential bypass**: federated auth tests seed a token into a cache path the real login flow never reads
+- **shadow persistence**: test setters create a second resume model unrelated to the live session artifact system
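
The credential-seeding rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the snapshot's implementation: `SecureCache`, `slotKey`, `seedFederatedCredential`, and `readCredentialForRefresh` are hypothetical names introduced here. The point it shows is that the test-only seed helper and the ordinary read path derive the cache slot from one shared function, so a seeded credential cannot land somewhere the real flow never looks.

```typescript
// Hypothetical sketch: none of these names come from the observed snapshot.

interface FederatedCredential {
  accessToken: string;
  refreshToken: string;
  expiresAt: number; // epoch milliseconds
}

// In-memory stand-in for the product's secure credential cache.
class SecureCache {
  private slots = new Map<string, FederatedCredential>();

  write(key: string, cred: FederatedCredential): void {
    this.slots.set(key, cred);
  }

  read(key: string): FederatedCredential | undefined {
    return this.slots.get(key);
  }
}

// Single key derivation shared by login, refresh, and the seed helper.
function slotKey(serverId: string, issuer: string): string {
  return `federated:${serverId}:${issuer}`;
}

// Test-only helper: skips the browser flow but writes to the real slot.
function seedFederatedCredential(
  cache: SecureCache,
  serverId: string,
  issuer: string,
  cred: FederatedCredential,
): void {
  cache.write(slotKey(serverId, issuer), cred);
}

// Ordinary refresh path: reads the same slot, with no test-specific branch.
function readCredentialForRefresh(
  cache: SecureCache,
  serverId: string,
  issuer: string,
): FederatedCredential | undefined {
  return cache.read(slotKey(serverId, issuer));
}
```

A harness built this way fails loudly if the key derivation changes, because the refresh path stops finding the seeded credential instead of silently reading from a bypass slot.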
@@ -5,6 +5,7 @@ soft_links:
   - /reconstruction-guardrails/source-boundary.md
   - /reconstruction-guardrails/knowledge-lifecycle.md
   - /reconstruction-guardrails/verification-and-native-test-oracles/test-framework-overview.md
+  - /reconstruction-guardrails/verification-and-native-test-oracles/test-lane-coverage-map.md
 ---
 
 # Evidence Levels and Missing Artifacts
@@ -15,16 +16,19 @@ This repository should distinguish between what the current source snapshot prov
 
 The snapshot is sufficient to confirm all of these:
 
-- there are distinct unit or regression, integration, end-to-end, conformance, and compatibility lanes
-- `NODE_ENV=test` is a real runtime posture
+- `NODE_ENV=test` is a supported runtime posture rather than a one-off conditional
 - fixture and VCR replay are first-class testing mechanisms
+- there are direct signals for multiple lane families, including at least one compatibility lane, at least one integration lane, dedicated end-to-end harnesses, conformance-sensitive auth verification, and many narrow regression or fidelity oracles
 - narrow seams such as injected dependencies, exported testing helpers, resets, and test-only helper surfaces are part of the current design
 
+The tree should treat those as lane-family and architecture facts, not as proof of the full hidden runner inventory.
+
 ## Strongly suggested but not fully proven
 
 The tree can safely treat these as strong signals rather than as closed facts:
 
 - the TypeScript runner environment is Bun-oriented in at least part of the stack
+- the regression or unit layer is broader than the few directly named test references exposed in comments and helper exports
 - repo-level scripts wrap at least some runner commands instead of every lane being invoked directly
 
 ## Still missing for exact runner-level reproduction
@@ -33,6 +37,7 @@ The current snapshot does not fully expose:
 
 - the top-level repository manifest and script table
 - the complete test directory layout
+- the exhaustive lane inventory and lane-to-command matrix
 - the full committed fixture corpus
 - the CI workflow and any sharding or coverage rules
 
@@ -44,6 +49,7 @@ While those artifacts are missing, the tree should:
 
 - document the confirmed architecture and tier model
 - preserve clear evidence labels for inferred versus confirmed details
+- claim lane purpose and behavior ownership more confidently than lane naming or runner wiring
 - refuse to guess exact runner wiring that the snapshot did not show
 
 This is a knowledge-quality rule, not a refusal to make progress. The visible framework is already rich enough to guide a clean-room rebuild of the verification architecture itself.
@@ -2,6 +2,7 @@
 title: "Test Environment, Fixtures, and CI Fail-Closed Policy"
 owners: [bingran-you]
 soft_links:
+  - /reconstruction-guardrails/verification-and-native-test-oracles/test-runtime-mode-and-determinism.md
   - /platform-services/startup-service-sequencing-and-capability-gates.md
   - /platform-services/usage-analytics-and-migrations.md
   - /integrations/clients/structured-io-and-headless-session-loop.md
@@ -25,9 +26,15 @@ The important point is not one specific branch. It is that the runtime treats te
 
 ## Fixture replay is a first-class oracle
 
-The snapshot exposes a VCR-style replay layer for API-dependent behavior.
+The snapshot exposes more than one fixture family for API-adjacent behavior.
 
-That layer preserves:
+Equivalent behavior should preserve:
+
+- a generic fixture helper for deterministic caching of arbitrary expensive or externalized test oracles
+- message-replay fixtures for API response and streaming behavior
+- token-count fixtures for API-adjacent counting paths that still need deterministic replay semantics
+
+Across those families, the shared contract preserves:
 
 - explicit activation in test posture
 - hash-based fixture naming from normalized inputs
@@ -45,13 +52,25 @@ Equivalent behavior should preserve:
 
 This is one of the most important stability contracts in the visible framework. It keeps network-backed tests deterministic and makes fixture refresh a deliberate maintenance act.
 
+## Recording lifecycle must stay deliberate
+
+Equivalent behavior should preserve:
+
+- replay as the default posture once a fixture exists
+- explicit record or refresh intent instead of incidental overwrites
+- the ability for different API-adjacent callers to reuse the same fixture policy rather than inventing lane-specific caching rules
+
+The important clean-room point is that recording is maintenance, not a side effect of ordinary CI execution.
+
 ## Transcript and hash stability matter
 
 The broader runtime also treats transcript shape as part of fixture stability.
 
 Equivalent behavior should preserve:
 
 - careful normalization before hashing
+- dehydration of machine-specific paths, config-home locations, and similar environment-local values
+- placeholder treatment for incidental UUIDs, timestamps, counters, and other unstable runtime identifiers
 - avoidance of unnecessary transcript-shape churn in replay-sensitive flows
 - deterministic identity or placeholder handling where raw runtime IDs would otherwise destabilize recordings
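
The normalization bullets above can be illustrated with a short sketch. It is an assumption-laden example rather than the snapshot's actual scheme: the placeholder strings, the `MachineLocals` shape, the SHA-256 choice, and the `fixture-<hash>.json` naming are all hypothetical. It shows machine-local paths, UUIDs, and ISO timestamps being replaced with placeholders before the input is hashed into a fixture name.

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch: placeholder names and the naming scheme are assumptions.

const UUID_RE = /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi;
const ISO_TS_RE = /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z/g;

interface MachineLocals {
  cwd: string;
  configHome: string;
}

// Replace environment-local values with stable placeholders.
function dehydrate(text: string, locals: MachineLocals): string {
  const paths: Array<[string, string]> = [
    [locals.cwd, "[CWD]"],
    [locals.configHome, "[CONFIG_HOME]"],
  ];
  // Longest path first, so a nested path is not partially rewritten.
  paths.sort((a, b) => b[0].length - a[0].length);

  let out = text;
  for (const [path, placeholder] of paths) {
    if (path.length === 0) continue; // an empty path would match everywhere
    out = out.split(path).join(placeholder);
  }
  return out.replace(UUID_RE, "[UUID]").replace(ISO_TS_RE, "[TIMESTAMP]");
}

// Derive a fixture file name from the normalized input only.
function fixtureName(input: string, locals: MachineLocals): string {
  const digest = createHash("sha256")
    .update(dehydrate(input, locals))
    .digest("hex");
  return `fixture-${digest.slice(0, 12)}.json`;
}
```

With this shape, the same logical request recorded on a laptop and replayed on a CI worker resolves to the same fixture name, while a change in the meaningful part of the input still produces a different name.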
 
@@ -62,13 +81,15 @@ The visible testing architecture therefore depends on transcript semantics, not
 If a clean-room rebuild keeps external API-backed tests, it should preserve all of these:
 
 - a dedicated test posture
+- multiple fixture families when different API-adjacent callers need different oracle shapes
 - deterministic fixture hashing and hydration
 - fail-closed CI behavior for missing recordings
 - explicit recording refresh
 
 ## Failure modes
 
 - **test-production blur**: automated tests still emit nonessential production side effects
+- **layer collapse**: token-count or other API-adjacent lanes bypass the shared fixture policy and drift from the replay behavior used elsewhere
 - **machine-bound fixtures**: path, cwd, or tempdir differences cause needless cache misses
 - **silent CI rewrite**: missing fixtures regenerate during CI and hide behavioral drift
 - **hash instability**: transcript or input normalization changes break recordings even when behavior did not meaningfully change
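
The "silent CI rewrite" failure mode is easiest to see against a resolver that refuses to do it. The following is a minimal sketch under assumptions: `FixtureStore`, `ResolveOptions`, `resolveFixture`, and `memoryStore` are hypothetical names, and treating a missing fixture as an error outside CI as well is a design choice of this example, not something the snapshot is known to do.

```typescript
// Hypothetical sketch: names and the exact policy are illustrative assumptions.

type FixtureMode = "replay" | "record";

interface FixtureStore {
  has(name: string): boolean;
  load(name: string): string;
  save(name: string, body: string): void;
}

interface ResolveOptions {
  isCI: boolean;
  mode: FixtureMode;
}

// Simple in-memory store, useful for exercising the policy itself.
function memoryStore(): FixtureStore {
  const data = new Map<string, string>();
  return {
    has: (name) => data.has(name),
    load: (name) => {
      const body = data.get(name);
      if (body === undefined) throw new Error(`No fixture named "${name}"`);
      return body;
    },
    save: (name, body) => {
      data.set(name, body);
    },
  };
}

function resolveFixture(
  name: string,
  store: FixtureStore,
  opts: ResolveOptions,
  fetchLive: () => string,
): string {
  // CI never records: a recording request there is a configuration error.
  if (opts.isCI && opts.mode === "record") {
    throw new Error(`Refusing to record fixture "${name}" in CI`);
  }
  if (opts.mode === "replay") {
    // Fail closed: a missing recording is an error, never a live call.
    if (!store.has(name)) {
      throw new Error(`Missing fixture "${name}"; record it deliberately`);
    }
    return store.load(name);
  }
  // Explicit record intent: hit the live source and persist the result.
  const body = fetchLive();
  store.save(name, body);
  return body;
}
```

Keeping the policy in one function also lets message-replay and token-count callers share it, which is the reuse the fixture-family discussion asks for.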
@@ -2,30 +2,34 @@
 title: "Test Framework Overview"
 owners: [bingran-you]
 soft_links:
+  - /reconstruction-guardrails/verification-and-native-test-oracles/test-runtime-mode-and-determinism.md
   - /reconstruction-guardrails/verification-and-native-test-oracles/test-environment-fixtures-and-ci-fail-closed-policy.md
+  - /reconstruction-guardrails/verification-and-native-test-oracles/test-lane-coverage-map.md
+  - /reconstruction-guardrails/verification-and-native-test-oracles/e2e-harness-reality-boundaries.md
   - /reconstruction-guardrails/verification-and-native-test-oracles/test-seams-reset-hooks-and-injected-dependencies.md
   - /reconstruction-guardrails/verification-and-native-test-oracles/native-test-derived-asset-provenance-and-acceptance-rules.md
   - /platform-services/mock-rate-limit-scenarios-and-test-contracts.md
   - /tools-and-permissions/filesystem-and-shell/sed-command-validation-contracts.md
+  - /tools-and-permissions/permissions/e2e-permission-testing-contracts.md
   - /tools-and-permissions/permissions/yolo-classifier-contracts.md
   - /platform-services/settings-schema-compatibility-and-invalid-field-preservation.md
   - /integrations/mcp/federated-auth-conformance-and-idp-test-seeding.md
 ---
 
 # Test Framework Overview
 
-The current Claude Code snapshot does not expose one self-contained `tests/` or runner manifest that answers everything. What it does expose is a layered testing architecture that spans runtime posture, fixtures, dedicated end-to-end harnesses, conformance-sensitive auth flows, and domain-owned contract oracles.
+The current Claude Code snapshot does not expose one self-contained `tests/` directory or runner manifest that answers everything. What it does expose is a layered testing architecture that spans runtime posture, fixtures, dedicated end-to-end harnesses, conformance-sensitive auth flows, and domain-owned contract oracles.
 
 ## Confirmed layers
 
-The snapshot clearly shows all of these verification layers:
+The snapshot provides direct signals for all of these verification layer families, even though it does not expose every upstream runner entrypoint:
 
 - a script-wrapped suite entry layer, because at least one compatibility contract is tied to a named `npm run test:file ...` path rather than to a raw helper invocation
 - ordinary module-level regression lanes, including `.test.ts`-style coverage
 - integration lanes, including `.int.test.ts` behavior for cross-component runtime state
 - end-to-end coverage for permission prompts and remote-control plumbing
 - conformance-sensitive auth coverage for federated MCP and XAA-style flows
-- runtime test posture via `NODE_ENV=test`
+- a supported test runtime posture via `NODE_ENV=test`
 - fixture and VCR-style replay for API-dependent scenarios
 - module-state isolation through exported reset, seed, and cleanup helpers for caches, watchers, registries, and other sticky services
 - domain-owned contract assets derived from upstream-native tests
@@ -42,17 +46,21 @@ A faithful rebuild should preserve these tiers as distinct concerns:
 
 Collapsing all of those into one broad suite would lose one of the main architectural signals in the current product: different behaviors are protected by different oracles.
 
+The subsystem mapping behind those tiers is spelled out in [test-lane-coverage-map.md](test-lane-coverage-map.md).
+
 ## Runner boundary
 
 The tree can safely claim:
 
 - there is a script-oriented entry layer
 - the product code is written to coexist with a Bun-flavored module-mocking environment
 - the visible framework depends on more than a generic "run tests" command
+- the visible end-to-end harnesses are designed to preserve real approval, transport, and credential paths rather than substituting UI-only fakes
 
 The tree should not overclaim:
 
 - the exact full upstream runner manifest
+- the exhaustive upstream lane inventory
 - the complete CI orchestration or sharding plan
 - the full top-level command matrix for every lane