diff --git a/integrations/clients/ssh-remote-session-and-auth-proxy.md b/integrations/clients/ssh-remote-session-and-auth-proxy.md
index b014c4a..ca15d3e 100644
--- a/integrations/clients/ssh-remote-session-and-auth-proxy.md
+++ b/integrations/clients/ssh-remote-session-and-auth-proxy.md
@@ -1,7 +1,7 @@
 ---
 title: "SSH Remote Session and Auth Proxy"
 owners: []
-soft_links: [/integrations/clients/direct-connect-session-bootstrap-and-environment-selection.md, /integrations/clients/remote-session-message-adaptation-and-viewer-state.md, /product-surface/startup-entrypoint-routing-and-session-handoff.md, /platform-services/provider-specific-api-clients-and-auth-routing.md, /reconstruction-guardrails/verification-and-native-test-oracles/test-seams-reset-hooks-and-injected-dependencies.md]
+soft_links: [/integrations/clients/direct-connect-session-bootstrap-and-environment-selection.md, /integrations/clients/remote-session-message-adaptation-and-viewer-state.md, /product-surface/startup-entrypoint-routing-and-session-handoff.md, /platform-services/provider-specific-api-clients-and-auth-routing.md, /reconstruction-guardrails/verification-and-native-test-oracles/test-seams-reset-hooks-and-injected-dependencies.md, /reconstruction-guardrails/verification-and-native-test-oracles/e2e-harness-reality-boundaries.md]
 ---
 
 # SSH Remote Session and Auth Proxy
diff --git a/integrations/mcp/federated-auth-conformance-and-idp-test-seeding.md b/integrations/mcp/federated-auth-conformance-and-idp-test-seeding.md
index 54ed005..f93eda2 100644
--- a/integrations/mcp/federated-auth-conformance-and-idp-test-seeding.md
+++ b/integrations/mcp/federated-auth-conformance-and-idp-test-seeding.md
@@ -6,6 +6,7 @@ soft_links:
   - /integrations/mcp/server-contract.md
   - /platform-services/auth-config-and-policy.md
   - /reconstruction-guardrails/verification-and-native-test-oracles/test-seams-reset-hooks-and-injected-dependencies.md
+  - /reconstruction-guardrails/verification-and-native-test-oracles/e2e-harness-reality-boundaries.md
 ---
 
 # Federated Auth Conformance and IdP Test Seeding
diff --git a/platform-services/mock-rate-limit-scenarios-and-test-contracts.md b/platform-services/mock-rate-limit-scenarios-and-test-contracts.md
index 98eaa38..58bb48a 100644
--- a/platform-services/mock-rate-limit-scenarios-and-test-contracts.md
+++ b/platform-services/mock-rate-limit-scenarios-and-test-contracts.md
@@ -1,301 +1,77 @@
 ---
 title: "Mock Rate Limit Scenarios and Test Contracts"
 owners: [bingran-you]
-soft_links: [/platform-services/claude-ai-limits-and-extra-usage-state.md, /reconstruction-guardrails/verification-and-native-test-oracles/native-test-derived-asset-provenance-and-acceptance-rules.md]
+soft_links: [/platform-services/claude-ai-limits-and-extra-usage-state.md, /reconstruction-guardrails/verification-and-native-test-oracles/native-test-derived-asset-provenance-and-acceptance-rules.md, /reconstruction-guardrails/verification-and-native-test-oracles/test-lane-coverage-map.md]
 native_source: services/mockRateLimits.ts
 verification_status: native_test_derived
 ---
 
 # Mock Rate Limit Scenarios and Test Contracts
 
-This leaf documents the testable contracts for rate limit scenario simulation, extracted from the Claude Code source `services/mockRateLimits.ts` (883 lines). These contracts define the mock scenarios used for testing rate limit handling without hitting actual API limits.
+This leaf captures the internal mock surface used to verify Claude Code's rate-limit behavior without exhausting real quotas. The important clean-room value is not the exact helper API. It is the set of user-visible limit branches the upstream product considered important enough to simulate deterministically.
 
 ## Scope boundary
 
 This leaf covers:
 
-- Mock scenario definitions and their expected header configurations
-- Rate limit header types and valid values
-- Test contracts for scenario state transitions
-- Acceptance criteria for Python reconstruction
+- the scenario families that need deterministic simulation
+- the live-header semantics those scenarios must mimic
+- the reset and round-trip behavior needed for reliable verification
 
 It intentionally does not cover:
 
-- Actual API rate limit handling (see claude-ai-limits-and-extra-usage-state.md)
-- Error dialog UI (see ui-and-experience)
+- live quota enforcement or server policy
+- the user-facing quota state machine already captured in [claude-ai-limits-and-extra-usage-state.md](claude-ai-limits-and-extra-usage-state.md)
 
-## Access gate
+## Internal-only scenario surface
 
-All mock functions are gated by `process.env.USER_TYPE !== 'ant'`. In production contexts where `USER_TYPE` is not `ant`, mock functions return early without effect.
+Equivalent behavior should preserve an internal-only scenario surface rather than a public user feature.
 
-## MockScenario type contract
+That surface should remain able to simulate at least these families:
 
-The system defines 20 distinct mock scenarios:
+- a nominal allowed state
+- early-warning states before hard rejection
+- hard subscription-limit rejection
+- overage available, warning, and exhausted branches
+- exhausted-credit or disabled-credit branches for wallet, org, member, or seat-level causes
+- model-specific limit branches where one model family can be blocked independently of another
+- fast-mode cooldown or similar short-horizon retry behavior
+- an explicit clear or reset state that removes injected mock behavior
 
-```typescript
-type MockScenario =
-  | 'normal'                    // Normal usage, no limits
-  | 'session-limit-reached'     // 5-hour session limit exceeded
-  | 'approaching-weekly-limit'  // Warning: approaching 7-day limit
-  | 'weekly-limit-reached'      // 7-day aggregate limit exceeded
-  | 'overage-active'            // Using extra usage (overage available)
-  | 'overage-warning'           // Approaching extra usage limit
-  | 'overage-exhausted'         // Both subscription and overage exhausted
-  | 'out-of-credits'            // Wallet empty (out_of_credits)
-  | 'org-zero-credit-limit'     // Org spend cap is $0
-  | 'org-spend-cap-hit'         // Org monthly spend cap reached
-  | 'member-zero-credit-limit'  // Member individual limit is $0
-  | 'seat-tier-zero-credit-limit' // Seat tier limit is $0
-  | 'opus-limit'                // Opus model limit reached
-  | 'opus-warning'              // Approaching Opus limit
-  | 'sonnet-limit'              // Sonnet model limit reached
-  | 'sonnet-warning'            // Approaching Sonnet limit
-  | 'fast-mode-limit'           // Fast mode rate limit (>20s cooldown)
-  | 'fast-mode-short-limit'     // Fast mode rate limit (<20s cooldown)
-  | 'extra-usage-required'      // Headerless 429: 1M context requires extra usage
-  | 'clear'                     // Clear all mock headers
-```
+If a rebuild collapses those into one generic "rate limited" mock, it loses the nuance the upstream product was testing.
 
-## MockHeaders type contract
+## Header-shape fidelity matters
 
-Rate limit headers follow a unified naming convention:
+Equivalent behavior should preserve:
 
-```typescript
-type MockHeaders = {
-  'anthropic-ratelimit-unified-status'?: 'allowed' | 'allowed_warning' | 'rejected'
-  'anthropic-ratelimit-unified-reset'?: string  // Unix timestamp
-  'anthropic-ratelimit-unified-representative-claim'?:
-    'five_hour' | 'seven_day' | 'seven_day_opus' | 'seven_day_sonnet'
-  'anthropic-ratelimit-unified-overage-status'?: 'allowed' | 'allowed_warning' | 'rejected'
-  'anthropic-ratelimit-unified-overage-reset'?: string
-  'anthropic-ratelimit-unified-overage-disabled-reason'?: OverageDisabledReason
-  'anthropic-ratelimit-unified-fallback'?: 'available'
-  'anthropic-ratelimit-unified-fallback-percentage'?: string
-  'retry-after'?: string  // Seconds until reset
-  // Early warning utilization headers
-  'anthropic-ratelimit-unified-5h-utilization'?: string
-  'anthropic-ratelimit-unified-5h-reset'?: string
-  'anthropic-ratelimit-unified-5h-surpassed-threshold'?: string
-  'anthropic-ratelimit-unified-7d-utilization'?: string
-  'anthropic-ratelimit-unified-7d-reset'?: string
-  'anthropic-ratelimit-unified-7d-surpassed-threshold'?: string
-  'anthropic-ratelimit-unified-overage-utilization'?: string
-  'anthropic-ratelimit-unified-overage-surpassed-threshold'?: string
-}
-```
+- the mock surface driving the same downstream parsing and messaging branches as live quota responses
+- status families that distinguish ordinary allowance, warning, rejection, and overage state
+- representative claim and reset-time semantics that keep user messaging aligned with the active scenario
+- disabled-reason variants that let downstream logic distinguish empty wallet, org cap, member cap, seat cap, and similar causes
+- early-warning utilization signals that can be simulated separately from hard rejection
 
-## OverageDisabledReason values
+The exact helper names are implementation detail. What matters in the tree is that the mock data be shaped closely enough to live quota responses that downstream logic cannot tell whether it came from a live response or a controlled verification scenario.
 
-```typescript
-type OverageDisabledReason =
-  | 'out_of_credits'              // Wallet is empty
-  | 'org_service_zero_credit_limit' // Org-level spend cap is $0
-  | 'org_level_disabled_until'    // Org monthly cap hit
-  | 'member_zero_credit_limit'    // Member limit is $0
-  | 'seat_tier_zero_credit_limit' // Seat tier limit is $0
-```
+## Scenario round-trip and reset behavior
 
-## Testable function contracts
+Equivalent behavior should preserve:
 
-### `setMockRateLimitScenario(scenario: MockScenario): void`
+- deterministic selection of each major user-visible quota branch
+- the ability to clear injected state cleanly between cases
+- stable scenario-to-visible-branch mapping, so acceptance tests can state which quota path they are exercising without depending on live quotas
+- reset semantics that keep retry timing and representative-claim behavior coherent rather than leaving stale mock residue behind
 
-Sets up a predefined rate limit scenario with appropriate headers.
+## Reconstruction rule
 
-**Scenario test cases**:
+A clean-room rebuild should preserve:
 
-| Scenario | status | overage-status | claim | disabled-reason |
-|----------|--------|----------------|-------|-----------------|
-| `normal` | `allowed` | - | - | - |
-| `session-limit-reached` | `rejected` | - | `five_hour` | - |
-| `approaching-weekly-limit` | `allowed_warning` | - | `seven_day` | - |
-| `weekly-limit-reached` | `rejected` | - | `seven_day` | - |
-| `overage-active` | `rejected` | `allowed` | `five_hour`* | - |
-| `overage-warning` | `rejected` | `allowed_warning` | `five_hour`* | - |
-| `overage-exhausted` | `rejected` | `rejected` | `five_hour`* | - |
-| `out-of-credits` | `rejected` | `rejected` | `five_hour`* | `out_of_credits` |
-| `org-zero-credit-limit` | `rejected` | `rejected` | `five_hour`* | `org_service_zero_credit_limit` |
-| `org-spend-cap-hit` | `rejected` | `rejected` | `five_hour`* | `org_level_disabled_until` |
-| `member-zero-credit-limit` | `rejected` | `rejected` | `five_hour`* | `member_zero_credit_limit` |
-| `seat-tier-zero-credit-limit` | `rejected` | `rejected` | `five_hour`* | `seat_tier_zero_credit_limit` |
-| `opus-limit` | `rejected` | - | `seven_day_opus` | - |
-| `opus-warning` | `allowed_warning` | - | `seven_day_opus` | - |
-| `sonnet-limit` | `rejected` | - | `seven_day_sonnet` | - |
-| `sonnet-warning` | `allowed_warning` | - | `seven_day_sonnet` | - |
+- internal-only admission for mock quota simulation
+- coverage across the major quota and overage branches users actually experience
+- mock data shaped like live quota inputs rather than bespoke fake objects
 
-\* Default claim when no exceeded limits are set; preserves existing exceeded limits for overage scenarios.
+## Failure modes
 
-### `setMockHeader(key: MockHeaderKey, value: string | undefined): void`
-
-Sets individual mock headers with automatic handling.
-
-**Contract**:
-- Keys are mapped to full header names (`status` → `anthropic-ratelimit-unified-status`)
-- `retry-after` is not prefixed
-- Setting `undefined` or `'clear'` removes the header
-- Setting `reset` or `overage-reset` with a number treats it as hours from now
-- Setting `claim` adds to exceeded limits and updates representative claim
-- `retry-after` is auto-calculated when status changes to `rejected`
-
-**Test cases**:
-```
-SET: setMockHeader('status', 'rejected')
-  -> mockHeaders['anthropic-ratelimit-unified-status'] = 'rejected'
-
-SET: setMockHeader('reset', '5')
-  -> mockHeaders['anthropic-ratelimit-unified-reset'] = String(now + 5*3600)
-
-SET: setMockHeader('claim', 'five_hour')
-  -> exceededLimits = [{ type: 'five_hour', resetsAt: now + 5*3600 }]
-  -> mockHeaders['anthropic-ratelimit-unified-representative-claim'] = 'five_hour'
-
-CLEAR: setMockHeader('status', undefined)
-  -> delete mockHeaders['anthropic-ratelimit-unified-status']
-```
-
-### `addExceededLimit(type, hoursFromNow): void`
-
-Adds an exceeded limit with custom reset time.
-
-**Contract**:
-- `type`: `'five_hour' | 'seven_day' | 'seven_day_opus' | 'seven_day_sonnet'`
-- Sets status to `rejected` if limits exist
-- Updates representative claim to furthest reset time
-
-**Test cases**:
-```
-addExceededLimit('five_hour', 4)
-  -> exceededLimits includes { type: 'five_hour', resetsAt: now + 4*3600 }
-  -> status = 'rejected'
-  -> representative-claim = 'five_hour'
-
-addExceededLimit('seven_day', 120)
-  -> exceededLimits includes { type: 'seven_day', resetsAt: now + 120*3600 }
-  -> representative-claim = 'seven_day' (if furthest)
-```
-
-### `setMockEarlyWarning(claimAbbrev, utilization, hoursFromNow?): void`
-
-Sets mock early warning utilization headers.
-
-**Contract**:
-- `claimAbbrev`: `'5h' | '7d' | 'overage'`
-- Clears ALL early warning headers first (5h is checked before 7d)
-- Sets `utilization`, `reset`, and `surpassed-threshold` headers
-- Sets status to `allowed` if not already set
-
-**Test cases**:
-```
-setMockEarlyWarning('5h', 0.92)
-  -> '5h-utilization' = '0.92'
-  -> '5h-reset' = String(now + 4*3600)  // default 4 hours
-  -> '5h-surpassed-threshold' = '0.92'
-  -> status = 'allowed' (if not set)
-
-setMockEarlyWarning('7d', 0.85, 48)
-  -> '7d-utilization' = '0.85'
-  -> '7d-reset' = String(now + 48*3600)
-  -> '7d-surpassed-threshold' = '0.85'
-```
-
-### `getCurrentMockScenario(): MockScenario | null`
-
-Reverse-lookups the current scenario from active headers.
-
-**Contract** (lookup priority):
-1. If `claim` is `seven_day_opus`: return `opus-limit` or `opus-warning`
-2. If `claim` is `seven_day_sonnet`: return `sonnet-limit` or `sonnet-warning`
-3. If `overage-status` is `rejected`: return `overage-exhausted`
-4. If `overage-status` is `allowed_warning`: return `overage-warning`
-5. If `overage-status` is `allowed`: return `overage-active`
-6. If `status` is `rejected`:
-   - `claim` is `five_hour`: return `session-limit-reached`
-   - `claim` is `seven_day`: return `weekly-limit-reached`
-7. If `status` is `allowed_warning` and `claim` is `seven_day`: return `approaching-weekly-limit`
-8. If `status` is `allowed`: return `normal`
-9. Otherwise: return `null`
-
-### `getScenarioDescription(scenario: MockScenario): string`
-
-Returns human-readable description for each scenario.
-
-**Test cases**:
-```
-getScenarioDescription('normal') -> 'Normal usage, no limits'
-getScenarioDescription('session-limit-reached') -> 'Session rate limit exceeded'
-getScenarioDescription('overage-exhausted') -> 'Both subscription and extra usage limits exhausted'
-getScenarioDescription('out-of-credits') -> 'Out of extra usage credits (wallet empty)'
-getScenarioDescription('opus-limit') -> 'Opus limit reached'
-getScenarioDescription('extra-usage-required') -> 'Headerless 429: Extra usage required for 1M context'
-```
-
-### `checkMockFastModeRateLimit(isFastModeActive?: boolean): MockHeaders | null`
-
-Checks and returns mock headers for fast mode rate limits.
-
-**Contract**:
-- Returns `null` if no fast mode limit is configured
-- Returns `null` if `isFastModeActive` is false
-- Returns `null` if rate limit has expired
-- On first error, sets expiry timestamp
-- Calculates dynamic `retry-after` based on remaining time
-
-### `applyMockHeaders(headers: Headers): Headers`
-
-Applies mock headers to an existing Headers object.
-
-**Contract**:
-- Returns original headers if mocking is disabled
-- Creates new Headers object with originals
-- Overwrites with mock headers
-- Returns modified Headers
-
-## Reset time conventions
-
-| Limit Type | Default Reset |
-|------------|---------------|
-| `five_hour` | 5 hours from now |
-| `seven_day` | 7 days from now |
-| `seven_day_opus` | 7 days from now |
-| `seven_day_sonnet` | 7 days from now |
-| overage | End of current month |
-
-## Reconstruction guidance
-
-A Python reconstruction should:
-
-1. Define `MockScenario` enum with all 20 scenarios
-2. Define `MockHeaders` dataclass with all header fields
-3. Define `OverageDisabledReason` enum with 5 reasons
-4. Implement `set_mock_rate_limit_scenario(scenario)`:
-   - Map each scenario to correct header configuration
-   - Handle exceeded limits list for overage scenarios
-   - Calculate reset timestamps
-5. Implement `set_mock_header(key, value)`:
-   - Map abbreviated keys to full header names
-   - Handle `reset`/`overage-reset` time calculation
-   - Handle `claim` with exceeded limits tracking
-   - Auto-calculate `retry-after` on status changes
-6. Implement `add_exceeded_limit(type, hours_from_now)`:
-   - Add to exceeded limits list
-   - Update representative claim
-   - Set status to rejected
-7. Implement `set_mock_early_warning(claim_abbrev, utilization, hours?)`:
-   - Clear all early warning headers first
-   - Set utilization, reset, and surpassed-threshold
-8. Implement `get_current_mock_scenario()`:
-   - Reverse lookup scenario from headers
-   - Follow priority order exactly
-9. Implement `apply_mock_headers(headers)`:
-   - Return new headers with mocks applied
-
-## Acceptance criteria
-
-- [ ] All 20 scenarios produce correct header configurations
-- [ ] `setMockHeader` correctly maps keys and handles special cases
-- [ ] `addExceededLimit` updates representative claim to furthest reset
-- [ ] `setMockEarlyWarning` clears existing early warning headers first
-- [ ] `getCurrentMockScenario` correctly reverses all scenarios
-- [ ] `getScenarioDescription` returns correct descriptions for all scenarios
-- [ ] Fast mode rate limit expiry and dynamic retry-after work correctly
-- [ ] All functions are gated by USER_TYPE check
-- [ ] Reset time calculations use correct offsets (5h, 7d, end of month)
+- **public harness leak**: users can access internal quota simulation in ordinary product operation
+- **branch collapse**: distinct warning, overage, model-specific, or disabled-credit states all map to one generic mock
+- **header drift**: mock responses no longer exercise the same downstream parsing branches as live quota data
+- **stale residue**: one scenario leaves behind timing or claim state that corrupts the next test
diff --git a/reconstruction-guardrails/verification-and-native-test-oracles/NODE.md b/reconstruction-guardrails/verification-and-native-test-oracles/NODE.md
index 8127895..9e1b46f 100644
--- a/reconstruction-guardrails/verification-and-native-test-oracles/NODE.md
+++ b/reconstruction-guardrails/verification-and-native-test-oracles/NODE.md
@@ -16,7 +16,10 @@ This subdomain captures cross-cutting knowledge about how the observed Claude Co
 Relevant leaves:
 
 - **[test-framework-overview.md](test-framework-overview.md)** — The layered shape of the current test system, including the visible tier model and the boundary between confirmed and inferred runner details.
+- **[test-runtime-mode-and-determinism.md](test-runtime-mode-and-determinism.md)** — How `NODE_ENV=test` behaves as a supported runtime posture, including in-memory config behavior, reduced side effects, and deterministic test-only branches.
 - **[test-environment-fixtures-and-ci-fail-closed-policy.md](test-environment-fixtures-and-ci-fail-closed-policy.md)** — How test posture suppresses side effects, how fixture replay works, and why missing recordings fail closed in CI.
+- **[test-lane-coverage-map.md](test-lane-coverage-map.md)** — Which subsystem contracts are guarded by fast regression, integration, end-to-end, conformance, and compatibility lanes, without overclaiming the hidden runner layout.
+- **[e2e-harness-reality-boundaries.md](e2e-harness-reality-boundaries.md)** — Which end-to-end harnesses may shorten setup but still need to preserve real permission, transport, auth-proxy, and credential-cache paths.
 - **[test-seams-reset-hooks-and-injected-dependencies.md](test-seams-reset-hooks-and-injected-dependencies.md)** — The narrow seams the product uses to keep hard behaviors testable without turning the whole runtime into a debug harness.
 - **[native-test-derived-asset-provenance-and-acceptance-rules.md](native-test-derived-asset-provenance-and-acceptance-rules.md)** — How native test knowledge should be normalized into clean-room contract assets and how those assets should be linked back to their owning domains.
 - **[evidence-levels-and-missing-artifacts.md](evidence-levels-and-missing-artifacts.md)** — What this source snapshot proves, what it only strongly suggests, and which missing artifacts still block exact runner-level reproduction.
diff --git a/reconstruction-guardrails/verification-and-native-test-oracles/e2e-harness-reality-boundaries.md b/reconstruction-guardrails/verification-and-native-test-oracles/e2e-harness-reality-boundaries.md
new file mode 100644
index 0000000..63008e6
--- /dev/null
+++ b/reconstruction-guardrails/verification-and-native-test-oracles/e2e-harness-reality-boundaries.md
@@ -0,0 +1,61 @@
+---
+title: "E2E Harness Reality Boundaries"
+owners: [bingran-you]
+soft_links:
+  - /reconstruction-guardrails/verification-and-native-test-oracles/test-lane-coverage-map.md
+  - /reconstruction-guardrails/verification-and-native-test-oracles/test-seams-reset-hooks-and-injected-dependencies.md
+  - /tools-and-permissions/permissions/e2e-permission-testing-contracts.md
+  - /ui-and-experience/dialogs-and-approvals/permission-prompt-shell-and-worker-states.md
+  - /integrations/clients/ssh-remote-session-and-auth-proxy.md
+  - /integrations/mcp/federated-auth-conformance-and-idp-test-seeding.md
+  - /runtime-orchestration/sessions/session-artifacts-and-sharing.md
+---
+
+# E2E Harness Reality Boundaries
+
+The observed Claude Code snapshot uses several harnesses that shorten setup cost without abandoning the real runtime path. That distinction matters: a clean-room rebuild should preserve which parts of end-to-end verification are allowed to be synthetic and which parts still need to exercise production-like orchestration.
+
+## Approval harnesses must still use the real permission path
+
+Equivalent behavior should preserve:
+
+- a narrow approval-oriented harness that can force the permission flow to appear on demand
+- that harness still entering through the normal tool catalog, permission-decision pipeline, and permission-prompt shell
+- grant, deny, cancel, queue-advance, and worker-forwarding behavior being validated through the same UI and callback machinery users actually see
+
+An e2e approval test that injects dialog state directly is no longer testing the product contract that matters.
+
+## Remote transport harnesses may skip deployment, not orchestration
+
+Equivalent behavior should preserve:
+
+- a local harness mode that can avoid real SSH deployment when the test only needs to verify split local-UI and remote-execution plumbing
+- the auth proxy, transcript adaptation, permission relay, and session-lifecycle machinery still being exercised
+- failures and reconnect behavior still traveling through the real remote-session contract rather than a fake one-shot shell wrapper
+
+The shortcut is allowed to reduce environment setup. It is not allowed to erase the transport boundary being tested.
+
+## Federated-auth harnesses may skip browser setup, not credential semantics
+
+Equivalent behavior should preserve:
+
+- a deterministic way to seed federated credentials when a mock identity provider does not expose the full interactive browser surface
+- seeded credentials landing in the same secure cache slot the ordinary login and refresh paths later read
+- downstream exchange, refresh, and revocation behavior still using the normal federated auth path
+
+Otherwise the test stops proving interoperability and starts proving only that a bypass slot was written successfully.
+
+## Session-state harnesses may seed artifacts, not invent a separate resume model
+
+Equivalent behavior should preserve:
+
+- targeted setters or seed helpers for session artifact state when bootstrapping a full prior session would be too expensive
+- those helpers still feeding the same transcript, artifact, and resume semantics production uses
+- test convenience never becoming a second, incompatible persistence model
+
+## Failure modes
+
+- **fake dialog coverage**: permission tests manipulate UI state directly and stop covering the real approval pipeline
+- **transport collapse**: a local remote-session harness stops exercising proxying, relay, or transcript adaptation
+- **credential bypass**: federated auth tests seed a token into a cache path the real login flow never reads
+- **shadow persistence**: test setters create a second resume model unrelated to the live session artifact system
diff --git a/reconstruction-guardrails/verification-and-native-test-oracles/evidence-levels-and-missing-artifacts.md b/reconstruction-guardrails/verification-and-native-test-oracles/evidence-levels-and-missing-artifacts.md
index 812c745..e0364d8 100644
--- a/reconstruction-guardrails/verification-and-native-test-oracles/evidence-levels-and-missing-artifacts.md
+++ b/reconstruction-guardrails/verification-and-native-test-oracles/evidence-levels-and-missing-artifacts.md
@@ -5,6 +5,7 @@ soft_links:
   - /reconstruction-guardrails/source-boundary.md
   - /reconstruction-guardrails/knowledge-lifecycle.md
   - /reconstruction-guardrails/verification-and-native-test-oracles/test-framework-overview.md
+  - /reconstruction-guardrails/verification-and-native-test-oracles/test-lane-coverage-map.md
 ---
 
 # Evidence Levels and Missing Artifacts
@@ -15,16 +16,19 @@ This repository should distinguish between what the current source snapshot prov
 
 The snapshot is sufficient to confirm all of these:
 
-- there are distinct unit or regression, integration, end-to-end, conformance, and compatibility lanes
-- `NODE_ENV=test` is a real runtime posture
+- `NODE_ENV=test` is a supported runtime posture rather than a one-off conditional
 - fixture and VCR replay are first-class testing mechanisms
+- there are direct signals for multiple lane families, including at least one compatibility lane, at least one integration lane, dedicated end-to-end harnesses, conformance-sensitive auth verification, and many narrow regression or fidelity oracles
 - narrow seams such as injected dependencies, exported testing helpers, resets, and test-only helper surfaces are part of the current design
 
+The tree should treat those as lane-family and architecture facts, not as proof of the full hidden runner inventory.
+
 ## Strongly suggested but not fully proven
 
 The tree can safely treat these as strong signals rather than as closed facts:
 
 - the TypeScript runner environment is Bun-oriented in at least part of the stack
+- the regression or unit layer is broader than the few directly named test references exposed in comments and helper exports
 - repo-level scripts wrap at least some runner commands instead of every lane being invoked directly
 
 ## Still missing for exact runner-level reproduction
@@ -33,6 +37,7 @@ The current snapshot does not fully expose:
 
 - the top-level repository manifest and script table
 - the complete test directory layout
+- the exhaustive lane inventory and lane-to-command matrix
 - the full committed fixture corpus
 - the CI workflow and any sharding or coverage rules
 
@@ -44,6 +49,7 @@ While those artifacts are missing, the tree should:
 
 - document the confirmed architecture and tier model
 - preserve clear evidence labels for inferred versus confirmed details
+- claim lane purpose and behavior ownership more confidently than lane naming or runner wiring
 - refuse to guess exact runner wiring that the snapshot did not show
 
 This is a knowledge-quality rule, not a refusal to make progress. The visible framework is already rich enough to guide a clean-room rebuild of the verification architecture itself.
diff --git a/reconstruction-guardrails/verification-and-native-test-oracles/test-environment-fixtures-and-ci-fail-closed-policy.md b/reconstruction-guardrails/verification-and-native-test-oracles/test-environment-fixtures-and-ci-fail-closed-policy.md
index b501871..f4ccdb3 100644
--- a/reconstruction-guardrails/verification-and-native-test-oracles/test-environment-fixtures-and-ci-fail-closed-policy.md
+++ b/reconstruction-guardrails/verification-and-native-test-oracles/test-environment-fixtures-and-ci-fail-closed-policy.md
@@ -2,6 +2,7 @@
 title: "Test Environment, Fixtures, and CI Fail-Closed Policy"
 owners: [bingran-you]
 soft_links:
+  - /reconstruction-guardrails/verification-and-native-test-oracles/test-runtime-mode-and-determinism.md
   - /platform-services/startup-service-sequencing-and-capability-gates.md
   - /platform-services/usage-analytics-and-migrations.md
   - /integrations/clients/structured-io-and-headless-session-loop.md
@@ -25,9 +26,15 @@ The important point is not one specific branch. It is that the runtime treats te
 
 ## Fixture replay is a first-class oracle
 
-The snapshot exposes a VCR-style replay layer for API-dependent behavior.
+The snapshot exposes more than one fixture family for API-adjacent behavior.
 
-That layer preserves:
+Equivalent behavior should preserve:
+
+- a generic fixture helper for deterministic caching of arbitrary expensive or externalized test oracles
+- message-replay fixtures for API response and streaming behavior
+- token-count fixtures for API-adjacent counting paths that still need deterministic replay semantics
+
+Across those families, the shared contract preserves:
 
 - explicit activation in test posture
 - hash-based fixture naming from normalized inputs
@@ -45,6 +52,16 @@ Equivalent behavior should preserve:
 
 This is one of the most important stability contracts in the visible framework. It keeps network-backed tests deterministic and makes fixture refresh a deliberate maintenance act.
 
+## Recording lifecycle must stay deliberate
+
+Equivalent behavior should preserve:
+
+- replay as the default posture once a fixture exists
+- explicit record or refresh intent instead of incidental overwrites
+- the ability for different API-adjacent callers to reuse the same fixture policy rather than inventing lane-specific caching rules
+
+The important clean-room point is that recording is maintenance, not a side effect of ordinary CI execution.
+
 ## Transcript and hash stability matter
 
 The broader runtime also treats transcript shape as part of fixture stability.
@@ -52,6 +69,8 @@ The broader runtime also treats transcript shape as part of fixture stability.
 Equivalent behavior should preserve:
 
 - careful normalization before hashing
+- dehydration of machine-specific paths, config-home locations, and similar environment-local values
+- placeholder treatment for incidental UUIDs, timestamps, counters, and other unstable runtime identifiers
 - avoidance of unnecessary transcript-shape churn in replay-sensitive flows
 - deterministic identity or placeholder handling where raw runtime IDs would otherwise destabilize recordings
 
@@ -62,6 +81,7 @@ The visible testing architecture therefore depends on transcript semantics, not
 If a clean-room rebuild keeps external API-backed tests, it should preserve all of these:
 
 - a dedicated test posture
+- multiple fixture families when different API-adjacent callers need different oracle shapes
 - deterministic fixture hashing and hydration
 - fail-closed CI behavior for missing recordings
 - explicit recording refresh
@@ -69,6 +89,7 @@ If a clean-room rebuild keeps external API-backed tests, it should preserve all
 ## Failure modes
 
 - **test-production blur**: automated tests still emit nonessential production side effects
+- **layer collapse**: token-count or other API-adjacent lanes bypass the shared fixture policy and drift from replay behavior used elsewhere
 - **machine-bound fixtures**: path, cwd, or tempdir differences cause needless cache misses
 - **silent CI rewrite**: missing fixtures regenerate during CI and hide behavioral drift
 - **hash instability**: transcript or input normalization changes break recordings even when behavior did not meaningfully change
diff --git a/reconstruction-guardrails/verification-and-native-test-oracles/test-framework-overview.md b/reconstruction-guardrails/verification-and-native-test-oracles/test-framework-overview.md
index 086c524..e1b5fdc 100644
--- a/reconstruction-guardrails/verification-and-native-test-oracles/test-framework-overview.md
+++ b/reconstruction-guardrails/verification-and-native-test-oracles/test-framework-overview.md
@@ -2,11 +2,15 @@
 title: "Test Framework Overview"
 owners: [bingran-you]
 soft_links:
+  - /reconstruction-guardrails/verification-and-native-test-oracles/test-runtime-mode-and-determinism.md
   - /reconstruction-guardrails/verification-and-native-test-oracles/test-environment-fixtures-and-ci-fail-closed-policy.md
+  - /reconstruction-guardrails/verification-and-native-test-oracles/test-lane-coverage-map.md
+  - /reconstruction-guardrails/verification-and-native-test-oracles/e2e-harness-reality-boundaries.md
   - /reconstruction-guardrails/verification-and-native-test-oracles/test-seams-reset-hooks-and-injected-dependencies.md
   - /reconstruction-guardrails/verification-and-native-test-oracles/native-test-derived-asset-provenance-and-acceptance-rules.md
   - /platform-services/mock-rate-limit-scenarios-and-test-contracts.md
   - /tools-and-permissions/filesystem-and-shell/sed-command-validation-contracts.md
+  - /tools-and-permissions/permissions/e2e-permission-testing-contracts.md
   - /tools-and-permissions/permissions/yolo-classifier-contracts.md
   - /platform-services/settings-schema-compatibility-and-invalid-field-preservation.md
   - /integrations/mcp/federated-auth-conformance-and-idp-test-seeding.md
@@ -14,18 +18,18 @@ soft_links:
 
 # Test Framework Overview
 
-The current Claude Code snapshot does not expose one self-contained `tests/` or runner manifest that answers everything. What it does expose is a layered testing architecture that spans runtime posture, fixtures, dedicated end-to-end harnesses, conformance-sensitive auth flows, and domain-owned contract oracles.
+The current Claude Code snapshot does not expose one self-contained `tests/` directory or runner manifest that answers everything. What it does expose is a layered testing architecture that spans runtime posture, fixtures, dedicated end-to-end harnesses, conformance-sensitive auth flows, and domain-owned contract oracles.
 
 ## Confirmed layers
 
-The snapshot clearly shows all of these verification layers:
+The snapshot provides direct evidence for each of these verification layer families, even though it does not expose every upstream runner entrypoint:
 
 - a script-wrapped suite entry layer, because at least one compatibility contract is tied to a named `npm run test:file ...` path rather than to a raw helper invocation
 - ordinary module-level regression lanes, including `.test.ts`-style coverage
 - integration lanes, including `.int.test.ts` behavior for cross-component runtime state
 - end-to-end coverage for permission prompts and remote-control plumbing
 - conformance-sensitive auth coverage for federated MCP and XAA-style flows
-- runtime test posture via `NODE_ENV=test`
+- a supported test runtime posture via `NODE_ENV=test`
 - fixture and VCR-style replay for API-dependent scenarios
 - module-state isolation through exported reset, seed, and cleanup helpers for caches, watchers, registries, and other sticky services
 - domain-owned contract assets derived from upstream-native tests
@@ -42,6 +46,8 @@ A faithful rebuild should preserve these tiers as distinct concerns:
 
 Collapsing all of those into one broad suite would lose one of the main architectural signals in the current product: different behaviors are protected by different oracles.
 
+The subsystem mapping behind those tiers is spelled out in [test-lane-coverage-map.md](test-lane-coverage-map.md).
+
 ## Runner boundary
 
 The tree can safely claim:
@@ -49,10 +55,12 @@ The tree can safely claim:
 - there is a script-oriented entry layer
 - the product code is written to coexist with a Bun-flavored module-mocking environment
 - the visible framework depends on more than a generic "run tests" command
+- the visible end-to-end harnesses are designed to preserve real approval, transport, and credential paths rather than UI-only fakes
 
 The tree should not overclaim:
 
 - the exact full upstream runner manifest
+- the exhaustive upstream lane inventory
 - the complete CI orchestration or sharding plan
 - the full top-level command matrix for every lane
 
diff --git a/reconstruction-guardrails/verification-and-native-test-oracles/test-lane-coverage-map.md b/reconstruction-guardrails/verification-and-native-test-oracles/test-lane-coverage-map.md
new file mode 100644
index 0000000..cba6ec8
--- /dev/null
+++ b/reconstruction-guardrails/verification-and-native-test-oracles/test-lane-coverage-map.md
@@ -0,0 +1,78 @@
+---
+title: "Test Lane Coverage Map"
+owners: [bingran-you]
+soft_links:
+  - /reconstruction-guardrails/verification-and-native-test-oracles/test-framework-overview.md
+  - /reconstruction-guardrails/verification-and-native-test-oracles/e2e-harness-reality-boundaries.md
+  - /platform-services/settings-schema-compatibility-and-invalid-field-preservation.md
+  - /platform-services/settings-change-detection-and-runtime-reload.md
+  - /runtime-orchestration/sessions/session-artifacts-and-sharing.md
+  - /tools-and-permissions/permissions/e2e-permission-testing-contracts.md
+  - /integrations/clients/ssh-remote-session-and-auth-proxy.md
+  - /integrations/mcp/federated-auth-conformance-and-idp-test-seeding.md
+---
+
+# Test Lane Coverage Map
+
+The current snapshot does not expose a full runner manifest, but it does show that Claude Code protects different behavior families with different verification lanes. A clean-room rebuild should preserve that mapping even if the exact filenames, commands, or CI layout differ.
+
+## Fast regression and unit-like lanes
+
+The visible fast lanes protect narrow, local contracts such as:
+
+- parser and serializer edge cases
+- shell and permission safety heuristics
+- transcript-search or render-fidelity extraction boundaries
+- sticky singleton cleanup and helper-state reset behavior
+
+These lanes should stay cheap, isolated, and able to run without the full product startup graph.
+
+## Integration lanes
+
+The visible integration-oriented lanes protect cross-component runtime behavior such as:
+
+- startup sequencing and async service readiness
+- managed settings cache visibility and hot-reload invalidation
+- watcher and promise state that can be poisoned by one subsystem and observed by another
+- resume-sensitive session artifacts and related persistence boundaries
+
+These lanes need more real runtime wiring than a pure regression test, but they still stop short of a full user-facing end-to-end environment.
+
+## End-to-end harness lanes
+
+The visible end-to-end lanes protect workflows where the real orchestration path matters, including:
+
+- permission prompt routing and user decision flow
+- worker or remote approval forwarding
+- SSH or remote-control plumbing where local UI and remote execution are split
+- auth-proxy and transcript-adaptation behavior that spans transport boundaries
+
+These lanes matter because mock-only verification would miss the very orchestration contracts users experience.
+
+## Conformance lanes
+
+The visible conformance-sensitive lanes protect interoperability contracts where a server, provider, or standard expects one particular wire behavior.
+
+The clearest current example is federated MCP auth, where token-exchange method, credential reuse, and seeded test credentials must still match the real downstream exchange path.
+
+## Compatibility lanes
+
+The visible compatibility lanes protect durable public formats rather than transient runtime internals.
+
+The clearest current example is settings evolution, where additive schema change, invalid-field preservation, and backward compatibility must remain guarded even as the runtime evolves.
+
+## Reconstruction rule
+
+A faithful rebuild should preserve:
+
+- different lane families for different risk profiles
+- the mapping from lane family to subsystem contract
+- stronger realism for approval, transport, and provider-sensitive flows than for narrow parser or serializer regressions
+
+It does not need to preserve the exact upstream command names when those were not directly visible in the source snapshot.
+
+## Evidence boundary
+
+The lane purposes above are better evidenced than the runner inventory, which the snapshot does not expose.
+
+The tree can safely describe what kinds of behavior each lane family protects. It should be more careful when claiming the exact top-level command matrix, exhaustive test file layout, or CI sharding strategy.
diff --git a/reconstruction-guardrails/verification-and-native-test-oracles/test-runtime-mode-and-determinism.md b/reconstruction-guardrails/verification-and-native-test-oracles/test-runtime-mode-and-determinism.md
new file mode 100644
index 0000000..8239e30
--- /dev/null
+++ b/reconstruction-guardrails/verification-and-native-test-oracles/test-runtime-mode-and-determinism.md
@@ -0,0 +1,73 @@
+---
+title: "Test Runtime Mode and Determinism"
+owners: [bingran-you]
+soft_links:
+  - /reconstruction-guardrails/verification-and-native-test-oracles/test-environment-fixtures-and-ci-fail-closed-policy.md
+  - /reconstruction-guardrails/verification-and-native-test-oracles/test-seams-reset-hooks-and-injected-dependencies.md
+  - /platform-services/startup-service-sequencing-and-capability-gates.md
+  - /platform-services/usage-analytics-and-migrations.md
+  - /tools-and-permissions/tool-catalog/tool-pool-assembly.md
+  - /tools-and-permissions/permissions/e2e-permission-testing-contracts.md
+  - /ui-and-experience/dialogs-and-approvals/permission-prompt-shell-and-worker-states.md
+---
+
+# Test Runtime Mode and Determinism
+
+`NODE_ENV=test` is not just a hint for one subsystem. In the observed Claude Code snapshot it acts as a supported runtime posture with its own rules for config state, background work, helper-surface admission, and deterministic output.
+
+## Test mode owns config isolation
+
+Equivalent behavior should preserve:
+
+- global and project-scoped config mutations being able to live in memory during automated runs instead of forcing persistent on-disk writes
+- read paths being able to return that in-memory state directly under test posture
+- config freshness watchers and similar cross-process sync helpers staying out of the way during ordinary tests unless a test intentionally exercises them
+
+This matters because the product otherwise has many persistent caches and watch paths that would make tests order-dependent or racy.
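
A minimal sketch of the isolation rule, assuming a hypothetical `ConfigStore` class and an injectable environment mapping. Neither name comes from the snapshot, and watcher suppression is left out for brevity.

```python
import json
import os
from pathlib import Path


class ConfigStore:
    """Hypothetical store: in-memory under test posture, on-disk otherwise."""

    def __init__(self, path, env=None):
        env = os.environ if env is None else env
        self._path = Path(path)
        self._test_mode = env.get("NODE_ENV") == "test"
        self._memory: dict = {}

    def save(self, config: dict) -> None:
        if self._test_mode:
            # Test posture: mutations never reach the persistent file.
            self._memory = dict(config)
            return
        self._path.write_text(json.dumps(config))

    def load(self) -> dict:
        if self._test_mode:
            # Reads return the in-memory state directly.
            return dict(self._memory)
        if not self._path.exists():
            return {}
        return json.loads(self._path.read_text())
```

Passing `env` explicitly keeps the sketch itself testable without mutating the process environment.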
+
+## Background fetches and startup enrichments can go quiet
+
+Equivalent behavior should preserve:
+
+- remote or experiment-driven config hooks being able to short-circuit under test posture instead of waiting on asynchronous fetches
+- startup-only enrichments and best-effort service work being suppressible when, in a test run, they contribute only nondeterministic noise
+- tests not being forced to boot the full live startup graph when the lane is only trying to validate a narrow subsystem contract
+
+The goal is not to disable the product. The goal is to keep automated runs from blocking on unrelated network or environment dependencies.
+
+## Nonessential side effects stay suppressed
+
+Equivalent behavior should preserve:
+
+- telemetry and feedback-style emissions being suppressed in automated runs
+- exit-time bookkeeping and adjacent support traffic being suppressible in test posture
+- other background effects that only matter in production being able to stay dormant during verification
+
+This suppression is part of determinism. It keeps tests from coupling themselves to slow, flaky, or privacy-sensitive side channels.
+
+## Test-only helper surfaces can be admitted narrowly
+
+Equivalent behavior should preserve:
+
+- narrow helper surfaces that are admitted only in test posture
+- those helpers still entering through the normal runtime assembly path rather than bypassing tool registration or permission routing entirely
+- a clear difference between a test harness surface and an end-user product capability
+
+This is how the current design can expose purpose-built harnesses without turning them into public features.
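
The admission rule can be illustrated with a small pool-assembly sketch. The tool names, the `test_only` flag, and `assemble_tool_pool` are hypothetical. The point is only that the test helper passes through the same assembly function as every other tool.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    test_only: bool = False


# Hypothetical catalog: one entry is a test-harness surface, not a product feature.
ALL_TOOLS = [
    Tool("read_file"),
    Tool("run_shell"),
    Tool("testing_permission_probe", test_only=True),
]


def assemble_tool_pool(env=None) -> list[Tool]:
    """Single assembly path; test-only tools are admitted only under test posture."""
    env = os.environ if env is None else env
    in_test = env.get("NODE_ENV") == "test"
    return [tool for tool in ALL_TOOLS if in_test or not tool.test_only]
```

Because admission happens inside the normal assembly step, a later permission or registration stage would treat the probe exactly like any other tool.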
+
+## Deterministic branches are allowed when production heuristics are unstable
+
+Equivalent behavior should preserve:
+
+- stable ordering in places where production normally uses recency, filesystem timing, or other environment-sensitive heuristics
+- render or scheduling branches that make test output observable immediately and consistently
+- deterministic normalization where incidental IDs, timestamps, or path forms would otherwise churn acceptance outputs
+
+Not every tiny tie-breaker belongs in the tree. The product-level rule is that test posture may replace unstable heuristics with deterministic ones when that preserves the same user-visible contract.
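
As a hedged illustration of that rule, the sketch below swaps a recency-based ordering for a name-based one under test posture. `Entry` and `order_entries` are hypothetical names, and the specific heuristics are assumptions chosen for the example.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    modified_at: float  # seconds since epoch; environment-sensitive


def order_entries(entries: list[Entry], env=None) -> list[Entry]:
    """Same contract in both modes: every entry appears exactly once."""
    env = os.environ if env is None else env
    if env.get("NODE_ENV") == "test":
        # Deterministic branch: ordering no longer depends on filesystem timing.
        return sorted(entries, key=lambda e: e.name)
    # Production heuristic: most recently modified first.
    return sorted(entries, key=lambda e: e.modified_at, reverse=True)
```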
+
+## Failure modes
+
+- **disk-coupled tests**: automated runs mutate persistent config or shared caches and then influence later tests
+- **startup drag**: a narrow regression lane still waits on unrelated remote config or background init
+- **helper leak**: a test-only helper surface escapes test posture and becomes user-visible
+- **nondeterministic output**: sort order, rendering, or timing-sensitive branches make acceptance outputs depend on machine state
diff --git a/reconstruction-guardrails/verification-and-native-test-oracles/test-seams-reset-hooks-and-injected-dependencies.md b/reconstruction-guardrails/verification-and-native-test-oracles/test-seams-reset-hooks-and-injected-dependencies.md
index 0b6568b..c0ca7a7 100644
--- a/reconstruction-guardrails/verification-and-native-test-oracles/test-seams-reset-hooks-and-injected-dependencies.md
+++ b/reconstruction-guardrails/verification-and-native-test-oracles/test-seams-reset-hooks-and-injected-dependencies.md
@@ -4,6 +4,7 @@ owners: [bingran-you]
 soft_links:
   - /tools-and-permissions/tool-catalog/tool-families.md
   - /tools-and-permissions/tool-catalog/tool-pool-assembly.md
+  - /tools-and-permissions/permissions/permission-decision-pipeline.md
   - /ui-and-experience/dialogs-and-approvals/permission-prompt-shell-and-worker-states.md
   - /integrations/clients/ssh-remote-session-and-auth-proxy.md
   - /integrations/clients/structured-io-and-headless-session-loop.md
@@ -21,6 +22,7 @@ The current Claude Code build does not rely only on coarse top-down black-box te
 The snapshot shows several recurring seam patterns:
 
 - targeted dependency injection where module-spy boilerplate would otherwise be brittle or cyclic
+- module-boundary indirection that keeps replacement, spying, or late binding viable across cyclic or feature-gated imports
 - helper functions explicitly exported for testing, especially around parsing, serialization, cache placement, and runtime edge behavior
 - reset or clear hooks for stateful services and caches
 - admission-sensitive helper surfaces that only exist under test posture
@@ -36,6 +38,16 @@ These seams are not random internal conveniences. They reveal the kinds of behav
 - parser and serializer edge cases
 - resume- and transcript-sensitive flows
 
+## Import and module-boundary discipline is part of testability
+
+Equivalent behavior should preserve:
+
+- narrow dependency seams where a core flow would otherwise force repetitive per-module spying
+- import structures that keep live bindings or late binding available when cycles, feature gates, or mock replacement would otherwise make tests brittle
+- the ability to replace or observe one collaborator without changing the rest of the production topology
+
+This is an architectural testing rule, not just a style preference. In the visible snapshot, import indirection and targeted DI are both used to keep real modules testable under modern ESM-style mocking constraints.
+
 ## Resettable singleton state is part of the seam contract
 
 The source snapshot repeatedly exposes reset or seed helpers for long-lived runtime state. That is part of the test framework, not merely local cleanup style.
diff --git a/tools-and-permissions/filesystem-and-shell/NODE.md b/tools-and-permissions/filesystem-and-shell/NODE.md
index 5c81ffd..d775744 100644
--- a/tools-and-permissions/filesystem-and-shell/NODE.md
+++ b/tools-and-permissions/filesystem-and-shell/NODE.md
@@ -14,4 +14,4 @@ Relevant leaves:
 - **[shell-execution-and-backgrounding.md](shell-execution-and-backgrounding.md)** — How shell tools stream, background, reuse tasks, and stay responsive in assistant mode.
 - **[shell-command-parsing-and-classifier-flow.md](shell-command-parsing-and-classifier-flow.md)** — How trustworthy shell structure, fallback parsing, compound-command suggestions, and speculative Bash auto-approval interact before execution.
 - **[shell-rule-grammar-and-matching.md](shell-rule-grammar-and-matching.md)** — The shared exact/prefix/wildcard rule grammar and the Bash/PowerShell normalization rules around it.
-- **[sed-command-validation-contracts.md](sed-command-validation-contracts.md)** — Testable contracts for sed command validation (allowlist/denylist patterns), with acceptance criteria for Python reconstruction.
+- **[sed-command-validation-contracts.md](sed-command-validation-contracts.md)** — Acceptance oracle for which `sed` shapes remain low-risk inspection or bounded rewrite requests and which must escalate.
diff --git a/tools-and-permissions/filesystem-and-shell/sed-command-validation-contracts.md b/tools-and-permissions/filesystem-and-shell/sed-command-validation-contracts.md
index aca5338..5498184 100644
--- a/tools-and-permissions/filesystem-and-shell/sed-command-validation-contracts.md
+++ b/tools-and-permissions/filesystem-and-shell/sed-command-validation-contracts.md
@@ -8,208 +8,66 @@ verification_status: native_test_derived
 
 # Sed Command Validation Contracts
 
-This leaf documents the testable contracts for sed command validation, extracted from the Claude Code source `tools/BashTool/sedValidation.ts` (685 lines). These contracts define which sed commands are automatically allowed vs. which require explicit user approval.
+This leaf captures the acceptance oracle for when `sed` can stay on a low-risk inspection or bounded rewrite path and when it must escalate to explicit approval. The clean-room value is the behavioral boundary, not the exact parser helpers or regexes upstream used.
 
 ## Scope boundary
 
 This leaf covers:
 
-- sed command allowlist patterns for automatic approval
-- sed command denylist patterns that trigger approval prompts
-- testable function contracts with input/output specifications
-- extractable test cases for Python reconstruction validation
+- the safe `sed` shapes that can remain on an automatic path
+- the high-risk shapes that must escalate
+- the hardening rules that make tricky or ambiguous programs fail closed
 
 It intentionally does not cover:
 
-- general shell command parsing (see shell-command-parsing-and-classifier-flow.md)
-- permission decision pipeline (see permission-decision-pipeline.md)
+- the general shell parsing stack
+- the broader permission-decision pipeline already covered elsewhere
 
-## Test entry point pattern
+## Read-oriented inspection can stay narrow
 
-The source marks testable functions with `@internal Exported for testing` JSDoc comments. This is the primary pattern for identifying test contracts.
+Equivalent behavior should preserve a narrow inspection path for `sed` commands that are clearly being used to read or print content rather than to mutate or execute anything.
 
-## Testable function contracts
+That safe path should preserve:
 
-### `isPrintCommand(cmd: string): boolean`
+- quiet line-printing programs whose purpose is to reveal selected lines
+- simple addressed print selectors for targeted inspection
+- file arguments when the command is still plainly inspection-only
+- a boundary that stays narrow enough that complex `sed` programs do not accidentally inherit read-only trust
 
-Validates that a sed expression is a safe print command.
+## Simple rewrite-to-stdout can stay distinct from persistence
 
-**Contract**: Returns `true` only for these exact forms:
-- `p` (print all)
-- `Np` (print line N, where N is digits)
-- `N,Mp` (print lines N through M)
+The visible upstream oracle also distinguishes a limited rewrite path from full shell mutation.
 
-**Implementation regex**: `/^(?:\d+|\d+,\d+)?p$/`
+Equivalent behavior should preserve:
 
-**Test cases**:
-```
-PASS: "p"       -> true
-PASS: "1p"      -> true
-PASS: "123p"    -> true
-PASS: "1,5p"    -> true
-PASS: "10,200p" -> true
+- simple substitution-style rewrites whose output still goes to stdout rather than to the filesystem
+- a stricter posture for in-place or file-targeted rewrites than for inspection-only use
+- the ability for a separately authorized file-write posture to allow some rewrite flows without collapsing into "arbitrary `sed` is safe"
 
-FAIL: ""        -> false
-FAIL: "w file"  -> false
-FAIL: "e cmd"   -> false
-FAIL: "1,5w"    -> false
-FAIL: "p;w"     -> false
-FAIL: "1p;2p"   -> false (semicolons not allowed in isPrintCommand itself)
-```
+## Dangerous or ambiguous programs must fail closed
 
-### `isLinePrintingCommand(command: string, expressions: string[]): boolean`
+Equivalent behavior should preserve fail-closed escalation for programs that try to blur the boundary between safe text inspection and arbitrary shell mutation.
 
-Validates Pattern 1: sed commands with `-n` flag and print expressions.
+That escalation should include:
 
-**Contract**:
-- Must have `-n` (or `--quiet`/`--silent`) flag
-- Only allows flags: `-n`, `--quiet`, `--silent`, `-E`, `--regexp-extended`, `-r`, `-z`, `--zero-terminated`, `--posix`
-- All expressions must be valid print commands (allows semicolon-separated)
-- File arguments ARE allowed for this pattern
+- commands that persist output to files or request in-place mutation
+- commands that execute shell payloads or smuggle execution through rewrite flags
+- addressed or compound forms that are harder to reason about safely than the narrow allow path
+- tricky syntax such as non-ASCII lookalikes, multiline bodies, negation, step-addressing, brace-heavy grouping, or delimiter tricks that make the command harder to classify confidently
 
-**Test cases**:
-```
-PASS: "sed -n '1p'"
-PASS: "sed -n '1,5p'"
-PASS: "sed -n '1p;2p;3p'"
-PASS: "sed -nE '1p' file.txt"
+## Why this native oracle matters
 
-FAIL: "sed '1p'"           (missing -n flag)
-FAIL: "sed -n 's/a/b/'"    (substitution, not print)
-FAIL: "sed -n '1w file'"   (write command)
-FAIL: "sed -ni '1p'"       (disallowed -i flag for pattern 1)
-```
-
-### `hasFileArgs(command: string): boolean`
-
-Detects if a sed command has file arguments (not just stdin).
-
-**Contract**:
-- Returns `true` if any non-flag, non-expression arguments exist
-- Handles `-e`/`--expression` flags correctly
-- Treats glob patterns as file arguments
-- Returns `true` on parse failure (fail-closed)
-
-**Test cases**:
-```
-PASS (has files): "sed 's/a/b/' file.txt"       -> true
-PASS (has files): "sed -e 's/a/b/' file.txt"    -> true
-PASS (has files): "sed 's/a/b/' *.log"          -> true
-
-FAIL (no files):  "sed 's/a/b/'"                -> false
-FAIL (no files):  "sed -e 's/a/b/'"             -> false
-```
-
-### `extractSedExpressions(command: string): string[]`
-
-Extracts sed expressions from command for validation.
-
-**Contract**:
-- Returns array of sed expressions (content inside quotes)
-- Handles `-e`/`--expression` flags
-- Throws on dangerous flag combinations (`-ew`, `-eW`, `-ee`, `-we`)
-- Throws on malformed shell syntax
-
-**Test cases**:
-```
-PASS: "sed 's/a/b/'"                  -> ["s/a/b/"]
-PASS: "sed -e 's/a/b/' -e 's/c/d/'"   -> ["s/a/b/", "s/c/d/"]
-PASS: "sed --expression='1p'"          -> ["1p"]
-
-THROW: "sed -ew 's/a/b/'"             (dangerous flag combo)
-THROW: "sed 's/a/b"                   (malformed syntax)
-```
-
-### `sedCommandIsAllowedByAllowlist(command: string, options?: { allowFileWrites?: boolean }): boolean`
-
-Main entry point for sed validation.
-
-**Contract**:
-- Pattern 1 (line printing): `-n` flag + print commands, file args allowed
-- Pattern 2 (substitution): `s/pattern/replacement/flags`, stdout only by default
-- With `allowFileWrites: true`: Pattern 2 allows `-i` flag and file arguments
-- Defense-in-depth: Even if allowlist matches, denylist is still checked
-- Pattern 2 does not allow semicolons in expressions
-
-**Allowed substitution flags**: `g`, `p`, `i`, `I`, `m`, `M`, `1-9`
-
-**Test cases**:
-```
-# Pattern 1 - line printing
-PASS: "sed -n '1p'"
-PASS: "sed -n '1,5p' file.txt"
-
-# Pattern 2 - substitution (stdout only)
-PASS: "sed 's/foo/bar/'"
-PASS: "sed 's/foo/bar/g'"
-PASS: "sed -E 's/foo/bar/gi'"
-
-FAIL: "sed 's/foo/bar/' file.txt"       (file args without allowFileWrites)
-FAIL: "sed -i 's/foo/bar/' file.txt"    (in-place without allowFileWrites)
-
-# Pattern 2 with allowFileWrites: true
-PASS: "sed -i 's/foo/bar/' file.txt"    (in-place editing allowed)
-```
-
-## Denylist patterns (`containsDangerousOperations`)
-
-The denylist provides defense-in-depth by blocking dangerous patterns even if the allowlist matched.
-
-### Blocked patterns
-
-| Pattern | Examples | Reason |
-|---------|----------|--------|
-| Non-ASCII | `ｗ` (fullwidth), `ᴡ` (small capital) | Unicode homoglyphs |
-| Curly braces | `{cmd}`, `{1,5}` | Block constructs too complex |
-| Newlines | `\n` in expression | Multi-line commands |
-| Write commands | `w file`, `W file`, `/pattern/w file` | File writes |
-| Execute commands | `e`, `1e`, `/pattern/e` | Shell execution |
-| Substitution write flag | `s/old/new/w file` | Write via s flag |
-| Substitution execute flag | `s/old/new/e` | Execute via s flag |
-| Negation | `!/pattern/`, `/pattern/!` | Negation operator |
-| Step address | `1~2`, `$~3` | GNU step syntax |
-| Backslash delimiter | `s\pattern\repl\` | Alternate delimiter tricks |
-| y command with w/W/e/E | `y/a/b/w` | Paranoid block |
-
-**Test cases**:
-```
-BLOCK: "{}"                    (curly braces)
-BLOCK: "\n"                    (newline)
-BLOCK: "w output.txt"          (write command)
-BLOCK: "e /bin/sh"             (execute command)
-BLOCK: "1w file"               (addressed write)
-BLOCK: "s/old/new/w file"      (substitution write flag)
-BLOCK: "s/old/new/e"           (substitution execute flag)
-BLOCK: "/pattern/w file"       (pattern-addressed write)
-BLOCK: "!/pattern/p"           (negation)
-BLOCK: "1~2p"                  (step address)
-```
-
-## Reconstruction guidance
-
-A Python reconstruction should:
-
-1. Implement `is_print_command(cmd)` with regex `/^(?:\d+|\d+,\d+)?p$/`
-2. Implement `is_line_printing_command(command, expressions)` checking:
-   - `-n` flag presence
-   - Flag allowlist validation
-   - All expressions are valid print commands
-3. Implement `has_file_args(command)` with shell parsing
-4. Implement `extract_sed_expressions(command)` with:
-   - `-e`/`--expression` handling
-   - Dangerous flag combo detection
-   - Error on malformed syntax
-5. Implement `sed_command_is_allowed_by_allowlist(command, allow_file_writes)` combining:
-   - Pattern 1 and Pattern 2 checks
-   - Denylist defense-in-depth
-6. Implement `contains_dangerous_operations(expression)` with all denylist patterns
-
-## Acceptance criteria
-
-- [ ] All `isPrintCommand` test cases pass
-- [ ] All `isLinePrintingCommand` test cases pass
-- [ ] All `hasFileArgs` test cases pass
-- [ ] All `extractSedExpressions` test cases pass (including throws)
-- [ ] All `sedCommandIsAllowedByAllowlist` test cases pass
-- [ ] All denylist patterns correctly blocked
-- [ ] Defense-in-depth: denylist runs even when allowlist matches
+The important upstream signal is not that `sed` has one parser helper or another. It is that Claude Code treats `sed` as a special case where:
+
+- some clearly inspection-oriented usage is cheap enough to auto-allow
+- some clearly bounded rewrite usage can stay narrower than full shell permission
+- everything ambiguous or persistence-capable should fail closed
+
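+Without this oracle, a rebuild will usually be either too permissive, auto-allowing programs that can write or execute, or too restrictive, prompting for approval on harmless inspection commands.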
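
The three-way split described above can be sketched as a deliberately conservative classifier. This is an illustration of the behavioral boundary under stated assumptions, not a reproduction of the upstream helpers: `classify_sed`, the verdict strings, and both patterns are hypothetical, and the sketch ignores the separately authorized file-write posture entirely.

```python
import re
import shlex

# Illustrative patterns only; a real rebuild may draw the allow path differently.
PRINT_RE = re.compile(r"(?:\d+|\d+,\d+)?p")
SUBST_RE = re.compile(r"s/(?:[^/\\]|\\.)*/(?:[^/\\]|\\.)*/[gpiImM1-9]*")
QUIET_FLAGS = {"-n", "--quiet", "--silent"}


def classify_sed(command: str) -> str:
    """Return 'inspect', 'rewrite_stdout', or 'escalate' (fail closed)."""
    try:
        argv = shlex.split(command)
    except ValueError:
        return "escalate"  # malformed quoting is never trusted
    if not argv or argv[0] != "sed":
        return "escalate"
    flags = [a for a in argv[1:] if a.startswith("-")]
    rest = [a for a in argv[1:] if not a.startswith("-")]
    if not rest:
        return "escalate"
    program, files = rest[0], rest[1:]
    # Hardening: lookalike characters, grouping, multiline bodies,
    # negation, and step addressing all leave the allow path.
    if not program.isascii() or any(ch in program for ch in "{}\n!~"):
        return "escalate"
    # Inspection path: quiet flag(s) only, print-only program, files allowed.
    if flags and set(flags) <= QUIET_FLAGS:
        if all(PRINT_RE.fullmatch(part) for part in program.split(";")):
            return "inspect"
    # Bounded rewrite path: plain substitution to stdout, no flags, no files.
    if not flags and not files and SUBST_RE.fullmatch(program):
        return "rewrite_stdout"
    return "escalate"
```

The sketch errs toward escalation: combined short flags such as `-nE` and any `-e` usage fall through to `escalate` even though a fuller implementation might admit them.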
+
+## Failure modes
+
+- **inspection collapse**: harmless line-printing commands no longer fit through the low-risk path
+- **rewrite overgrant**: in-place or file-writing programs inherit the same trust as stdout-only inspection
+- **syntax blind spot**: tricky `sed` forms sneak past because the allow path only checks the happy case
+- **parser overfitting**: the rebuild copies one implementation's helper functions but misses the behavioral safety boundary they were defending
diff --git a/tools-and-permissions/permissions/NODE.md b/tools-and-permissions/permissions/NODE.md
index 6f2ef95..744c878 100644
--- a/tools-and-permissions/permissions/NODE.md
+++ b/tools-and-permissions/permissions/NODE.md
@@ -16,5 +16,5 @@ Relevant leaves:
 - **[permission-resolution-races-and-forwarding.md](permission-resolution-races-and-forwarding.md)** — Single-winner ask-resolution races across dialog, bridge, mailbox, channel relay, hooks, classifier, and abort paths.
 - **[sandbox-selection-and-bypass-guards.md](sandbox-selection-and-bypass-guards.md)** — How sandbox selection, excluded commands, policy-gated overrides, and Windows refusal paths interact.
 - **[config-permission-and-sandbox-admin-surfaces.md](config-permission-and-sandbox-admin-surfaces.md)** — Registry-backed config mutation on eligible builds, plus permission browser, denied-command retry, and sandbox admin surfaces.
-- **[yolo-classifier-contracts.md](yolo-classifier-contracts.md)** — Testable contracts for the YOLO (auto mode) classifier, including transcript building, XML parsing, and two-stage classifier behavior.
-- **[e2e-permission-testing-contracts.md](e2e-permission-testing-contracts.md)** — Testable contracts for E2E permission testing using TestingPermissionTool, including test scenarios and environment gating patterns.
+- **[yolo-classifier-contracts.md](yolo-classifier-contracts.md)** — Acceptance oracles for automatic-approval transcript shaping, fail-safe classifier verdicts, staged review, and environment-specific deny overlays.
+- **[e2e-permission-testing-contracts.md](e2e-permission-testing-contracts.md)** — How a test-only approval probe reuses the normal permission dialog path without widening the public tool surface.
diff --git a/tools-and-permissions/permissions/e2e-permission-testing-contracts.md b/tools-and-permissions/permissions/e2e-permission-testing-contracts.md
index 6839796..16acc62 100644
--- a/tools-and-permissions/permissions/e2e-permission-testing-contracts.md
+++ b/tools-and-permissions/permissions/e2e-permission-testing-contracts.md
@@ -1,210 +1,73 @@
 ---
 title: "E2E Permission Testing Contracts"
 owners: [bingran-you]
-soft_links: [/tools-and-permissions/permissions/permission-decision-pipeline.md, /tools-and-permissions/permissions/yolo-classifier-contracts.md]
+soft_links: [/tools-and-permissions/permissions/permission-decision-pipeline.md, /ui-and-experience/dialogs-and-approvals/permission-prompt-shell-and-worker-states.md, /reconstruction-guardrails/verification-and-native-test-oracles/test-runtime-mode-and-determinism.md, /reconstruction-guardrails/verification-and-native-test-oracles/e2e-harness-reality-boundaries.md]
 native_source: tools/testing/TestingPermissionTool.tsx
 verification_status: native_test_derived
 ---
 
 # E2E Permission Testing Contracts
 
-This leaf documents the testable contracts for end-to-end permission testing, extracted from Claude Code source `tools/testing/TestingPermissionTool.tsx` (73 lines). This tool provides a harness for testing the permission dialog flow.
+This leaf captures the verification contract for a test-only approval harness. The clean-room value lies not in any one exact tool implementation but in the existence of a narrow probe that can reliably force the ordinary permission flow on demand without becoming a public product feature.
 
 ## Scope boundary
 
 This leaf covers:
 
-- TestingPermissionTool behavior contracts
-- E2E permission flow testing patterns
-- Test environment gating
+- how a test-only approval probe is admitted
+- what that probe must prove about the ordinary permission path
+- what properties keep the harness deterministic and low-risk
 
 It intentionally does not cover:
 
-- YOLO classifier internals (see yolo-classifier-contracts.md)
-- Permission decision pipeline (see permission-decision-pipeline.md)
+- the general permission-decision pipeline
+- automatic-approval classifier behavior
+- every specialized approval surface already documented in the UI domain
 
-## TestingPermissionTool contract
+## The approval probe is a harness, not a product feature
 
-A testing-only tool that **always triggers a permission dialog** when called by the model.
+Equivalent behavior should preserve:
 
-### Core properties
+- admission only in test posture
+- normal runtime registration through the ordinary tool pool rather than through a fake backdoor
+- exclusion from ordinary production-facing tool surfaces
+- a low-risk, read-only execution shape once the approval is granted
 
-| Property | Value | Purpose |
-|----------|-------|---------|
-| `name` | `'TestingPermission'` | Tool identifier |
-| `maxResultSizeChars` | `100_000` | Max result size |
-| `isEnabled()` | `process.env.NODE_ENV === 'test'` | Only enabled in test environment |
-| `isConcurrencySafe()` | `true` | Safe for concurrent execution |
-| `isReadOnly()` | `true` | Does not modify state |
+The important contract is that the harness is narrow enough to be safe, but real enough to exercise the same approval path users see.
 
-### Input schema
+## The harness must force the ordinary ask path
 
-```typescript
-z.strictObject({})  // Empty object, no parameters
-```
+Equivalent behavior should preserve:
 
-### Permission behavior
+- a test-only action that contributes an explicit approval request instead of auto-allowing through the normal safe-action shortcuts
+- grant, deny, cancel, and queue-advance behavior all flowing through the same permission shell and callback machinery as ordinary approvals
+- the same approval route being usable for foreground sessions and forwarded-worker or delegated approval cases when those flows are under test
 
-```typescript
-checkPermissions(): PermissionResult {
-  return {
-    behavior: 'ask',
-    message: 'Run test?'
-  }
-}
-```
+An approval harness that manipulates dialog state directly would skip the ordinary ask path and so miss the contract this leaf is meant to defend.
 
-**Contract**: Always returns `{ behavior: 'ask' }` regardless of:
-- Permission mode (auto, plan, normal)
-- User settings
-- Any other context
+## Post-approval behavior should stay deterministic and low-noise
 
-This makes it ideal for testing the permission dialog flow.
+Equivalent behavior should preserve:
 
-### Execution behavior
+- a deterministic success path after the user approves the harness action
+- minimal or no external side effects beyond the permission event itself
+- concurrency safety appropriate for automated suites that may run many permission cases in one process
 
-```typescript
-call(): { data: string } {
-  return { data: 'TestingPermission executed successfully' }
-}
-```
+The purpose of the harness is to validate the approval machinery, not to add unrelated filesystem or network variables.
 
-**Contract**: On successful execution (after permission granted), returns success message.
+## What the e2e lane should prove
 
-## Test cases
+Equivalent end-to-end coverage should be able to prove at least:
 
-### isEnabled contract
+- the approval prompt actually appears when the harness is invoked
+- user grant allows the action to finish through the normal continuation path
+- user denial blocks execution and clears the queue correctly
+- cancellation and queue-head transitions do not leave stale waiting state behind
+- worker-forwarded or remote-capable approval surfaces, when exercised, still reuse the same underlying approval semantics
 
-```
-TEST: Tool only available in test environment
-CONDITION: NODE_ENV === 'test'
-EXPECTED: isEnabled() returns true
+## Failure modes
 
-CONDITION: NODE_ENV === 'production'
-EXPECTED: isEnabled() returns false
-
-CONDITION: NODE_ENV === 'development'
-EXPECTED: isEnabled() returns false
-```
-
-### checkPermissions contract
-
-```
-TEST: Always asks for permission regardless of mode
-CONTEXT: Any permission mode (auto, plan, normal)
-EXPECTED: { behavior: 'ask', message: 'Run test?' }
-
-TEST: Permission message is consistent
-CALL: checkPermissions()
-EXPECTED: message === 'Run test?'
-```
-
-### call contract
-
-```
-TEST: Successful execution returns expected message
-PRECONDITION: Permission was granted
-CALL: call()
-EXPECTED: { data: 'TestingPermission executed successfully' }
-```
-
-## Testing patterns derived from this tool
-
-### Pattern 1: Environment-gated test tools
-
-Tools that should only be available during testing should implement:
-
-```typescript
-isEnabled() {
-  return process.env.NODE_ENV === 'test'
-}
-```
-
-This ensures the tool is:
-- Available during automated tests
-- Hidden from production users
-- Excluded from tool catalogs in production builds
-
-### Pattern 2: Unconditional permission triggers
-
-For testing permission flows, tools should bypass all permission caching and rules:
-
-```typescript
-checkPermissions() {
-  return { behavior: 'ask', message: '...' }  // Always 'ask', never 'allow' or 'deny'
-}
-```
-
-### Pattern 3: Minimal side effects
-
-Test tools should be:
-- `isConcurrencySafe: true` - Safe to run in parallel
-- `isReadOnly: true` - No filesystem/state mutations
-- Deterministic output - Same input always produces same output
-
-## E2E permission flow test scenarios
-
-Using TestingPermissionTool, the following E2E scenarios can be verified:
-
-### Scenario 1: Permission dialog appearance
-
-```
-1. Model calls TestingPermission tool
-2. VERIFY: Permission dialog appears with "Run test?" message
-3. VERIFY: Dialog shows correct tool name
-```
-
-### Scenario 2: Permission grant flow
-
-```
-1. Model calls TestingPermission tool
-2. User grants permission
-3. VERIFY: Tool executes successfully
-4. VERIFY: Result contains "executed successfully"
-```
-
-### Scenario 3: Permission deny flow
-
-```
-1. Model calls TestingPermission tool
-2. User denies permission
-3. VERIFY: Tool does not execute
-4. VERIFY: Appropriate denial message returned to model
-```
-
-### Scenario 4: Permission dialog state management
-
-```
-1. Model calls TestingPermission tool
-2. VERIFY: Dialog enters pending state
-3. User responds
-4. VERIFY: Dialog transitions to resolved state
-5. VERIFY: State cleanup occurs
-```
-
-## Reconstruction guidance
-
-A Python reconstruction should:
-
-1. Implement `TestingPermissionTool` with:
-   - Environment check in `is_enabled()`
-   - Always-ask behavior in `check_permissions()`
-   - Deterministic success response in `call()`
-
-2. Use it to test:
-   - Permission dialog rendering
-   - User interaction handling
-   - Grant/deny flow completion
-   - State machine transitions
-
-3. Ensure tool is excluded from production builds or gated by environment
-
-## Acceptance criteria
-
-- [ ] `isEnabled()` returns `True` only when `NODE_ENV === 'test'`
-- [ ] `checkPermissions()` always returns `{ behavior: 'ask' }`
-- [ ] `call()` returns `{ data: 'TestingPermission executed successfully' }`
-- [ ] Tool is not visible in production tool catalog
-- [ ] Permission dialog appears when tool is called
-- [ ] Grant flow completes successfully
-- [ ] Deny flow blocks execution
+- **public harness leak**: the approval probe becomes visible outside test posture
+- **fake dialog coverage**: tests bypass the normal permission shell and therefore stop proving real user behavior
+- **side-effect pollution**: the harness action itself mutates unrelated runtime state and makes permission tests flaky
+- **queue drift**: grant, deny, or cancel paths leave stale approval rows or waiting state behind
diff --git a/tools-and-permissions/permissions/yolo-classifier-contracts.md b/tools-and-permissions/permissions/yolo-classifier-contracts.md
index 48cf209..59bc351 100644
--- a/tools-and-permissions/permissions/yolo-classifier-contracts.md
+++ b/tools-and-permissions/permissions/yolo-classifier-contracts.md
@@ -8,264 +8,79 @@ verification_status: native_test_derived
 
 # YOLO Classifier Contracts
 
-This leaf documents the testable contracts for the YOLO (auto mode) classifier, extracted from Claude Code source `utils/permissions/yoloClassifier.ts` (1496 lines). The YOLO classifier is an ML-based system that decides whether agent actions should be auto-approved or require user confirmation.
+This leaf captures the acceptance oracle for Claude Code's automatic-approval classifier. The clean-room value is not the exact schema tags or parser helpers that upstream used. It is the shape of the classifier input, the fail-safe verdict rules, and the extra deny overlays that keep auto mode from widening into silent over-permission.
 
 ## Scope boundary
 
 This leaf covers:
 
-- Transcript building and serialization contracts
-- Classifier response parsing contracts
-- Auto mode rules configuration contracts
-- Two-stage XML classifier behavior contracts
+- how action history is shaped before classification
+- how verdict extraction stays machine-readable and fail-safe
+- how fast and escalated classifier passes relate to each other
+- how environment-specific deny guidance extends the generic auto-mode policy
 
 It intentionally does not cover:
 
-- General permission decision pipeline (see permission-decision-pipeline.md)
-- Permission model architecture (see permission-model.md)
-- Bash-specific classifier rules (separate module)
+- the full permission-decision pipeline
+- the broader permission posture model
+- shell-specific parser contracts already documented elsewhere
 
-## Core types
+## Transcript shaping must resist classifier manipulation
 
-### `AutoModeRules`
+Equivalent behavior should preserve:
 
-Configuration schema for customizable classifier rules.
+- user requests and assistant tool intents serving as the primary classifier context
+- assistant narration or persuasive self-explanation not being treated as authoritative classifier input
+- queued or deferred actions still being representable in the classifier transcript when they are the real thing about to execute
+- tool-owned compact action serialization being preferred when available, so the classifier sees a stable intent summary instead of arbitrary UI text
 
-```typescript
-type AutoModeRules = {
-  allow: string[]      // Actions to auto-approve
-  soft_deny: string[]  // Actions to block (soft, can be overridden)
-  environment: string[] // Environment context for classifier
-}
-```
+This transcript-shaping rule is one of the main guardrails against the model talking itself into permission.
 
-### `TranscriptEntry`
+## Verdict extraction must be machine-readable and fail-safe
 
-Format for conversation history entries fed to classifier.
+Equivalent behavior should preserve:
 
-```typescript
-type TranscriptEntry = {
-  role: 'user' | 'assistant'
-  content: TranscriptBlock[]
-}
+- a machine-readable allow or block verdict, not only free-form rationale
+- the ability to discard non-binding reasoning when extracting the actual decision
+- a brief reason channel that can explain why a decision blocked or escalated
+- safety-first behavior when parsing, schema validation, transport, or inference fails
 
-type TranscriptBlock =
-  | { type: 'text'; text: string }
-  | { type: 'tool_use'; name: string; input: unknown }
-```
+The important product contract is that failure never silently turns into approval.
 
-### `YoloClassifierResult`
+## Fast pass and escalated pass can coexist
 
-Response schema from classifier.
+The visible snapshot suggests a staged auto-approval design rather than one monolithic classifier pass.
 
-```typescript
-// Tool-use classifier response (via classify_result tool)
-{
-  thinking: string      // Brief step-by-step reasoning
-  shouldBlock: boolean  // true = block, false = allow
-  reason: string        // Brief explanation
-}
-```
+Equivalent behavior should preserve:
 
-## Testable function contracts
+- a cheap first-pass classifier that can allow obviously safe actions quickly
+- escalation to a richer second pass when the fast answer is "block", ambiguous, or unparsable
+- the possibility of leaving the auto lane entirely and returning to manual review when the classifier transcript is too large or otherwise unsuitable for safe automation
 
-### `buildTranscriptEntries(messages: Message[]): TranscriptEntry[]`
+The exact envelope syntax is implementation detail. The architectural rule is that ambiguous cases get more scrutiny, not more trust.
 
-Builds transcript from message history for classifier input.
+## Environment-specific deny overlays still matter
 
-**Contract**:
-- Includes user text messages as `{ role: 'user', content: [{ type: 'text', text }] }`
-- Includes assistant tool_use blocks (excluding assistant text to prevent classifier manipulation)
-- Extracts queued_command prompts from attachment messages
-- Returns empty content array entries are filtered out
+Equivalent behavior should preserve:
 
-**Test cases**:
-```
-INPUT: [{ type: 'user', message: { content: 'Hello' } }]
-OUTPUT: [{ role: 'user', content: [{ type: 'text', text: 'Hello' }] }]
+- the ability to append extra deny guidance for environments whose risk profile differs from generic shell auto mode
+- specialized treatment for surfaces such as PowerShell where code loading, persistence, execution-policy weakening, or network-backed execution carry distinct risk categories
+- those deny overlays remaining an extension of the same classifier system rather than a separate permission engine with unrelated semantics
 
-INPUT: [{ type: 'assistant', message: { content: [
-  { type: 'text', text: 'I will run a command' },  // EXCLUDED
-  { type: 'tool_use', name: 'Bash', input: { command: 'ls' } }
-] } }]
-OUTPUT: [{ role: 'assistant', content: [{ type: 'tool_use', name: 'Bash', input: { command: 'ls' } }] }]
-```
+## Reconstruction rule
 
-### `buildTranscriptForClassifier(messages: Message[], tools: Tools): string`
+A faithful rebuild should preserve:
 
-Serializes transcript to compact string format for classifier.
+- transcript shaping that prioritizes user intent and tool intent over assistant self-justification
+- a structured, machine-readable decision channel with explicit allow or block semantics
+- fail-safe behavior on parser, transport, or inference failure
+- a staged path where fast answers are allowed to escalate rather than forced to decide every ambiguous case
+- environment-specific deny overlays for shells or runtimes whose risk classes differ from the default
 
-**Contract**:
-- Uses tool's `toAutoClassifierInput()` method for encoding
-- JSONL format when enabled: `{"Bash":"ls"}\n`
-- Legacy format: `Bash ls\n`
-- User messages: `{"user":"text"}\n` or `User: text\n`
-- Empty tool encodings (`''`) are skipped
+## Failure modes
 
-**Test cases**:
-```
-# JSONL format
-TOOL_USE(Bash, {command: 'ls'}) -> '{"Bash":"ls"}\n'
-USER_TEXT('hello') -> '{"user":"hello"}\n'
-
-# Legacy format
-TOOL_USE(Bash, {command: 'ls'}) -> 'Bash ls\n'
-USER_TEXT('hello') -> 'User: hello\n'
-```
-
-### `getDefaultExternalAutoModeRules(): AutoModeRules`
-
-Parses default rules from external permissions template.
-
-**Contract**:
-- Extracts bullet items from `<user_allow_rules_to_replace>` tag
-- Extracts bullet items from `<user_deny_rules_to_replace>` tag
-- Extracts bullet items from `<user_environment_to_replace>` tag
-- Each line starting with `- ` becomes one array entry (minus the prefix)
-
-**Test cases**:
-```
-TEMPLATE: "<user_allow_rules_to_replace>\n- Rule 1\n- Rule 2\n</user_allow_rules_to_replace>"
-OUTPUT: { allow: ['Rule 1', 'Rule 2'], soft_deny: [], environment: [] }
-```
-
-### `formatActionForClassifier(toolName: string, toolInput: unknown): TranscriptEntry`
-
-Creates a transcript entry for a single action being classified.
-
-**Contract**:
-- Returns `{ role: 'assistant', content: [{ type: 'tool_use', name: toolName, input: toolInput }] }`
-- Used to format the current action being evaluated
-
-**Test cases**:
-```
-INPUT: ('Bash', { command: 'rm -rf /' })
-OUTPUT: { role: 'assistant', content: [{ type: 'tool_use', name: 'Bash', input: { command: 'rm -rf /' } }] }
-```
-
-## XML classifier contracts
-
-The two-stage XML classifier is an alternative to the tool-use classifier.
-
-### XML response parsing
-
-**`parseXmlBlock(text: string): boolean | null`**
-
-Parses `<block>yes/no</block>` from classifier response.
-
-**Contract**:
-- Strips `<thinking>...</thinking>` content first (prevent manipulation)
-- Returns `true` for "yes" (should block)
-- Returns `false` for "no" (should allow)
-- Returns `null` if unparseable
-- Case-insensitive matching
-
-**Test cases**:
-```
-PASS: "<block>yes</block>" -> true
-PASS: "<block>no</block>" -> false
-PASS: "<block>YES</block>" -> true
-PASS: "<thinking>ignore</thinking><block>no</block>" -> false
-FAIL: "<thinking><block>yes</block></thinking><block>no</block>" -> false (inner ignored)
-FAIL: "no block tag" -> null
-```
-
-**`parseXmlReason(text: string): string | null`**
-
-Extracts `<reason>...</reason>` from classifier response.
-
-**Contract**:
-- Strips `<thinking>...</thinking>` content first
-- Returns trimmed reason text
-- Returns `null` if no reason tag
-
-**Test cases**:
-```
-PASS: "<reason>Security risk</reason>" -> "Security risk"
-PASS: "<reason>  Trimmed  </reason>" -> "Trimmed"
-FAIL: "no reason" -> null
-```
-
-### Two-stage classifier modes
-
-**Modes**:
-- `'both'` (default): Stage 1 fast decision, escalate to Stage 2 on block
-- `'fast'`: Stage 1 only, verdict is final
-- `'thinking'`: Stage 2 only, includes chain-of-thought
-
-**Stage 1 (fast)**:
-- max_tokens: 64 (256 in fast-only mode)
-- Suffix: "Err on the side of blocking. <block> immediately."
-- If `<block>no</block>` → allow immediately (fast path)
-- If `<block>yes</block>` or null → escalate to Stage 2 (or block in fast-only)
-
-**Stage 2 (thinking)**:
-- max_tokens: 4096
-- Suffix: "Review the classification process and follow it carefully..."
-- Includes `<thinking>` before `<block>`
-
-## Classifier response schema
-
-### Tool-use classifier
-
-Uses `classify_result` tool with schema:
-
-```json
-{
-  "type": "object",
-  "properties": {
-    "thinking": { "type": "string", "description": "Brief step-by-step reasoning" },
-    "shouldBlock": { "type": "boolean", "description": "Whether to block (true) or allow (false)" },
-    "reason": { "type": "string", "description": "Brief explanation" }
-  },
-  "required": ["thinking", "shouldBlock", "reason"]
-}
-```
-
-### Error handling
-
-**Contract**:
-- On parse failure → `shouldBlock: true` with reason "blocking for safety"
-- On API error → `shouldBlock: true` with `unavailable: true`
-- On abort signal → `shouldBlock: true` with `unavailable: true`
-- On prompt too long → `shouldBlock: true` with `transcriptTooLong: true`
-
-## PowerShell deny guidance
-
-When PowerShell auto mode is active, additional deny rules are appended:
-
-| Pattern | Maps to Category |
-|---------|------------------|
-| `iex (iwr ...)`, `Invoke-Expression (Invoke-WebRequest ...)` | Code from External |
-| `Remove-Item -Recurse -Force`, `rm -r -fo` | Irreversible Local Destruction |
-| Modifying `$PROFILE`, `Register-ScheduledTask` | Unauthorized Persistence |
-| `Start-Process -Verb RunAs`, `-ExecutionPolicy Bypass` | Security Weaken |
-
-## Reconstruction guidance
-
-A Python reconstruction should:
-
-1. Implement `TranscriptEntry` and `TranscriptBlock` types
-2. Implement `build_transcript_entries(messages)` extracting:
-   - User text messages (excluding images)
-   - Assistant tool_use blocks (excluding assistant text)
-   - Queued command prompts from attachments
-3. Implement `build_transcript_for_classifier(messages, tools)` with:
-   - Tool `to_auto_classifier_input()` method support
-   - JSONL and legacy format output
-4. Implement XML response parsers:
-   - `parse_xml_block(text)` with thinking-stripping
-   - `parse_xml_reason(text)` with thinking-stripping
-5. Implement classifier result schema validation
-6. Implement fail-safe error handling (block on any failure)
-
-## Acceptance criteria
-
-- [ ] TranscriptEntry correctly excludes assistant text content
-- [ ] buildTranscriptEntries handles queued_command attachments
-- [ ] buildTranscriptForClassifier uses tool's toAutoClassifierInput
-- [ ] parseXmlBlock strips thinking content before parsing
-- [ ] parseXmlReason strips thinking content before parsing
-- [ ] Classifier returns shouldBlock: true on any error
-- [ ] Two-stage classifier escalates correctly from Stage 1 to Stage 2
-- [ ] PowerShell deny guidance maps to correct categories
+- **self-persuasion leak**: assistant narration becomes classifier input and lets the model argue itself into approval
+- **free-form verdict drift**: the runtime relies on prose parsing instead of a stable allow-or-block channel
+- **fail-open classifier**: parser or transport failure silently becomes approval
+- **single-pass overconfidence**: ambiguous or likely-block actions never escalate to the richer review path
+- **overlay loss**: PowerShell or other high-risk environments lose their extra deny guidance and inherit an unsafe generic auto-mode policy