Merged
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
title: "SSH Remote Session and Auth Proxy"
owners: []
soft_links: [/integrations/clients/direct-connect-session-bootstrap-and-environment-selection.md, /integrations/clients/remote-session-message-adaptation-and-viewer-state.md, /product-surface/startup-entrypoint-routing-and-session-handoff.md, /platform-services/provider-specific-api-clients-and-auth-routing.md, /reconstruction-guardrails/verification-and-native-test-oracles/test-seams-reset-hooks-and-injected-dependencies.md]
soft_links: [/integrations/clients/direct-connect-session-bootstrap-and-environment-selection.md, /integrations/clients/remote-session-message-adaptation-and-viewer-state.md, /product-surface/startup-entrypoint-routing-and-session-handoff.md, /platform-services/provider-specific-api-clients-and-auth-routing.md, /reconstruction-guardrails/verification-and-native-test-oracles/test-seams-reset-hooks-and-injected-dependencies.md, /reconstruction-guardrails/verification-and-native-test-oracles/e2e-harness-reality-boundaries.md]
---

# SSH Remote Session and Auth Proxy
@@ -6,6 +6,7 @@ soft_links:
- /integrations/mcp/server-contract.md
- /platform-services/auth-config-and-policy.md
- /reconstruction-guardrails/verification-and-native-test-oracles/test-seams-reset-hooks-and-injected-dependencies.md
- /reconstruction-guardrails/verification-and-native-test-oracles/e2e-harness-reality-boundaries.md
---

# Federated Auth Conformance and IdP Test Seeding
310 changes: 43 additions & 267 deletions platform-services/mock-rate-limit-scenarios-and-test-contracts.md

Large diffs are not rendered by default.

@@ -16,7 +16,10 @@ This subdomain captures cross-cutting knowledge about how the observed Claude Co
Relevant leaves:

- **[test-framework-overview.md](test-framework-overview.md)** — The layered shape of the current test system, including the visible tier model and the boundary between confirmed and inferred runner details.
- **[test-runtime-mode-and-determinism.md](test-runtime-mode-and-determinism.md)** — How `NODE_ENV=test` behaves as a supported runtime posture, including in-memory config behavior, reduced side effects, and deterministic test-only branches.
- **[test-environment-fixtures-and-ci-fail-closed-policy.md](test-environment-fixtures-and-ci-fail-closed-policy.md)** — How test posture suppresses side effects, how fixture replay works, and why missing recordings fail closed in CI.
- **[test-lane-coverage-map.md](test-lane-coverage-map.md)** — Which subsystem contracts are guarded by fast regression, integration, end-to-end, conformance, and compatibility lanes, without overclaiming the hidden runner layout.
- **[e2e-harness-reality-boundaries.md](e2e-harness-reality-boundaries.md)** — Which end-to-end harnesses may shorten setup but still need to preserve real permission, transport, auth-proxy, and credential-cache paths.
- **[test-seams-reset-hooks-and-injected-dependencies.md](test-seams-reset-hooks-and-injected-dependencies.md)** — The narrow seams the product uses to keep hard behaviors testable without turning the whole runtime into a debug harness.
- **[native-test-derived-asset-provenance-and-acceptance-rules.md](native-test-derived-asset-provenance-and-acceptance-rules.md)** — How native test knowledge should be normalized into clean-room contract assets and how those assets should be linked back to their owning domains.
- **[evidence-levels-and-missing-artifacts.md](evidence-levels-and-missing-artifacts.md)** — What this source snapshot proves, what it only strongly suggests, and which missing artifacts still block exact runner-level reproduction.
@@ -0,0 +1,61 @@
---
title: "E2E Harness Reality Boundaries"
owners: [bingran-you]
soft_links:
- /reconstruction-guardrails/verification-and-native-test-oracles/test-lane-coverage-map.md
- /reconstruction-guardrails/verification-and-native-test-oracles/test-seams-reset-hooks-and-injected-dependencies.md
- /tools-and-permissions/permissions/e2e-permission-testing-contracts.md
- /ui-and-experience/dialogs-and-approvals/permission-prompt-shell-and-worker-states.md
- /integrations/clients/ssh-remote-session-and-auth-proxy.md
- /integrations/mcp/federated-auth-conformance-and-idp-test-seeding.md
- /runtime-orchestration/sessions/session-artifacts-and-sharing.md
---

# E2E Harness Reality Boundaries

The observed Claude Code snapshot uses several harnesses that reduce setup cost without abandoning the real runtime path. That distinction matters: a clean-room rebuild should preserve which parts of end-to-end verification are allowed to be synthetic and which parts still need to exercise production-like orchestration.

## Approval harnesses must still use the real permission path

Equivalent behavior should preserve:

- a narrow approval-oriented harness that can force the permission flow to appear on demand
- that harness still entering through the normal tool catalog, permission-decision pipeline, and permission-prompt shell
- grant, deny, cancel, queue-advance, and worker-forwarding behavior being validated through the same UI and callback machinery users actually see

An e2e approval test that injects dialog state directly is no longer testing the product contract that matters.
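
A minimal sketch of that distinction, assuming hypothetical names (`PromptShell`, `PermissionPipeline`, `forcePromptFor`) that do not come from the snapshot. The harness hook only changes what the decision pipeline is asked to do; grant and deny still travel through the same queue and callback a user-driven prompt would use.

```typescript
// Hypothetical sketch: every name here is illustrative, not an observed API.
type Decision = "grant" | "deny" | "cancel";

interface PermissionRequest {
  tool: string;
  input: string;
}

interface Pending {
  request: PermissionRequest;
  resolve: (decision: Decision) => void;
}

// Stand-in for the permission-prompt shell: it queues requests and resolves
// them through the callback that the pipeline registered.
class PromptShell {
  private queue: Pending[] = [];

  enqueue(request: PermissionRequest, resolve: (decision: Decision) => void): void {
    this.queue.push({ request, resolve });
  }

  current(): PermissionRequest | undefined {
    return this.queue[0]?.request;
  }

  // Answering the head of the queue advances it to the next request.
  answer(decision: Decision): void {
    const head = this.queue.shift();
    if (!head) throw new Error("no pending permission prompt");
    head.resolve(decision);
  }
}

// Stand-in for the permission-decision pipeline.
class PermissionPipeline {
  readonly log: string[] = [];
  private alwaysAsk = new Set<string>();

  constructor(private shell: PromptShell) {}

  // Harness shortcut: force the prompt to appear for a tool.
  // It alters policy input only and never writes dialog state.
  forcePromptFor(tool: string): void {
    this.alwaysAsk.add(tool);
  }

  request(req: PermissionRequest): void {
    if (!this.alwaysAsk.has(req.tool)) {
      this.log.push(`${req.tool}:auto-allow`);
      return;
    }
    this.shell.enqueue(req, (decision) => this.log.push(`${req.tool}:${decision}`));
  }
}
```

A test built on this shape would call `forcePromptFor`, issue requests, and then drive `answer` the way the UI does, so queue-advance behavior is covered as a side effect rather than asserted against injected state.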

## Remote transport harnesses may skip deployment, not orchestration

Equivalent behavior should preserve:

- a local harness mode that can avoid real SSH deployment when the test only needs to verify split local-UI and remote-execution plumbing
- the auth proxy, transcript adaptation, permission relay, and session-lifecycle machinery still being exercised
- failures and reconnect behavior still traveling through the real remote-session contract rather than a fake one-shot shell wrapper

The shortcut is allowed to reduce environment setup. It is not allowed to erase the transport boundary being tested.
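
One way to picture the boundary, as a sketch under stated assumptions: `Transport`, `LoopbackTransport`, `RemoteSession`, and `fakeRemote` are hypothetical names, and the JSON framing is invented for illustration. Only the transport is swapped; credential attachment, reply adaptation, and failure reporting stay in the shared session layer.

```typescript
// Hypothetical sketch: names and framing are illustrative, not from the snapshot.
interface Transport {
  // Delivers one framed request to the remote side and returns its framed reply.
  send(frame: string): string;
}

// Harness transport: runs the "remote" handler in-process, skipping SSH deployment.
class LoopbackTransport implements Transport {
  constructor(private remote: (frame: string) => string) {}

  send(frame: string): string {
    return this.remote(frame);
  }
}

// Session layer shared by the real and harness paths.
class RemoteSession {
  readonly events: string[] = [];

  constructor(private transport: Transport, private token: string) {}

  run(command: string): string {
    // Auth proxy step: credentials are attached here, never by the caller.
    const frame = JSON.stringify({ auth: this.token, command });
    const reply = this.deliver(frame);
    // Adaptation step: remote reply shape is converted to a local event.
    const parsed = JSON.parse(reply) as { ok: boolean; output: string };
    this.events.push(parsed.ok ? "result" : "error");
    return parsed.output;
  }

  // Transport failures are recorded by the session before being rethrown.
  private deliver(frame: string): string {
    try {
      return this.transport.send(frame);
    } catch (err) {
      this.events.push("disconnected");
      throw err;
    }
  }
}

// Fake remote end used only by the harness.
function fakeRemote(frame: string): string {
  const { auth, command } = JSON.parse(frame) as { auth: string; command: string };
  if (auth !== "test-token") return JSON.stringify({ ok: false, output: "unauthorized" });
  return JSON.stringify({ ok: true, output: `ran:${command}` });
}
```

A real SSH-backed `Transport` would slot into `RemoteSession` without changes, which is the property the harness is meant to protect.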

## Federated-auth harnesses may skip browser setup, not credential semantics

Equivalent behavior should preserve:

- a deterministic way to seed federated credentials when a mock identity provider does not expose the full interactive browser surface
- seeded credentials landing in the same secure cache slot the ordinary login and refresh paths later read
- downstream exchange, refresh, and revocation behavior still using the normal federated auth path

Otherwise the test stops proving interoperability and starts proving only that a bypass slot was written successfully.
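
A minimal sketch of the seeding rule, assuming hypothetical names (`SecureCache`, `slotFor`, `FederatedAuth`, `seedFederatedCredential`) and an invented slot-key format. The point it illustrates is that the seed helper and the ordinary read path derive the slot from the same function, so a seeded credential is later refreshed and revoked like any other.

```typescript
// Hypothetical sketch: names and the slot-key format are assumptions.
interface Credential {
  accessToken: string;
  refreshToken: string;
  expiresAt: number; // epoch milliseconds
}

class SecureCache {
  private slots = new Map<string, Credential>();

  read(slot: string): Credential | undefined {
    return this.slots.get(slot);
  }

  write(slot: string, credential: Credential): void {
    this.slots.set(slot, credential);
  }

  remove(slot: string): void {
    this.slots.delete(slot);
  }
}

// Single source of truth for the slot key, shared by login, refresh, and seeding.
function slotFor(issuer: string, clientId: string): string {
  return `federated:${issuer}:${clientId}`;
}

class FederatedAuth {
  constructor(
    private cache: SecureCache,
    private issuer: string,
    private clientId: string,
    private refresh: (refreshToken: string) => Credential,
  ) {}

  // Ordinary read path: return a live token or refresh an expired one.
  getAccessToken(now: number): string {
    const slot = slotFor(this.issuer, this.clientId);
    const credential = this.cache.read(slot);
    if (!credential) throw new Error("not logged in");
    if (credential.expiresAt > now) return credential.accessToken;
    const refreshed = this.refresh(credential.refreshToken);
    this.cache.write(slot, refreshed);
    return refreshed.accessToken;
  }

  revoke(): void {
    this.cache.remove(slotFor(this.issuer, this.clientId));
  }
}

// Test-only helper: writes to the same slot the ordinary path reads.
function seedFederatedCredential(
  cache: SecureCache,
  issuer: string,
  clientId: string,
  credential: Credential,
): void {
  cache.write(slotFor(issuer, clientId), credential);
}
```

If the helper computed its own key instead of calling `slotFor`, the test could pass while the product path never saw the seeded value, which is the credential-bypass failure this section warns about.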

## Session-state harnesses may seed artifacts, not invent a separate resume model

Equivalent behavior should preserve:

- targeted setters or seed helpers for session artifact state when bootstrapping a full prior session would be too expensive
- those helpers still feeding the same transcript, artifact, and resume semantics production uses
- test convenience never becoming a second, incompatible persistence model
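
As a small sketch of the same idea, assuming hypothetical names (`SessionStore`, `seedSession`) and a simplified transcript shape: the seed helper is a thin loop over the production write path, so resume reads exactly what a real session would have written.

```typescript
// Hypothetical sketch: names and the entry shape are illustrative only.
interface TranscriptEntry {
  role: "user" | "assistant";
  text: string;
}

class SessionStore {
  private transcripts = new Map<string, TranscriptEntry[]>();

  // Production write path.
  append(sessionId: string, entry: TranscriptEntry): void {
    const existing = this.transcripts.get(sessionId) ?? [];
    this.transcripts.set(sessionId, [...existing, entry]);
  }

  // Production resume path.
  resume(sessionId: string): TranscriptEntry[] {
    const entries = this.transcripts.get(sessionId);
    if (!entries) throw new Error(`unknown session: ${sessionId}`);
    return [...entries];
  }
}

// Test-only helper: seeds prior state through append, not a side channel.
function seedSession(store: SessionStore, sessionId: string, entries: TranscriptEntry[]): void {
  for (const entry of entries) store.append(sessionId, entry);
}
```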

## Failure modes

- **fake dialog coverage**: permission tests manipulate UI state directly and stop covering the real approval pipeline
- **transport collapse**: a local remote-session harness stops exercising proxying, relay, or transcript adaptation
- **credential bypass**: federated auth tests seed a token into a cache path the real login flow never reads
- **shadow persistence**: test setters create a second resume model unrelated to the live session artifact system
@@ -5,6 +5,7 @@ soft_links:
- /reconstruction-guardrails/source-boundary.md
- /reconstruction-guardrails/knowledge-lifecycle.md
- /reconstruction-guardrails/verification-and-native-test-oracles/test-framework-overview.md
- /reconstruction-guardrails/verification-and-native-test-oracles/test-lane-coverage-map.md
---

# Evidence Levels and Missing Artifacts
@@ -15,16 +16,19 @@ This repository should distinguish between what the current source snapshot prov

The snapshot is sufficient to confirm all of these:

- there are distinct unit or regression, integration, end-to-end, conformance, and compatibility lanes
- `NODE_ENV=test` is a real runtime posture
- `NODE_ENV=test` is a supported runtime posture rather than a one-off conditional
- fixture and VCR replay are first-class testing mechanisms
- there are direct signals for multiple lane families, including at least one compatibility lane, at least one integration lane, dedicated end-to-end harnesses, conformance-sensitive auth verification, and many narrow regression or fidelity oracles
- narrow seams such as injected dependencies, exported testing helpers, resets, and test-only helper surfaces are part of the current design

The tree should treat those as lane-family and architecture facts, not as proof of the full hidden runner inventory.

## Strongly suggested but not fully proven

The tree can safely treat these as strong signals rather than as closed facts:

- the TypeScript runner environment is Bun-oriented in at least part of the stack
- the regression or unit layer is broader than the few directly named test references exposed in comments and helper exports
- repo-level scripts wrap at least some runner commands instead of every lane being invoked directly

## Still missing for exact runner-level reproduction
@@ -33,6 +37,7 @@ The current snapshot does not fully expose:

- the top-level repository manifest and script table
- the complete test directory layout
- the exhaustive lane inventory and lane-to-command matrix
- the full committed fixture corpus
- the CI workflow and any sharding or coverage rules

@@ -44,6 +49,7 @@ While those artifacts are missing, the tree should:

- document the confirmed architecture and tier model
- preserve clear evidence labels for inferred versus confirmed details
- claim lane purpose and behavior ownership more confidently than lane naming or runner wiring
- refuse to guess exact runner wiring that the snapshot did not show

This is a knowledge-quality rule, not a refusal to make progress. The visible framework is already rich enough to guide a clean-room rebuild of the verification architecture itself.
@@ -2,6 +2,7 @@
title: "Test Environment, Fixtures, and CI Fail-Closed Policy"
owners: [bingran-you]
soft_links:
- /reconstruction-guardrails/verification-and-native-test-oracles/test-runtime-mode-and-determinism.md
- /platform-services/startup-service-sequencing-and-capability-gates.md
- /platform-services/usage-analytics-and-migrations.md
- /integrations/clients/structured-io-and-headless-session-loop.md
@@ -25,9 +26,15 @@ The important point is not one specific branch. It is that the runtime treats te

## Fixture replay is a first-class oracle

The snapshot exposes a VCR-style replay layer for API-dependent behavior.
The snapshot exposes more than one fixture family for API-adjacent behavior.

That layer preserves:
Equivalent behavior should preserve:

- a generic fixture helper for deterministic caching of arbitrary expensive or externalized test oracles
- message-replay fixtures for API response and streaming behavior
- token-count fixtures for API-adjacent counting paths that still need deterministic replay semantics

Across those families, the shared contract preserves:

- explicit activation in test posture
- hash-based fixture naming from normalized inputs
@@ -45,13 +52,25 @@ Equivalent behavior should preserve:

This is one of the most important stability contracts in the visible framework. It keeps network-backed tests deterministic and makes fixture refresh a deliberate maintenance act.
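
The fail-closed rule can be sketched as follows. This is an illustration under stated assumptions, not the snapshot's implementation: `withFixture`, `FixturePosture`, and the `fixture-<hash>` naming are hypothetical, FNV-1a is a dependency-free stand-in for whatever hash is really used, and recording a missing fixture on first local use is an assumption of this sketch.

```typescript
// Hypothetical sketch: names, naming scheme, and hash choice are assumptions.
interface FixturePosture {
  isTest: boolean; // replay is only active in test posture
  isCI: boolean; // CI may replay but never record
  recordIntent: boolean; // explicit request to refresh an existing fixture
}

// FNV-1a (32-bit), used here only as a small stand-in for a real content hash.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

function withFixture<T>(
  store: Map<string, T>,
  posture: FixturePosture,
  normalizedInput: string,
  live: () => T,
): T {
  // Outside test posture the fixture layer is inert.
  if (!posture.isTest) return live();

  const name = `fixture-${fnv1a(normalizedInput)}`;
  const recorded = store.get(name);

  if (posture.isCI) {
    // Fail closed: a missing recording is an error, not a reason to go live.
    if (recorded === undefined) {
      throw new Error(`missing fixture ${name}: refusing to record in CI`);
    }
    return recorded;
  }

  // Local runs replay by default and overwrite only on explicit intent.
  if (recorded !== undefined && !posture.recordIntent) return recorded;
  const fresh = live();
  store.set(name, fresh);
  return fresh;
}
```

Keeping the CI branch ahead of the record branch means no combination of flags lets a CI run call `live()`, which is what makes a refresh a deliberate local act.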

## Recording lifecycle must stay deliberate

Equivalent behavior should preserve:

- replay as the default posture once a fixture exists
- explicit record or refresh intent instead of incidental overwrites
- the ability for different API-adjacent callers to reuse the same fixture policy rather than inventing lane-specific caching rules

The important clean-room point is that recording is maintenance, not a side effect of ordinary CI execution.

## Transcript and hash stability matter

The broader runtime also treats transcript shape as part of fixture stability.

Equivalent behavior should preserve:

- careful normalization before hashing
- dehydration of machine-specific paths, config-home locations, and similar environment-local values
- placeholder treatment for incidental UUIDs, timestamps, counters, and other unstable runtime identifiers
- avoidance of unnecessary transcript-shape churn in replay-sensitive flows
- deterministic identity or placeholder handling where raw runtime IDs would otherwise destabilize recordings
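
A minimal sketch of normalization before hashing, assuming hypothetical names (`LocalEnv`, `dehydrate`, `hydrate`) and placeholder tokens (`[CWD]`, `[CONFIG_HOME]`, `[UUID]`, `[TIMESTAMP]`) chosen for illustration. It covers only paths, UUIDs, and ISO-8601 UTC timestamps; counters and other unstable identifiers would need their own rules.

```typescript
// Hypothetical sketch: function names and placeholder tokens are assumptions.
interface LocalEnv {
  cwd: string;
  configHome: string;
}

const UUID_PATTERN = /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi;
const TIMESTAMP_PATTERN = /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z/g;

// Replaces machine-local and run-local values with stable placeholders.
// Assumes cwd and configHome are non-empty strings.
function dehydrate(text: string, env: LocalEnv): string {
  return text
    .split(env.configHome).join("[CONFIG_HOME]")
    .split(env.cwd).join("[CWD]")
    .replace(UUID_PATTERN, "[UUID]")
    .replace(TIMESTAMP_PATTERN, "[TIMESTAMP]");
}

// Restores path placeholders for the current machine when replaying.
// UUIDs and timestamps are intentionally left as placeholders.
function hydrate(text: string, env: LocalEnv): string {
  return text
    .split("[CONFIG_HOME]").join(env.configHome)
    .split("[CWD]").join(env.cwd);
}
```

Two machines with different working directories then produce the same dehydrated text for the same logical input, so a hash taken over that text names the same fixture on both.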

@@ -62,13 +81,15 @@ The visible testing architecture therefore depends on transcript semantics, not
If a clean-room rebuild keeps external API-backed tests, it should preserve all of these:

- a dedicated test posture
- multiple fixture families when different API-adjacent callers need different oracle shapes
- deterministic fixture hashing and hydration
- fail-closed CI behavior for missing recordings
- explicit recording refresh

## Failure modes

- **test-production blur**: automated tests still emit nonessential production side effects
- **layer collapse**: token-count or other API-adjacent lanes bypass the shared fixture policy and drift from replay behavior used elsewhere
- **machine-bound fixtures**: path, cwd, or tempdir differences cause needless cache misses
- **silent CI rewrite**: missing fixtures regenerate during CI and hide behavioral drift
- **hash instability**: transcript or input normalization changes break recordings even when behavior did not meaningfully change
@@ -2,30 +2,34 @@
title: "Test Framework Overview"
owners: [bingran-you]
soft_links:
- /reconstruction-guardrails/verification-and-native-test-oracles/test-runtime-mode-and-determinism.md
- /reconstruction-guardrails/verification-and-native-test-oracles/test-environment-fixtures-and-ci-fail-closed-policy.md
- /reconstruction-guardrails/verification-and-native-test-oracles/test-lane-coverage-map.md
- /reconstruction-guardrails/verification-and-native-test-oracles/e2e-harness-reality-boundaries.md
- /reconstruction-guardrails/verification-and-native-test-oracles/test-seams-reset-hooks-and-injected-dependencies.md
- /reconstruction-guardrails/verification-and-native-test-oracles/native-test-derived-asset-provenance-and-acceptance-rules.md
- /platform-services/mock-rate-limit-scenarios-and-test-contracts.md
- /tools-and-permissions/filesystem-and-shell/sed-command-validation-contracts.md
- /tools-and-permissions/permissions/e2e-permission-testing-contracts.md
- /tools-and-permissions/permissions/yolo-classifier-contracts.md
- /platform-services/settings-schema-compatibility-and-invalid-field-preservation.md
- /integrations/mcp/federated-auth-conformance-and-idp-test-seeding.md
---

# Test Framework Overview

The current Claude Code snapshot does not expose one self-contained `tests/` or runner manifest that answers everything. What it does expose is a layered testing architecture that spans runtime posture, fixtures, dedicated end-to-end harnesses, conformance-sensitive auth flows, and domain-owned contract oracles.
The current Claude Code snapshot does not expose one self-contained `tests/` directory or runner manifest that answers everything. What it does expose is a layered testing architecture that spans runtime posture, fixtures, dedicated end-to-end harnesses, conformance-sensitive auth flows, and domain-owned contract oracles.

## Confirmed layers

The snapshot clearly shows all of these verification layers:
The snapshot provides direct signals for all of these verification layer families, even though it does not expose every upstream runner entrypoint:

- a script-wrapped suite entry layer, because at least one compatibility contract is tied to a named `npm run test:file ...` path rather than to a raw helper invocation
- ordinary module-level regression lanes, including `.test.ts`-style coverage
- integration lanes, including `.int.test.ts` behavior for cross-component runtime state
- end-to-end coverage for permission prompts and remote-control plumbing
- conformance-sensitive auth coverage for federated MCP and XAA-style flows
- runtime test posture via `NODE_ENV=test`
- a supported test runtime posture via `NODE_ENV=test`
- fixture and VCR-style replay for API-dependent scenarios
- module-state isolation through exported reset, seed, and cleanup helpers for caches, watchers, registries, and other sticky services
- domain-owned contract assets derived from upstream-native tests
@@ -42,17 +46,21 @@ A faithful rebuild should preserve these tiers as distinct concerns:

Collapsing all of those into one broad suite would lose one of the main architectural signals in the current product: different behaviors are protected by different oracles.

The subsystem mapping behind those tiers is spelled out in [test-lane-coverage-map.md](test-lane-coverage-map.md).

## Runner boundary

The tree can safely claim:

- there is a script-oriented entry layer
- the product code is written to coexist with a Bun-flavored module-mocking environment
- the visible framework depends on more than a generic "run tests" command
- the end-to-end harnesses that are visible are designed to preserve real approval, transport, and credential paths rather than UI-only fakes

The tree should not overclaim:

- the exact full upstream runner manifest
- the exhaustive upstream lane inventory
- the complete CI orchestration or sharding plan
- the full top-level command matrix for every lane
