diff --git a/platform-services/settings-change-detection-and-runtime-reload.md b/platform-services/settings-change-detection-and-runtime-reload.md
index 990818d..5d2a278 100644
--- a/platform-services/settings-change-detection-and-runtime-reload.md
+++ b/platform-services/settings-change-detection-and-runtime-reload.md
@@ -66,6 +66,7 @@ Equivalent behavior should preserve:

 - the detector resetting the merged-settings cache exactly once before notifying subscribers
 - all notification paths, including programmatic ones, going through that same reset-and-emit function
+- a newly visible settings layer, especially remote managed settings, invalidating any previously merged snapshot that was computed while that layer was still absent
 - listeners reading fresh merged settings after notification instead of each listener defensively clearing the cache again
 - the architectural invariant that one settings notification should produce one fresh disk or cache read, not one re-read per subscriber

@@ -125,6 +126,7 @@ Equivalent behavior should preserve:

 - settings-change detector startup being deferred until after the earliest render-critical initialization, so watcher setup does not block first paint
 - remote managed settings loading beginning after core init enables safe config reading, with results applied later through hot reload when they arrive
+- early reads in headless, SDK, bridge, or command-dispatch setup not being allowed to permanently cache the "remote layer absent" view once that layer later becomes available
 - headless remote user-settings download starting early enough to overlap with MCP and tool setup, while still entering the same detector pipeline when applied
 - the first interactive mount checking whether a managed-settings race already changed permission availability before the subscriber existed
 - cleanup shutting down watchers, deletion timers, and machine-settings poll timers when the process disposes
@@ -149,6 +151,7 @@ Equivalent behavior should also preserve the boundary that some neighboring hot-
 - **false deletion**: delete-and-recreate writes are treated as real deletes and temporarily wipe settings from the live runtime
 - **hook-veto bypass**: config-change hooks report a blocking result, but the reload still lands in app state
 - **interactive-headless drift**: the TUI and headless SDK path stop sharing the same apply function and begin to diverge on permission, effort, or fast-mode behavior
+- **poisoned early read**: a headless or pre-action code path observes merged settings before remote managed settings become visible, and that stale snapshot survives after the overlay arrives
 - **permission-context drift**: settings JSON updates, but live permission rules or bypass-disable posture are not resynced
 - **env drift**: auth helpers or config-driven environment variables keep using stale values after settings.env changes
 - **startup race leak**: remote managed settings arrive before the interactive subscriber exists and a restricted mode, such as bypass disablement, never gets reconciled
diff --git a/reconstruction-guardrails/verification-and-native-test-oracles/test-environment-fixtures-and-ci-fail-closed-policy.md b/reconstruction-guardrails/verification-and-native-test-oracles/test-environment-fixtures-and-ci-fail-closed-policy.md
index f4ccdb3..2078933 100644
--- a/reconstruction-guardrails/verification-and-native-test-oracles/test-environment-fixtures-and-ci-fail-closed-policy.md
+++ b/reconstruction-guardrails/verification-and-native-test-oracles/test-environment-fixtures-and-ci-fail-closed-policy.md
@@ -40,6 +40,7 @@ Across those families, the shared contract preserves:
 - hash-based fixture naming from normalized inputs
 - replay from a configurable fixture root
 - rehydration back into runtime-shaped results rather than raw text blobs
+- replayed results still participating in the same downstream usage, cost, or accounting paths that live responses would drive
 - input dehydration and path normalization so equivalent tests keep hitting the same recordings across machines

 ## CI must fail closed on missing recordings
@@ -73,6 +74,7 @@ Equivalent behavior should preserve:
 - placeholder treatment for incidental UUIDs, timestamps, counters, and other unstable runtime identifiers
 - avoidance of unnecessary transcript-shape churn in replay-sensitive flows
 - deterministic identity or placeholder handling where raw runtime IDs would otherwise destabilize recordings
+- fresh per-run runtime identity where reused recorded IDs would otherwise cause resume or storage layers to treat distinct replayed responses as duplicates

 The visible testing architecture therefore depends on transcript semantics, not only on a file cache.

@@ -93,3 +95,5 @@ If a clean-room rebuild keeps external API-backed tests, it should preserve all
 - **machine-bound fixtures**: path, cwd, or tempdir differences cause needless cache misses
 - **silent CI rewrite**: missing fixtures regenerate during CI and hide behavioral drift
 - **hash instability**: transcript or input normalization changes break recordings even when behavior did not meaningfully change
+- **usage-blind replay**: fixtures reproduce visible output but stop exercising the cost or usage paths that live responses update
+- **resume dedupe pollution**: recorded identities are replayed too literally and later resume or storage layers collapse distinct runs into one response
diff --git a/reconstruction-guardrails/verification-and-native-test-oracles/test-lane-coverage-map.md b/reconstruction-guardrails/verification-and-native-test-oracles/test-lane-coverage-map.md
index cba6ec8..67ceb43 100644
--- a/reconstruction-guardrails/verification-and-native-test-oracles/test-lane-coverage-map.md
+++ b/reconstruction-guardrails/verification-and-native-test-oracles/test-lane-coverage-map.md
@@ -6,6 +6,7 @@ soft_links:
  - /reconstruction-guardrails/verification-and-native-test-oracles/e2e-harness-reality-boundaries.md
  - /platform-services/settings-schema-compatibility-and-invalid-field-preservation.md
  - /platform-services/settings-change-detection-and-runtime-reload.md
+ - /ui-and-experience/transcript-and-history/transcript-search-and-less-style-navigation.md
  - /runtime-orchestration/sessions/session-artifacts-and-sharing.md
  - /tools-and-permissions/permissions/e2e-permission-testing-contracts.md
  - /integrations/clients/ssh-remote-session-and-auth-proxy.md
@@ -22,7 +23,7 @@ The visible fast lanes protect narrow, local contracts such as:

 - parser and serializer edge cases
 - shell and permission safety heuristics
-- transcript-search or render-fidelity extraction boundaries
+- transcript-search render-fidelity boundaries, where the index must track visible transcript text closely enough to avoid phantom hits
 - sticky singleton cleanup and helper-state reset behavior

 These lanes should stay cheap, isolated, and able to run without the full product startup graph.
@@ -32,7 +33,7 @@ These lanes should stay cheap, isolated, and able to run without the full produc
 The visible integration-oriented lanes protect cross-component runtime behavior such as:

 - startup sequencing and async service readiness
-- managed settings cache visibility and hot-reload invalidation
+- managed settings cache visibility, including first-visibility invalidation of stale merged settings in headless or early-read paths
 - watcher and promise state that can be poisoned by one subsystem and observed by another
 - resume-sensitive session artifacts and related persistence boundaries

@@ -43,6 +44,7 @@ These lanes need more real runtime wiring than a pure regression test, but they
 The visible end-to-end lanes protect workflows where the real orchestration path matters, including:

 - permission prompt routing and user decision flow
+- permission semantics staying in lockstep across local dialogs, worker forwarding, bridge replies, and hook-mediated approval outcomes
 - worker or remote approval forwarding
 - SSH or remote-control plumbing where local UI and remote execution are split
 - auth-proxy and transcript-adaptation behavior that spans transport boundaries
diff --git a/tools-and-permissions/permissions/permission-decision-pipeline.md b/tools-and-permissions/permissions/permission-decision-pipeline.md
index 26860e8..6c5ad99 100644
--- a/tools-and-permissions/permissions/permission-decision-pipeline.md
+++ b/tools-and-permissions/permissions/permission-decision-pipeline.md
@@ -17,6 +17,18 @@ Reconstruction should preserve two distinct stages:

 The policy result may also carry rewritten input, explanatory metadata, and persistence suggestions. Later stages must preserve those payloads.

+## One semantic contract across every surface
+
+Equivalent behavior should preserve one permission meaning for `allow`, `deny`, and `ask` across:
+
+- the interactive REPL path
+- headless or SDK-hosted execution
+- bridge or remote-session approval forwarding
+- worker-to-leader forwarded approvals
+- hook-mediated approval or rejection paths
+
+This is more than shared vocabulary. The same decision payload, rewritten input, and stop-or-continue semantics must survive every surface transition without being reinterpreted into a different local policy.
+
 ## Stage 1: static policy order

 The base permission engine evaluates in this order:
@@ -130,6 +142,12 @@ When the policy engine still returns `ask`, the runtime resolves it through a la

 Aborts and cancellations should resolve through this permission context instead of throwing uncontrolled exceptions through the turn loop.

+Equivalent behavior should also preserve:
+
+- externalized approval surfaces returning the same final `allow` or `deny` semantics as the local dialog path rather than inventing a second remote-only decision shape
+- hook-provided `allow` signals not skipping downstream deny or ask gates that still apply after the hook stage
+- any winning approval or denial path committing execution at most once, even if remote responses, hook completions, or local UI actions arrive close together
+
 ## Explanatory messaging

 User-facing approval prompts vary by reason.
@@ -153,3 +171,4 @@ Without this distinction, users cannot tell whether they are overriding their ow
 - **auto-mode deadloop**: repeated classifier denials never escalate to human review
 - **headless ambiguity**: a worker that cannot prompt neither aborts nor returns a deterministic deny
 - **explanation collapse**: every approval prompt looks the same and users cannot distinguish rule, hook, classifier, or path reasons
+- **surface skew**: REPL, headless, remote, or worker-forwarded approval paths interpret the same `allow` or `deny` result differently
diff --git a/tools-and-permissions/permissions/permission-resolution-races-and-forwarding.md b/tools-and-permissions/permissions/permission-resolution-races-and-forwarding.md
index 43cefb8..492030c 100644
--- a/tools-and-permissions/permissions/permission-resolution-races-and-forwarding.md
+++ b/tools-and-permissions/permissions/permission-resolution-races-and-forwarding.md
@@ -32,6 +32,8 @@ A pending permission can be resolved by several channels:

 All of these must converge through the same single-winner guard.

+They must also preserve the same semantic outcomes. A bridge reply, hook decision, or leader-forwarded response cannot mean something weaker or stronger than the equivalent local allow or deny.
+
 ## Forwarding contracts

 Equivalent behavior should preserve distinct forwarding paths:
@@ -51,6 +53,7 @@ Once any resolver wins, the runtime should:
 - cancel sibling remote prompts where supported
 - clear classifier in-progress indicators
 - clear worker pending-request markers
+- prevent duplicate execution when a late remote or hook response arrives after the winning path already resumed the tool

 Without this, stale prompts or stale listeners can leak into later tool calls.

@@ -65,3 +68,4 @@ Late responses for unknown request IDs should be safely ignored. This includes m
 - **stale remote UI**: remote prompt stays open after local approval already executed
 - **worker dead-wait**: worker callback registers after sending request and misses fast leader reply
 - **abort desync**: abort path resolves locally but leaves forwarded requests active
+- **semantic split**: forwarded or hook-resolved approvals clear the prompt but do not behave like the normal local allow or deny path
diff --git a/ui-and-experience/transcript-and-history/transcript-search-and-less-style-navigation.md b/ui-and-experience/transcript-and-history/transcript-search-and-less-style-navigation.md
index b7171c0..88760a6 100644
--- a/ui-and-experience/transcript-and-history/transcript-search-and-less-style-navigation.md
+++ b/ui-and-experience/transcript-and-history/transcript-search-and-less-style-navigation.md
@@ -1,7 +1,7 @@
 ---
 title: "Transcript Search and Less-Style Navigation"
 owners: []
-soft_links: [/ui-and-experience/shell-and-input/terminal-runtime-and-fullscreen-interaction.md, /ui-and-experience/dialogs-and-approvals/diff-dialog-and-turn-history-navigation.md, /ui-and-experience/feedback-and-notifications/interaction-feedback.md]
+soft_links: [/ui-and-experience/shell-and-input/terminal-runtime-and-fullscreen-interaction.md, /ui-and-experience/dialogs-and-approvals/diff-dialog-and-turn-history-navigation.md, /ui-and-experience/feedback-and-notifications/interaction-feedback.md, /reconstruction-guardrails/verification-and-native-test-oracles/test-lane-coverage-map.md]
 ---

 # Transcript Search and Less-Style Navigation
@@ -54,6 +54,18 @@ Equivalent behavior should preserve:
 - a committed query remaining active after manual scroll, so the next `n` or `N` can re-establish exact positioning without forcing the user to reopen `/`
 - wraparound protection preventing endless loops when every engine-level match turns out to be non-renderable or phantom after layout

+## The search index must follow rendered transcript text
+
+Equivalent behavior should preserve:
+
+- index text being derived from what transcript mode actually renders, not from model-facing serialization that may contain hidden wrappers, system reminders, or other non-visible payload text
+- tool-owned search extractors, when present, describing the transcript-visible rendering of a tool result rather than the internal block payload sent back to the model
+- the index excluding sentinel text that later renders as a different visible label, because counting the raw sentinel would create phantom hits
+- visibly rendered attachment-derived text, such as queued prompts or surfaced memory content, still entering the search index even when it did not originate as an ordinary chat bubble
+- unknown tool-result shapes being allowed to under-count rather than claim text that never renders on screen
+
+This is an important oracle boundary in the visible test posture: under-count is a tolerable approximation, but indexed-not-rendered text is a correctness bug because it breaks count-versus-highlight trust.
+
 ## Commit, cancel, resize, and transcript exit all diverge deliberately

 Equivalent behavior should preserve:
@@ -80,3 +92,5 @@ Equivalent behavior should preserve:
 - **stale highlight**: resize or manual scroll leaves the current-match marker painted on the wrong row
 - **dead query persistence**: a no-match commit keeps a badge and `n/N` state that can no longer navigate
 - **chrome self-match**: the search bar highlights its own query text and misreports the only visible match
+- **phantom index drift**: search counts text from hidden reminders, raw tool payload wrappers, or sentinel strings that the transcript never visibly renders
+- **visible-but-unsearchable content**: rendered queued prompts, memories, or tool summaries appear on screen but do not enter the index
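
Reviewer sketch (not part of the patch): the rendered-text indexing rules added in the transcript-search hunk could be exercised with something like the following. This is a minimal Python illustration under stated assumptions, not the project's implementation. Every name here (`Block`, `EXTRACTORS`, `visible_text`, `count_matches`, the `read_file` tool, and the sentinel and label strings) is hypothetical and chosen only to show the shape of the contract.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical sentinel and the label it renders as (both are assumptions).
RAW_SENTINEL = "__NO_CONTENT__"
VISIBLE_LABEL = "(no content)"


@dataclass
class Block:
    kind: str                   # "text", "system_reminder", "tool_result", "attachment"
    payload: str                # model-facing serialization of the block
    tool: Optional[str] = None  # tool name, only meaningful for tool results


# Tool-owned extractors return what the transcript shows, not the raw payload.
# The "read_file" extractor (assumed) renders only the first line as a summary.
EXTRACTORS: Dict[str, Callable[[str], str]] = {
    "read_file": lambda payload: payload.splitlines()[0] if payload else "",
}


def visible_text(block: Block) -> str:
    """Return only the text the transcript would visibly render for a block."""
    if block.kind == "system_reminder":
        return ""  # hidden from the transcript, so never indexed
    if block.kind == "tool_result":
        extractor = EXTRACTORS.get(block.tool or "")
        # Unknown tool shapes under-count instead of claiming unrendered text.
        return extractor(block.payload) if extractor else ""
    if block.kind in ("text", "attachment"):
        # Index the rendered label, never the raw sentinel.
        return block.payload.replace(RAW_SENTINEL, VISIBLE_LABEL)
    return ""


def count_matches(blocks: List[Block], query: str) -> int:
    """Case-insensitive match count over rendered text only."""
    needle = query.lower()
    return sum(visible_text(b).lower().count(needle) for b in blocks)


blocks = [
    Block("text", "hello world"),
    Block("system_reminder", "hello from a hidden reminder"),
    Block("tool_result", "hello summary\nraw hello body", tool="read_file"),
    Block("tool_result", "hello from an unknown shape", tool="mystery_tool"),
    Block("attachment", "queued prompt: hello again"),
    Block("text", RAW_SENTINEL),
]
```

With this fixture, a search for `hello` counts the chat text, the extractor summary, and the attachment, but not the hidden reminder, the raw tool body, or the unknown tool shape. The unknown shape is a deliberate under-count, which the spec treats as tolerable, whereas indexing the reminder would be a phantom hit.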