diff --git a/engineering/frontmatter-integrity-rfc.md b/engineering/frontmatter-integrity-rfc.md
new file mode 100644
index 0000000..cc8af64
--- /dev/null
+++ b/engineering/frontmatter-integrity-rfc.md
@@ -0,0 +1,527 @@
+# Frontmatter Integrity RFC
+
+This RFC defines the plan to stop YAML frontmatter corruption, prevent repeat
+amplification, and move YAOS toward a structure-aware frontmatter sync model.
+
+It is intentionally grounded in the 2026-04-09 incident logs:
+
+- the note involved was `Bathroom floor clean.md`
+- the first visible corruption was concentrated in YAML frontmatter
+- the duplicate growth involved task-related properties such as
+  `complete_instances`, `taskSourceType`, and `timeEstimate`
+- a later recovery loop in YAOS demonstrably amplified the malformed state
+
+## Status
+
+P0 and the first P1 containment pass are implemented:
+
+- bound-file recovery now uses non-writing binding repair after disk-authority
+  recovery
+- the frontmatter guard blocks obvious duplicate/malformed/growth-burst states
+  in both disk-to-CRDT and CRDT-to-disk directions
+- blocked transitions are traced and surfaced with a throttled user notice; the
+  guard can be disabled in Advanced settings for troubleshooting
+- blocked paths persist bounded diagnostic-only quarantine metadata in plugin
+  state for restart-safe debugging
+- parser-backed validation now complements the cheap extractor on changed
+  frontmatter slices, and schema-lite field policies are used for
+  classification only
+
+Still proposed:
+
+- full recovery UI
+- structure-aware frontmatter sidecar
+- canonical YAML rendering
+
+## Why this RFC exists
+
+YAOS currently treats a Markdown file as a single `Y.Text`.
+
+That is correct for prose and code-like note bodies, but YAML frontmatter has a
+different semantic shape:
+
+- keys should generally be unique
+- scalar values and list values are not interchangeable without intent
+- many properties are controlled by plugins or Obsidian's Properties UI
+- byte-level edits can be serialization noise rather than meaningful value edits
+- syntactically valid YAML can still be semantically corrupt for the user's note
+
+The incident showed two distinct failure modes:
+
+1. A malformed frontmatter state entered the CRDT from a remote path.
+2. YAOS later amplified a small disk/editor/CRDT divergence by applying disk
+   content to the CRDT and then immediately applying editor content back again.
+
+The second failure mode is ours and should be fixed immediately.
+
+The first failure mode may have been triggered by another plugin or another
+device. YAOS cannot reliably attribute an Obsidian `modify` event to TaskNotes,
+TaskForge, Obsidian Properties, mobile Obsidian, or a user keystroke. YAOS can,
+however, detect frontmatter danger by structure and stop propagating obviously
+bad states.
+
+## Goals
+
+- Stop the verified recovery amplifier.
+- Prevent YAOS from writing or propagating newly detected malformed YAML
+  frontmatter.
+- Preserve real-time body sync and cursor behavior.
+- Treat external frontmatter writers as normal, supported inputs.
+- Make frontmatter changes observable in traces and diagnostics without storing
+  vault content.
+- Define a path from guardrails to structure-aware frontmatter sync.
+- Add regression coverage for the incident class.
+
+## Non-goals
+
+- Do not attempt to fingerprint or special-case TaskNotes, TaskForge, or any
+  single third-party plugin as the cause.
+- Do not disable body sync for notes with frontmatter.
+- Do not replace the vault-wide `Y.Doc` in this RFC.
+- Do not require a YAML parser in the hot path for every body keystroke.
+- Do not promise perfect conflict resolution for every arbitrary YAML document
+  in the first implementation pass.
+- Do not silently rewrite frontmatter into a surprising format without an
+  explicit compatibility policy.
+
+## Explicit product positions
+
+### The recovery amplifier is a P0 bug
+
+When an open note is editor-bound and disk matches the editor but not the CRDT,
+the current recovery path applies disk content to the CRDT and then calls
+`editorBindings.heal()`. `heal()` compares the live editor to the CRDT and can
+apply the editor text back into the same `Y.Text`.
+
+In the 2026-04-09 logs this produced repeated growth:
+
+- disk `576`, CRDT `560`, editor `608`
+- recover disk to CRDT: `560 -> 576`
+- heal editor to CRDT: `576 -> 592`
+- next pass repeats with a new offset
+
+The fix is not optional. A repair path must not perform two content writes from
+two different observed states in one recovery cycle.
+
+### Frontmatter needs a stronger invariant than body text
+
+The body can remain CRDT text. Frontmatter needs structure-aware validation at a
+minimum, and structure-aware synchronization as the longer-term target.
+
+This is not because YAML is special in a theoretical sense. It is because users
+and plugins treat frontmatter as a map of properties, while YAOS currently
+syncs the serialized bytes.
+
+### Plugin attribution is the wrong abstraction
+
+The user mentioned TaskNotes and TaskForge. They may be involved, but the
+correct protection boundary is not "detect TaskNotes".
+
+The correct boundary is:
+
+- accept that multiple writers can edit frontmatter
+- validate the resulting frontmatter shape
+- prevent known-bad transitions from becoming authoritative
+- expose enough diagnostics to identify the source later when possible
+
+This keeps YAOS compatible with all frontmatter-writing plugins instead of
+playing whack-a-mole with individual integrations.
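The boundary above can be sketched as a gate whose only inputs are the previous and next note text, so no writer attribution is needed. This is a minimal sketch, not the `frontmatterGuard` module added in this diff: `readFrontmatter`, `duplicateTopLevelKeys`, and `gateTransition` are hypothetical names, the code assumes LF line endings and unquoted top-level keys, and it checks only two of the hazards this RFC lists (malformed fences and newly duplicated keys).

```typescript
// Hypothetical writer-agnostic gate: it sees only note text, never which
// plugin, device, or editor produced the change.
type Frontmatter =
  | { kind: "none" }
  | { kind: "malformed" }
  | { kind: "present"; yaml: string };

interface GateResult {
  risk: "ok" | "block";
  reasons: string[];
}

// Cheap fence extraction. Assumes LF line endings and "---" on the first line.
function readFrontmatter(markdown: string): Frontmatter {
  if (!markdown.startsWith("---\n")) return { kind: "none" };
  const close = /\n---(?:\n|$)/.exec(markdown.slice(3));
  if (close === null) return { kind: "malformed" };
  // The closing "---" starts at 3 + close.index + 1; keep the YAML's last newline.
  return { kind: "present", yaml: markdown.slice(4, 3 + close.index + 1) };
}

// Unquoted keys at column 0 that occur more than once; nested maps and
// list items are ignored by this heuristic.
function duplicateTopLevelKeys(yaml: string): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const line of yaml.split("\n")) {
    const key = /^([A-Za-z_][\w-]*)\s*:/.exec(line)?.[1];
    if (key === undefined) continue;
    if (seen.has(key)) duplicates.add(key);
    seen.add(key);
  }
  return Array.from(duplicates);
}

function gateTransition(previous: string | null, next: string): GateResult {
  const reasons: string[] = [];
  const after = readFrontmatter(next);
  if (after.kind === "malformed") reasons.push("malformed-fence");
  if (after.kind === "present") {
    // Duplicates that already existed are not new damage; block only keys
    // that became duplicated in this transition.
    const before = previous === null ? null : readFrontmatter(previous);
    const existing = new Set<string>(
      before !== null && before.kind === "present" ? duplicateTopLevelKeys(before.yaml) : [],
    );
    for (const key of duplicateTopLevelKeys(after.yaml)) {
      if (!existing.has(key)) reasons.push(`duplicate-key:${key}`);
    }
  }
  return { risk: reasons.length > 0 ? "block" : "ok", reasons };
}
```

Leaving out any source parameter is the point of the design: the same check applies whether the change came from TaskNotes, the Properties UI, or another device.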
+
+### A suspicious frontmatter change should fail safe
+
+If a local or remote update would introduce duplicate YAML keys, repeated key
+bursts, malformed frontmatter delimiters, or pathological growth isolated to
+frontmatter, YAOS should avoid making that state authoritative automatically.
+
+Depending on phase and confidence, the action can be:
+
+- skip ingest into CRDT
+- skip write from CRDT to disk
+- keep syncing the body while quarantining the frontmatter change
+- surface a notice or diagnostics entry
+
+The default must be "do not corrupt the note further".
+
+## Current architecture
+
+The current Markdown model is a single text object:
+
+```text
+MarkdownFile = Y.Text
+```
+
+The vault model stores file texts by stable ID:
+
+```text
+pathToId: Y.Map
+idToText: Y.Map
+meta: Y.Map
+```
+
+This preserves cursor-aware, real-time body editing. It also means a YAML
+property edit is merged with the same text-level rules as a paragraph edit.
+
+That is the core mismatch.
+
+## Better abstraction
+
+The longer-term model should treat a Markdown note as a product:
+
+```text
+Note = Frontmatter x Body
+```
+
+Body remains:
+
+```text
+Body = Y.Text
+```
+
+Frontmatter becomes a structured value with a parser/printer boundary:
+
+```text
+FrontmatterText <-> FrontmatterValue
+```
+
+At the note level, the same boundary works as a get/put pair:
+
+```text
+MarkdownText
+  -> get: { frontmatterText, bodyText }
+  -> put: { frontmatterValue, bodyText } -> MarkdownText
+```
+
+YAOS should stop treating YAML serialization as the source of truth for
+properties. Instead, it should parse YAML into a value, merge that value under
+property invariants, and print it back to canonical YAML.
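The get/put pair can be checked at the text level before any YAML parsing is involved. In this sketch (hypothetical names `getNote` and `putNote`, LF line endings assumed), `put` takes frontmatter text rather than a parsed `FrontmatterValue`, so it only demonstrates the round-trip property `putNote(getNote(x)) === x` that the split should satisfy before a printer is layered on top.

```typescript
// Hypothetical text-level get/put pair. A real put would print a parsed
// FrontmatterValue as canonical YAML; here frontmatter stays as text.
interface NoteParts {
  // YAML between the fences, or null when there is no closed frontmatter block.
  frontmatterText: string | null;
  // Everything after the closing "---", including that line's newline if present.
  bodyText: string;
}

// Assumes LF line endings and an opening "---" on the first line.
function getNote(markdown: string): NoteParts {
  if (!markdown.startsWith("---\n")) return { frontmatterText: null, bodyText: markdown };
  const close = /\n---(?:\n|$)/.exec(markdown.slice(3));
  // An unclosed fence is kept as plain body here; a guard would report it instead.
  if (close === null) return { frontmatterText: null, bodyText: markdown };
  const fenceStart = 3 + close.index + 1; // index of the closing "---"
  return {
    frontmatterText: markdown.slice(4, fenceStart),
    bodyText: markdown.slice(fenceStart + 3),
  };
}

function putNote(parts: NoteParts): string {
  if (parts.frontmatterText === null) return parts.bodyText;
  return `---\n${parts.frontmatterText}---${parts.bodyText}`;
}
```

Keeping the closing fence's line break inside `bodyText` is what makes the round trip exact for notes whose frontmatter ends at end of file without a trailing newline.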
+
+Practically, that means:
+
+- duplicate keys are not representable unless explicitly modeled
+- scalar/list type transitions are explicit
+- set-like list fields can be deduplicated by value where configured
+- formatting differences can be reduced by canonicalization
+- raw text fallback remains available when parsing fails or comments/anchors
+  cannot be preserved
+
+This abstraction is valuable only if it reduces corruption risk. It should be
+introduced incrementally.
+
+## Scope and implementation plan
+
+## P0: Stop the verified amplifier
+
+### 1. Split "repair binding" from "write editor content"
+
+Current behavior:
+
+- `syncFileFromDisk()` detects local-only divergence.
+- It applies disk content to the CRDT.
+- It calls `editorBindings.heal()`.
+- `heal()` may write the editor content to the CRDT again.
+
+Approved change:
+
+- introduce a binding repair path that reconfigures or validates the binding
+  without applying editor content
+- use that path after `disk-sync-recover-bound`
+- reserve content-writing `heal()` for cases where editor content is the single
+  chosen source of truth
+
+Minimum accepted patch:
+
+- after applying disk content to CRDT, do not call content-writing `heal()`
+- either call non-writing `repair()` or rebind
+- schedule a later health check after the editor has observed the CRDT update
+
+### 2. Add a regression test for recovery amplification
+
+The test should model:
+
+- CRDT text: `---\ntimeEstimate: 2\nkind: op\n---\n`
+- disk text: `---\ntimeEstimate: 20\nkind: op\n---\n`
+- editor text with stale or shifted content
+- recovery path runs
+
+Expected:
+
+- exactly one source of truth is applied
+- the CRDT does not grow by applying both disk and editor content
+- a second recovery pass is a no-op or remains bounded
+
+### 3. Add trace details for recovery source selection
+
+Record source-choice metadata without storing note content:
+
+- path
+- reason
+- editor length
+- disk length
+- CRDT length
+- chosen source: `disk`, `editor`, `crdt`, `skip`
+- action: `applied`, `deferred`, `repair-only`, `quarantined`
+
+This makes future incidents easier to diagnose without leaking user notes.
+
+## P1: Frontmatter integrity guard
+
+### 4. Extract frontmatter ranges cheaply
+
+Add a small utility that classifies a Markdown document as:
+
+- no frontmatter
+- frontmatter block present
+- malformed frontmatter delimiter
+
+The extractor should return:
+
+- `frontmatterStart`
+- `frontmatterEnd`
+- `frontmatterText`
+- `bodyText`
+
+This does not require parsing YAML yet. It is a cheap structural boundary for
+guards and diagnostics.
+
+### 5. Add frontmatter validation
+
+The first containment pass uses a cheap structural classifier plus parser-backed
+validation on the extracted frontmatter slice. It blocks only obvious hazards
+such as duplicate top-level keys, repeated bare-key bursts, parser failures,
+malformed frontmatter fences, and isolated frontmatter growth bursts.
+
+That first pass is still an emergency brake, not a complete YAML policy. The
+durable implementation should keep parser-backed validation while avoiding
+premature merge or rewrite semantics.
+
+The validator should detect:
+
+- parser errors
+- duplicate mapping keys
+- scalar/list type changes where old and new frontmatter are both available
+- large repeated growth inside frontmatter
+- repeated key bursts such as `taskSourceType` repeated many times
+- malformed frontmatter fences
+
+The validator must report a risk class:
+
+- `ok`
+- `warn`
+- `block`
+- `unknown`
+
+### 6. Gate inbound disk-to-CRDT frontmatter changes
+
+When a local disk `modify` event is ingested:
+
+- compare old CRDT frontmatter to new disk frontmatter
+- if body changed and frontmatter is unchanged, proceed normally
+- if frontmatter changed and validates as `ok`, proceed
+- if frontmatter validates as `warn`, proceed but trace
+- if frontmatter validates as `block`, do not apply it to the CRDT
+
+For `block`, YAOS should:
+
+- leave body sync available when the body can be separated safely
+- record a diagnostic
+- notify the user with short copy
+
+### 7. Gate outbound CRDT-to-disk frontmatter writes
+
+When a remote CRDT update would be written to disk:
+
+- compare disk frontmatter to CRDT frontmatter
+- validate the target frontmatter before writing
+- if validation blocks, skip the write and keep the disk file unchanged
+
+This is the defense against another device or third-party plugin introducing a
+bad frontmatter state into the room.
+
+### 8. Add a quarantine state
+
+Introduce a per-path quarantine record for frontmatter hazards.
+
+Minimum fields:
+
+- path
+- detectedAt
+- direction: `disk-to-crdt` or `crdt-to-disk`
+- reason codes
+- source lengths
+- frontmatter hash only, not content
+- whether body sync remains active
+
+The quarantine should be clearable when:
+
+- disk and CRDT match again
+- the user accepts the incoming state
+- the user chooses the local disk state
+- the user disables the guard for the path
+
+The first implementation consists of skip behavior plus trace/log diagnostics,
+a throttled notice, a global opt-out, and bounded persisted quarantine metadata
+for debugging. Explicit accept/keep recovery controls should follow only if they
+remain consistent with YAOS snapshot and Obsidian File Recovery policy.
+
+## P2: Structure-aware frontmatter sync
+
+### 9. Define the frontmatter value model
+
+Represent frontmatter as a typed value:
+
+```text
+FrontmatterValue = Map
+
+PropertyValue =
+  | null
+  | boolean
+  | number
+  | string
+  | date-like string
+  | list
+  | object
+  | rawYaml
+```
+
+Preserve a raw fallback for:
+
+- comments
+- anchors
+- tags or values the parser cannot round-trip safely
+- plugin-specific complex shapes
+
+### 10. Define merge semantics by value kind
+
+Initial policy:
+
+- scalar: last writer wins with timestamp/source metadata
+- list: ordered list by default
+- configured set-like list: dedupe by normalized value
+- object: recursive map merge when safe
+- rawYaml: last writer wins or quarantine on concurrent edit
+
+This should start conservative. Unknown structure should not be over-merged.
+
+### 11. Add canonical rendering
+
+Canonical rendering should:
+
+- use stable key order where possible
+- preserve common Obsidian property conventions
+- avoid unnecessary quote churn where possible
+- ensure exactly one opening and closing frontmatter fence
+- make duplicate-key states impossible to render
+
+This is where formatting debates can become product decisions. The initial
+renderer should be predictable and boring.
+
+### 12. Migrate without breaking existing CRDT state
+
+Do not remove the current `Y.Text` file model immediately.
+
+Recommended migration:
+
+1. Keep whole-file `Y.Text` as the source of truth.
+2. Add parsed-frontmatter sidecar metadata for validation and diagnostics.
+3. Add optional structured frontmatter sidecar per file.
+4. For files that validate cleanly, use the structured sidecar to mediate
+   frontmatter changes.
+5. Render structured frontmatter back into the whole-file text.
+6. Keep raw fallback for files that cannot round-trip.
+
+This avoids a risky all-at-once CRDT schema change.
+
+## P3: Product surface and recovery UX
+
+### 13. Add a user-facing frontmatter conflict notice
+
+Copy should be short and non-technical:
+
+```text
+YAOS paused a properties update in "Bathroom floor clean.md" because it looked malformed.
+Your note body is still syncing.
+```
+
+Actions:
+
+- keep local properties
+- accept remote properties
+- open diagnostics
+- disable frontmatter guard for this note
+
+### 14. Add diagnostics export fields
+
+Diagnostics should include:
+
+- frontmatter guard status
+- quarantined paths
+- reason codes
+- hashes and lengths
+- source device when known
+- local/remote timestamps when known
+
+Do not include YAML contents by default.
+
+## P4: Tests
+
+Add tests for:
+
+- duplicate YAML keys are detected
+- repeated-key bursts are detected
+- list item deletion does not duplicate list items
+- scalar/list type changes are classified
+- malformed frontmatter fences are blocked
+- body-only edits bypass frontmatter validation cost
+- outbound remote corrupt frontmatter is not written to disk
+- inbound local corrupt frontmatter is not propagated to CRDT
+- recovery amplifier does not recur
+- quarantine clears when disk and CRDT converge
+
+## Acceptance criteria
+
+P0 is complete when:
+
+- the recovery path cannot apply disk and editor content to the same CRDT text in
+  one cycle
+- a regression test covers the old repeated-growth shape
+
+P1 is complete when:
+
+- YAOS can detect and block obvious duplicate frontmatter states in both
+  disk-to-CRDT and CRDT-to-disk directions
+- body sync continues when a frontmatter update is quarantined and the body can
+  be separated safely
+- diagnostics show reason codes without note contents
+
+P2 is complete when:
+
+- clean frontmatter can be parsed, merged, and rendered through a structured
+  value model
+- raw YAML fallback exists
+- existing whole-file `Y.Text` sync still works for unstructured files
+
+## Open questions
+
+- Which YAML parser should be used in the plugin bundle?
+- Should the first guard block duplicate keys only, or also repeated list-item
+  bursts?
+- Should structured frontmatter be opt-in during the first release?
+- Should set-like behavior be configured per property name, for example
+  `tags` and selected TaskNotes properties?
+- How should mobile performance be measured for validation on large notes?
+- Should quarantined frontmatter updates be persisted in plugin data for restart
+  recovery in the first implementation, or start as trace-only?
+
+## Recommended order of work
+
+1. Implement the P0 recovery amplifier fix.
+2. Add recovery amplifier regression coverage.
+3. Add frontmatter extractor and parser validation utilities.
+4. Gate outbound CRDT-to-disk writes for duplicate/malformed frontmatter.
+5. Gate inbound disk-to-CRDT ingest for duplicate/malformed frontmatter.
+6. Add quarantine diagnostics.
+7. Add user-facing conflict notice.
+8. Design and implement the structured frontmatter sidecar.
+
+This order fixes the known YAOS bug first, then adds guardrails against other
+plugins and devices, then moves toward the deeper abstraction.
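The known bug this order fixes first, and the regression test that P0 calls for, can be modeled with plain strings standing in for `Y.Text`. This is a toy model under stated assumptions, not the YAOS recovery code: `diffSplice`, `applySplice`, `recoverWithTwoWrites`, and `recoverFromSingleSource` are hypothetical, and the buggy variant assumes the second (editor) write is diffed against the same pre-recovery snapshot as the disk write. The inputs reuse the `timeEstimate: 2` to `timeEstimate: 20` case from the test plan.

```typescript
// Toy model of the recovery amplifier; plain strings stand in for Y.Text.
// A Splice replaces `remove` characters at `at` with `insert`.
interface Splice {
  at: number;
  remove: number;
  insert: string;
}

// One splice turning `from` into `to`, found by trimming the common prefix and suffix.
function diffSplice(from: string, to: string): Splice {
  let start = 0;
  while (start < from.length && start < to.length && from[start] === to[start]) start++;
  let endFrom = from.length;
  let endTo = to.length;
  while (endFrom > start && endTo > start && from[endFrom - 1] === to[endTo - 1]) {
    endFrom--;
    endTo--;
  }
  return { at: start, remove: endFrom - start, insert: to.slice(start, endTo) };
}

function applySplice(doc: string, s: Splice): string {
  return doc.slice(0, s.at) + s.insert + doc.slice(s.at + s.remove);
}

// Hypothetical buggy shape: two content writes, both diffed against the same
// pre-recovery snapshot, so an edit that disk and editor share lands twice.
function recoverWithTwoWrites(crdt: string, disk: string, editor: string): string {
  const snapshot = crdt;
  let next = applySplice(crdt, diffSplice(snapshot, disk)); // disk -> CRDT
  next = applySplice(next, diffSplice(snapshot, editor)); // editor -> CRDT, stale base
  return next;
}

// Hypothetical fixed shape: disk is the single chosen source. The editor
// binding would then be repaired or rebound without a content write (not modeled).
function recoverFromSingleSource(crdt: string, disk: string): string {
  return applySplice(crdt, diffSplice(crdt, disk));
}
```

With two writes, the shared edit is inserted twice (`timeEstimate: 200`); with a single chosen source, the result equals disk and a second pass is a no-op. The fixed-step growth resembles the logged `560 -> 576 -> 592` pattern, but the model does not claim to reproduce how `heal()` actually computes its write.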
diff --git a/package-lock.json b/package-lock.json
index 7eba237..c1d6eb2 100644
--- a/package-lock.json
+++ b/package-lock.json
@@ -11,6 +11,7 @@
       "dependencies": {
         "fast-diff": "^1.3.0",
         "fflate": "^0.8.2",
+        "js-yaml": "^4.1.1",
         "obsidian": "1.8.7",
         "partyserver": "0.3.2",
         "qrcode": "^1.5.4",
diff --git a/package.json b/package.json
index bc3a588..a196dac 100644
--- a/package.json
+++ b/package.json
@@ -9,7 +9,7 @@
     "build": "tsc -noEmit -skipLibCheck && node esbuild.config.mjs production",
     "build:server-release": "node build-server-release.mjs",
     "test:server-update-local": "npm run build:server-release && node tests/server-update-local.mjs",
-    "test:regressions": "node --import jiti/register tests/diff-regressions.mjs && node tests/disk-mirror-regressions.mjs && node tests/markdown-ingest-regressions.mjs && node --import jiti/register tests/closed-file-mirror.ts && node --import jiti/register tests/folder-rename.ts && node --import jiti/register tests/chunked-doc-store.ts && node --import jiti/register tests/trace-store.ts && node --import jiti/register tests/server-hardening.ts && node --import jiti/register tests/v2-offline-rename-regressions.mjs",
+    "test:regressions": "node --import jiti/register tests/diff-regressions.mjs && node --import jiti/register tests/bound-recovery-regressions.mjs && node --import jiti/register tests/frontmatter-guard-regressions.mjs && node --import jiti/register tests/frontmatter-quarantine-regressions.mjs && node tests/disk-mirror-regressions.mjs && node tests/markdown-ingest-regressions.mjs && node --import jiti/register tests/closed-file-mirror.ts && node --import jiti/register tests/folder-rename.ts && node --import jiti/register tests/chunked-doc-store.ts && node --import jiti/register tests/trace-store.ts && node --import jiti/register tests/server-hardening.ts && node --import jiti/register tests/v2-offline-rename-regressions.mjs",
     "test:integration:worker": "node tests/worker-integration.mjs",
     "test:e2e:obsidian": "wdio run wdio.conf.mts",
     "test:ci": "npm run test:regressions && npm run test:integration:worker",
@@ -35,6 +35,7 @@
   "dependencies": {
     "fast-diff": "^1.3.0",
     "fflate": "^0.8.2",
+    "js-yaml": "^4.1.1",
     "obsidian": "1.8.7",
     "partyserver": "0.3.2",
     "qrcode": "^1.5.4",
diff --git a/src/js-yaml.d.ts b/src/js-yaml.d.ts
new file mode 100644
index 0000000..4c59bd5
--- /dev/null
+++ b/src/js-yaml.d.ts
@@ -0,0 +1,9 @@
+declare module "js-yaml" {
+  export function load(yaml: string): unknown;
+
+  const yaml: {
+    load: typeof load;
+  };
+
+  export default yaml;
+}
diff --git a/src/main.ts b/src/main.ts
index aa0fef8..ece4643 100644
--- a/src/main.ts
+++ b/src/main.ts
@@ -22,6 +22,19 @@ import {
 } from "./update/updateManifest";
 import { isMarkdownSyncable, isBlobSyncable } from "./types";
 import { applyDiffToYText } from "./sync/diff";
+import {
+  isFrontmatterBlocked,
+  validateFrontmatterTransition,
+  extractFrontmatter,
+  type FrontmatterValidationResult,
+} from "./sync/frontmatterGuard";
+import {
+  buildFrontmatterQuarantineDebugLines,
+  clearFrontmatterQuarantinePath,
+  readPersistedFrontmatterQuarantine,
+  upsertFrontmatterQuarantineEntry,
+  type FrontmatterQuarantineEntry,
+} from "./sync/frontmatterQuarantine";
 import {
   type DiskIndex,
   collectFileStats,
@@ -72,6 +85,7 @@ type PersistedPluginState = Partial & {
   _blobQueue?: BlobQueueSnapshot;
   _serverCapabilitiesCache?: PersistedServerCapabilitiesCache;
   _updateManifestCache?: PersistedUpdateManifestCache;
+  _frontmatterQuarantine?: FrontmatterQuarantineEntry[];
 };
 
 /** Minimum interval between reconcile runs (prevents rapid reconnect churn). */
@@ -82,6 +96,7 @@ const FAST_RECONNECT_MIN_INTERVAL_MS = 2_000;
 const MARKDOWN_DIRTY_SETTLE_MS = 350;
 const OPEN_FILE_EXTERNAL_EDIT_IDLE_GRACE_MS = 1200;
 const BOUND_RECOVERY_LOCK_MS = 1500;
+const FRONTMATTER_GUARD_NOTICE_MS = 30_000;
 const CAPABILITY_REFRESH_INTERVAL_MS = 30_000;
 const UPDATE_MANIFEST_URLS = [
   "https://github.com/kavinsood/yaos/releases/latest/download/update-manifest.json",
@@ -279,6 +294,8 @@ export default class VaultCrdtSyncPlugin extends Plugin {
   private legacyServerNoticeShown = false;
   private commandsRegistered = false;
   private idbDegradedHandled = false;
+  private frontmatterGuardNoticeAt = new Map();
+  private frontmatterQuarantineEntries: FrontmatterQuarantineEntry[] = [];
 
   /**
    * True when startup timed out waiting for provider sync.
@@ -441,6 +458,16 @@ export default class VaultCrdtSyncPlugin extends Plugin {
       this.editorBindings,
       this.settings.debug,
       (source, msg, details) => this.trace(source, msg, details),
+      () => this.settings.frontmatterGuardEnabled,
+      (path, direction, reason, validation, previousContent, nextContent) =>
+        this.handleFrontmatterValidation(
+          path,
+          direction,
+          reason,
+          validation,
+          previousContent,
+          nextContent,
+        ),
     );
 
     this.diskMirror.startMapObservers();
@@ -1880,6 +1907,15 @@ export default class VaultCrdtSyncPlugin extends Plugin {
     if (existingText) {
       const crdtContent = existingText.toJSON();
       if (crdtContent === content) return;
+      if (this.shouldBlockFrontmatterIngest(
+        file.path,
+        crdtContent,
+        content,
+        "disk-to-crdt",
+      )) {
+        await this.updateDiskIndexForPath(file.path);
+        return;
+      }
 
       // Apply a line-level diff to the Y.Text instead of delete-all + insert-all.
       // This preserves CRDT history, cursor positions, and awareness state.
@@ -1890,6 +1926,15 @@
       );
       applyDiffToYText(existingText, crdtContent, content, "disk-sync");
     } else {
+      if (this.shouldBlockFrontmatterIngest(
+        file.path,
+        null,
+        content,
+        "disk-to-crdt-seed",
+      )) {
+        await this.updateDiskIndexForPath(file.path);
+        return;
+      }
       this.vaultSync.ensureFile(
         file.path,
         content,
@@ -1989,12 +2034,39 @@ export default class VaultCrdtSyncPlugin extends Plugin {
       });
 
       if (existingText) {
+        if (this.shouldBlockFrontmatterIngest(
+          file.path,
+          crdtContent ?? "",
+          content,
+          "bound-file-local-only-divergence",
+        )) {
+          this.scheduleTraceStateSnapshot("frontmatter-ingest-blocked");
+          return true;
+        }
         this.log(
           `syncFileFromDisk: recovering "${file.path}" ` +
             `(editor-bound local-only divergence: ${crdtContent?.length ?? 0} -> ${content.length} chars)`,
         );
+        this.trace("trace", "bound-file-recovery-source-selected", {
+          path: file.path,
+          reason: "bound-file-local-only-divergence",
+          chosenSource: "disk",
+          action: "applied-repair-only",
+          editorLengths: localOnlyViews.map((state) => state.editorContent.length),
+          diskLength: content.length,
+          crdtLength: crdtContent?.length ?? null,
+        });
         applyDiffToYText(existingText, crdtContent ?? "", content, "disk-sync-recover-bound");
       } else {
+        if (this.shouldBlockFrontmatterIngest(
+          file.path,
+          null,
+          content,
+          "bound-file-local-only-seed",
+        )) {
+          this.scheduleTraceStateSnapshot("frontmatter-ingest-blocked");
+          return true;
+        }
         this.log(
           `syncFileFromDisk: recovering "${file.path}" ` +
             `(editor-bound, missing CRDT text: seeding ${content.length} chars)`,
@@ -2004,23 +2076,23 @@ export default class VaultCrdtSyncPlugin extends Plugin {
           content,
           this.settings.deviceName,
         );
-        }
+      }
 
       this.boundRecoveryLocks.set(file.path, Date.now() + BOUND_RECOVERY_LOCK_MS);
-        for (const state of localOnlyViews) {
-          const repaired = this.editorBindings?.heal(
+      for (const state of localOnlyViews) {
+        const repaired = this.editorBindings?.repair(
+          state.view,
+          this.settings.deviceName,
+          "bound-file-local-only-divergence",
+        ) ?? false;
+        if (!repaired) {
+          this.editorBindings?.rebind(
             state.view,
             this.settings.deviceName,
             "bound-file-local-only-divergence",
-          ) ?? false;
-          if (!repaired) {
-            this.editorBindings?.rebind(
-              state.view,
-              this.settings.deviceName,
-              "bound-file-local-only-divergence",
-            );
-          }
+          );
         }
+      }
 
       this.scheduleTraceStateSnapshot("bound-file-desync-recovery");
       return true;
@@ -2041,12 +2113,30 @@ export default class VaultCrdtSyncPlugin extends Plugin {
       // Active editor is open but idle; treat disk as an external edit
       // and ingest it into CRDT instead of deferring forever.
       if (existingText) {
+        if (this.shouldBlockFrontmatterIngest(
+          file.path,
+          crdtContent ?? "",
+          content,
+          "bound-file-open-idle-disk-recovery",
+        )) {
+          this.scheduleTraceStateSnapshot("frontmatter-ingest-blocked");
+          return true;
+        }
         this.log(
           `syncFileFromDisk: recovering "${file.path}" ` +
             `(editor-bound external disk edit while idle: ${crdtContent?.length ?? 0} -> ${content.length} chars)`,
         );
         applyDiffToYText(existingText, crdtContent ?? "", content, "disk-sync-open-idle-recover");
       } else {
+        if (this.shouldBlockFrontmatterIngest(
+          file.path,
+          null,
+          content,
+          "bound-file-open-idle-seed",
+        )) {
+          this.scheduleTraceStateSnapshot("frontmatter-ingest-blocked");
+          return true;
+        }
         this.log(
           `syncFileFromDisk: recovering "${file.path}" ` +
             `(editor-bound idle disk edit, missing CRDT text: seeding ${content.length} chars)`,
@@ -2086,6 +2176,134 @@ export default class VaultCrdtSyncPlugin extends Plugin {
     return true;
   }
 
+  private shouldBlockFrontmatterIngest(
+    path: string,
+    previousContent: string | null,
+    nextContent: string,
+    reason: string,
+  ): boolean {
+    if (!this.settings.frontmatterGuardEnabled) return false;
+
+    const validation = validateFrontmatterTransition(previousContent, nextContent);
+    this.handleFrontmatterValidation(
+      path,
+      "disk-to-crdt",
+      reason,
+      validation,
+      previousContent,
+      nextContent,
+    );
+    if (!isFrontmatterBlocked(validation)) return false;
+    this.log(
+      `Frontmatter ingest blocked for "${path}" ` +
+        `(${validation.reasons.join(", ") || validation.risk})`,
+    );
+    return true;
+  }
+
+  private handleFrontmatterValidation(
+    path: string,
+    direction: "disk-to-crdt" | "crdt-to-disk",
+    reason: string,
+    validation: FrontmatterValidationResult,
+    previousContent: string | null,
+    nextContent: string,
+  ): void {
+    if (validation.risk === "ok") {
+      void this.clearFrontmatterQuarantine(path, `${direction}:${reason}`);
+      return;
+    }
+
+    if (!isFrontmatterBlocked(validation)) return;
+
+    this.traceFrontmatterQuarantine(
+      path,
+      direction,
+      reason,
+      validation,
+      previousContent?.length ?? null,
+      nextContent.length,
+    );
+    this.showFrontmatterGuardNotice(path, direction);
+    void this.persistFrontmatterQuarantine(path, direction, validation, previousContent, nextContent);
+  }
+
+  private showFrontmatterGuardNotice(
+    path: string,
+    direction: "disk-to-crdt" | "crdt-to-disk",
+  ): void {
+    const key = `${direction}:${path}`;
+    const now = Date.now();
+    if ((this.frontmatterGuardNoticeAt.get(key) ?? 0) + FRONTMATTER_GUARD_NOTICE_MS > now) {
+      return;
+    }
+
+    this.frontmatterGuardNoticeAt.set(key, now);
+    new Notice(
+      `YAOS paused a properties update in "${path}" because the frontmatter looked unsafe. Check diagnostics before accepting the change.`,
+      12_000,
+    );
+  }
+
+  private traceFrontmatterQuarantine(
+    path: string,
+    direction: "disk-to-crdt" | "crdt-to-disk",
+    reason: string,
+    validation: FrontmatterValidationResult,
+    previousLength: number | null,
+    nextLength: number,
+  ): void {
+    this.trace("trace", "frontmatter-quarantined", {
+      path,
+      direction,
+      reason,
+      risk: validation.risk,
+      reasons: validation.reasons,
+      previousLength,
+      nextLength,
+      previousFrontmatterLength: validation.previousFrontmatterLength ?? null,
+      nextFrontmatterLength: validation.frontmatterLength,
+    });
+  }
+
+  private async persistFrontmatterQuarantine(
+    path: string,
+    direction: "disk-to-crdt" | "crdt-to-disk",
+    validation: FrontmatterValidationResult,
+    previousContent: string | null,
+    nextContent: string,
+  ): Promise {
+    const now = Date.now();
+    const prevHash = await this.hashFrontmatterContent(previousContent);
+    const nextHash = await this.hashFrontmatterContent(nextContent);
+    this.frontmatterQuarantineEntries = upsertFrontmatterQuarantineEntry(
+      this.frontmatterQuarantineEntries,
+      {
+        path,
+        firstSeenAt: now,
+        lastSeenAt: now,
+        direction,
+        reasons: validation.reasons,
+        prevHash,
+        nextHash,
+        count: 1,
+      },
+    );
+    await this.persistPluginState();
+  }
+
+  private async clearFrontmatterQuarantine(path: string, reason: string): Promise {
+    if (this.frontmatterQuarantineEntries.length === 0) return;
+    const nextEntries = clearFrontmatterQuarantinePath(this.frontmatterQuarantineEntries, path);
+    if (nextEntries.length === this.frontmatterQuarantineEntries.length) return;
+    this.frontmatterQuarantineEntries = nextEntries;
+    this.trace("trace", "frontmatter-quarantine-cleared", {
+      path,
+      reason,
+    });
+    await this.persistPluginState();
+  }
+
   private async updateDiskIndexForPath(path: string): Promise {
     try {
       const stat = await this.app.vault.adapter.stat(path);
@@ -2530,6 +2748,7 @@ export default class VaultCrdtSyncPlugin extends Plugin {
       this.updateManifest = null;
       this.updateManifestFetchedAt = 0;
     }
+    this.frontmatterQuarantineEntries = readPersistedFrontmatterQuarantine(data?._frontmatterQuarantine);
     this.refreshPersistedState();
     if (migratedSettings) {
       await this.persistPluginState();
@@ -3378,6 +3597,11 @@ export default class VaultCrdtSyncPlugin extends Plugin {
     } else {
       delete nextState._updateManifestCache;
     }
+    if (this.frontmatterQuarantineEntries.length > 0) {
+      nextState._frontmatterQuarantine = this.frontmatterQuarantineEntries;
+    } else {
+      delete nextState._frontmatterQuarantine;
+    }
     this.persistedState = nextState;
   }
 
@@ -3435,6 +3659,7 @@ export default class VaultCrdtSyncPlugin extends Plugin {
       `Open files: ${this.openFilePaths.size}`,
       `Server trace events: ${this.recentServerTrace.length}`,
       `Remote cursors: ${this.settings.showRemoteCursors ? "shown" : "hidden"}`,
+      ...buildFrontmatterQuarantineDebugLines(this.frontmatterQuarantineEntries),
     ].join("\n");
   }
 
@@ -3458,6 +3683,13 @@ export default class VaultCrdtSyncPlugin extends Plugin {
     return arrayBufferToHex(digest);
   }
 
+  private async hashFrontmatterContent(content: string | null): Promise {
+    if (content == null) return undefined;
+    const block = extractFrontmatter(content);
+    if (block.kind !== "present") return undefined;
+    return await this.sha256Hex(block.frontmatterText);
+  }
+
   private async exportDiagnostics(): Promise {
     if (!this.vaultSync) {
       new Notice("Sync not initialized");
diff --git a/src/settings.ts b/src/settings.ts
index caed12a..45b8a18 100644
--- a/src/settings.ts
+++ b/src/settings.ts
@@ -17,6 +17,8 @@ export interface VaultSyncSettings {
   deviceName: string;
   /** Enable verbose console.log output for debugging. */
   debug: boolean;
+  /** Pause propagation of suspicious YAML frontmatter transitions. */
+  frontmatterGuardEnabled: boolean;
   /** Comma-separated path prefixes to exclude from sync. */
   excludePatterns: string;
   /** Maximum file size in KB to sync via CRDT. Files larger are skipped. */
@@ -50,6 +52,7 @@ export const DEFAULT_SETTINGS: VaultSyncSettings = {
   vaultId: "",
   deviceName: "",
   debug: false,
+  frontmatterGuardEnabled: true,
   excludePatterns: "",
   maxFileSizeKB: 2048,
   externalEditPolicy: "always",
@@ -691,6 +694,18 @@ export class VaultSyncSettingTab extends PluginSettingTab {
           }),
       );
 
+    new Setting(advancedBody)
+      .setName("Frontmatter safety guard")
+      .setDesc("Pause suspicious YAML property updates before they spread. Disable only while troubleshooting valid frontmatter that is being blocked.")
+      .addToggle((toggle) =>
+        toggle
+          .setValue(this.plugin.settings.frontmatterGuardEnabled)
+          .onChange(async (value) => {
+            this.plugin.settings.frontmatterGuardEnabled = value;
+            await this.plugin.saveSettings();
+          }),
+      );
+
     new Setting(advancedBody)
       .setName("Debug logging")
       .setDesc("Enable verbose console logs for troubleshooting.")
diff --git a/src/sync/diskMirror.ts b/src/sync/diskMirror.ts
index 77bf1fb..9589050 100644
--- a/src/sync/diskMirror.ts
+++ b/src/sync/diskMirror.ts
@@ -6,6 +6,11 @@ import { ORIGIN_SEED } from "../types";
 import { ORIGIN_RESTORE } from "./snapshotClient";
 import type { TraceRecord } from "../debug/trace";
 import { formatUnknown, yTextToString } from "../utils/format";
+import {
+  isFrontmatterBlocked,
+  validateFrontmatterTransition,
+  type FrontmatterValidationResult,
+} from "./frontmatterGuard";
 
 /**
  * Handles writeback from Y.Text -> disk with:
@@ -98,6 +103,15 @@ export class DiskMirror {
     private editorBindings: EditorBindingManager,
     debug: boolean,
     private trace?: TraceRecord,
+    private frontmatterGuardEnabled: () => boolean = () => true,
+    private onFrontmatterValidated?: (
+      path: string,
+      direction: "crdt-to-disk",
+      reason: "flush-write",
+      validation: FrontmatterValidationResult,
+      previousContent: string | null,
+      nextContent: string,
+    ) => void,
   ) {
     this.debug = debug;
   }
@@ -440,11 +454,17 @@ export class DiskMirror {
         this.log(`flushWrite: "${path}" unchanged, skipping`);
         return;
       }
+      if (this.shouldBlockFrontmatterWrite(path, currentContent, content)) {
+        return;
+      }
 
       await this.suppressWrite(path, content);
       await this.app.vault.modify(existing, content);
       this.log(`flushWrite: updated "${path}" (${content.length} chars)`);
     } else {
+      if (this.shouldBlockFrontmatterWrite(path, null, content)) {
+        return;
+      }
       await this.suppressWrite(path, content);
       const dir = normalized.substring(0, normalized.lastIndexOf("/"));
       if (dir) {
@@ -464,6 +484,42
@@ export class DiskMirror { } } + private shouldBlockFrontmatterWrite( + path: string, + previousContent: string | null, + nextContent: string, + ): boolean { + if (!this.frontmatterGuardEnabled()) return false; + + const validation = validateFrontmatterTransition(previousContent, nextContent); + this.onFrontmatterValidated?.( + path, + "crdt-to-disk", + "flush-write", + validation, + previousContent, + nextContent, + ); + if (!isFrontmatterBlocked(validation)) return false; + + this.trace?.("trace", "frontmatter-quarantined", { + path, + direction: "crdt-to-disk", + reason: "flush-write", + risk: validation.risk, + reasons: validation.reasons, + previousLength: previousContent?.length ?? null, + nextLength: nextContent.length, + previousFrontmatterLength: validation.previousFrontmatterLength ?? null, + nextFrontmatterLength: validation.frontmatterLength, + }); + this.log( + `frontmatter write blocked for "${path}" ` + + `(${validation.reasons.join(", ") || validation.risk})`, + ); + return true; + } + private async handleRemoteDelete(path: string): Promise { const normalized = normalizePath(path); const wasOpen = this.openPaths.has(normalized); diff --git a/src/sync/frontmatterGuard.ts b/src/sync/frontmatterGuard.ts new file mode 100644 index 0000000..569986b --- /dev/null +++ b/src/sync/frontmatterGuard.ts @@ -0,0 +1,354 @@ +import yaml from "js-yaml"; + +export type FrontmatterRisk = "ok" | "warn" | "block" | "unknown"; +export type FieldPolicy = "register" | "ordered-list" | "set-like" | "opaque"; + +export interface FrontmatterValidationResult { + risk: FrontmatterRisk; + reasons: string[]; + frontmatterLength: number | null; + previousFrontmatterLength?: number | null; +} + +type FrontmatterBlock = + | { kind: "none" } + | { kind: "malformed"; reason: string } + | { + kind: "present"; + frontmatterText: string; + bodyText: string; + start: number; + end: number; + }; + +type ParsedFrontmatter = { + root: Record<string, unknown> | null; + blockReasons: string[]; +
warnReasons: string[]; +}; + +type ValueKind = "null" | "scalar" | "array" | "object"; + +const FRONTMATTER_OPEN = "---"; +const FRONTMATTER_CLOSE = new Set(["---", "..."]); +const REPEATED_KEY_BURST_THRESHOLD = 3; + +const FIELD_POLICIES: Record<string, FieldPolicy> = { + aliases: "ordered-list", + cssclasses: "set-like", + tags: "set-like", + timeestimate: "register", + tasksourcetype: "register", + title: "register", +}; + +export function validateFrontmatterTransition( + previousContent: string | null | undefined, + nextContent: string, +): FrontmatterValidationResult { + const previousBlock = previousContent != null ? extractFrontmatter(previousContent) : { kind: "none" as const }; + const next = extractFrontmatter(nextContent); + const previousLength = previousBlock.kind === "present" ? previousBlock.frontmatterText.length : null; + + if (next.kind === "none") { + return { + risk: "ok", + reasons: [], + frontmatterLength: null, + previousFrontmatterLength: previousLength, + }; + } + + if (next.kind === "malformed") { + return { + risk: "block", + reasons: [`malformed-frontmatter:${next.reason}`], + frontmatterLength: null, + previousFrontmatterLength: previousLength, + }; + } + + const blockReasons = new Set<string>(); + const warnReasons = new Set<string>(); + const heuristicAnalysis = analyzeFrontmatter(next.frontmatterText); + addReasons(blockReasons, heuristicAnalysis.blockReasons); + addReasons(warnReasons, heuristicAnalysis.warnReasons); + + const nextLength = next.frontmatterText.length; + if ( + previousLength != null + && previousLength > 0 + && nextLength > previousLength * 2 + && nextLength - previousLength > 128 + ) { + blockReasons.add("frontmatter-growth-burst"); + } + + const parsedNext = parseFrontmatter(next.frontmatterText); + addReasons(blockReasons, parsedNext.blockReasons); + addReasons(warnReasons, parsedNext.warnReasons); + + if (parsedNext.root) { + const parsedPrevious = + previousBlock.kind === "present" && previousBlock.frontmatterText !== 
next.frontmatterText + ? 
parseFrontmatter(previousBlock.frontmatterText) + : { root: {} as Record<string, unknown>, blockReasons: [], warnReasons: [] }; + const policyAnalysis = analyzeFieldPolicies(parsedPrevious.root ?? {}, parsedNext.root); + addReasons(blockReasons, policyAnalysis.blockReasons); + addReasons(warnReasons, policyAnalysis.warnReasons); + } + + return { + risk: + blockReasons.size > 0 + ? "block" + : (warnReasons.size > 0 ? "warn" : "ok"), + reasons: + blockReasons.size > 0 + ? Array.from(blockReasons).sort() + : Array.from(warnReasons).sort(), + frontmatterLength: nextLength, + previousFrontmatterLength: previousLength, + }; +} + +export function isFrontmatterBlocked(result: FrontmatterValidationResult): boolean { + return result.risk === "block"; +} + +export function extractFrontmatter(content: string): FrontmatterBlock { + const firstLineEnd = findLineEnd(content, 0); + const firstLine = content.slice(0, firstLineEnd).trim(); + if (firstLine !== FRONTMATTER_OPEN) { + return { kind: "none" }; + } + + let cursor = advancePastLineBreak(content, firstLineEnd); + const frontmatterStart = cursor; + while (cursor < content.length) { + const lineEnd = findLineEnd(content, cursor); + const line = content.slice(cursor, lineEnd).trim(); + if (FRONTMATTER_CLOSE.has(line)) { + const bodyStart = advancePastLineBreak(content, lineEnd); + return { + kind: "present", + frontmatterText: content.slice(frontmatterStart, cursor), + bodyText: content.slice(bodyStart), + start: frontmatterStart, + end: cursor, + }; + } + cursor = advancePastLineBreak(content, lineEnd); + } + + return { kind: "malformed", reason: "missing-closing-fence" }; +} + +export function getFieldPolicy(fieldName: string): FieldPolicy { + return FIELD_POLICIES[normalizeFieldName(fieldName)] ?? "opaque"; +} + +function getFrontmatterLength(content: string | null | undefined): number | null { + if (content == null) return null; + const block = extractFrontmatter(content); + return block.kind === "present" ?
block.frontmatterText.length : null; +} + +function analyzeFrontmatter(frontmatterText: string): { blockReasons: string[]; warnReasons: string[] } { + const blockReasons = new Set<string>(); + const warnReasons = new Set<string>(); + const topLevelKeys = new Map<string, number>(); + const bareTopLevelKeys = new Map<string, number>(); + + for (const rawLine of frontmatterText.split(/\r?\n/)) { + const line = rawLine.trimEnd(); + const trimmed = line.trim(); + if (!trimmed || trimmed.startsWith("#")) continue; + + if (/^\s/.test(line) || trimmed.startsWith("- ")) continue; + + const keyMatch = /^([A-Za-z0-9_-][A-Za-z0-9_-]*)\s*:/.exec(trimmed); + const quotedKeyMatch = /^["']([^"']+)["']\s*:/.exec(trimmed); + const key = keyMatch?.[1] ?? quotedKeyMatch?.[1]; + if (key) { + const count = (topLevelKeys.get(key) ?? 0) + 1; + topLevelKeys.set(key, count); + if (count > 1) blockReasons.add(`duplicate-key:${key}`); + continue; + } + + const bareKeyMatch = /^([A-Za-z0-9_-][A-Za-z0-9_-]*)$/.exec(trimmed); + if (bareKeyMatch?.[1]) { + const key = bareKeyMatch[1]; + const count = (bareTopLevelKeys.get(key) ??
0) + 1; + bareTopLevelKeys.set(key, count); + blockReasons.add(`bare-top-level-scalar:${key}`); + if (count >= REPEATED_KEY_BURST_THRESHOLD) { + blockReasons.add(`repeated-bare-key-burst:${key}`); + } + continue; + } + + warnReasons.add("unknown-top-level-yaml"); + } + + for (const [key, count] of topLevelKeys) { + if (count >= REPEATED_KEY_BURST_THRESHOLD) { + blockReasons.add(`repeated-key-burst:${key}`); + } + } + + return { + blockReasons: Array.from(blockReasons), + warnReasons: Array.from(warnReasons), + }; +} + +function parseFrontmatter(frontmatterText: string): ParsedFrontmatter { + try { + const parsed = yaml.load(frontmatterText); + if (parsed == null) { + return { + root: {}, + blockReasons: [], + warnReasons: [], + }; + } + + if (!isPlainObject(parsed)) { + return { + root: null, + blockReasons: [], + warnReasons: ["frontmatter-non-map-root"], + }; + } + + return { + root: parsed, + blockReasons: [], + warnReasons: [], + }; + } catch (error) { + const message = error instanceof Error ? error.message : String(error); + const reason = message.includes("duplicated mapping key") + ? 
"yaml-parse-duplicate-key" + : "yaml-parse-error"; + return { + root: null, + blockReasons: [reason], + warnReasons: [], + }; + } +} + +function analyzeFieldPolicies( + previousRoot: Record, + nextRoot: Record, +): { blockReasons: string[]; warnReasons: string[] } { + const blockReasons = new Set(); + const warnReasons = new Set(); + const allKeys = new Set([ + ...Object.keys(previousRoot), + ...Object.keys(nextRoot), + ]); + + for (const key of allKeys) { + const policy = getFieldPolicy(key); + if (policy === "opaque") continue; + + const hasPrevious = Object.prototype.hasOwnProperty.call(previousRoot, key); + const hasNext = Object.prototype.hasOwnProperty.call(nextRoot, key); + if (!hasNext) continue; + + const nextValue = nextRoot[key]; + if (policy === "set-like" && Array.isArray(nextValue) && hasDuplicateNormalizedValues(nextValue)) { + warnReasons.add(`set-like-duplicates:${key}`); + } + + if (!hasPrevious) continue; + const previousValue = previousRoot[key]; + const previousKind = getValueKind(previousValue); + const nextKind = getValueKind(nextValue); + if (previousKind === nextKind) continue; + + if (policy === "register") { + blockReasons.add(`field-type-flip:${key}:${previousKind}->${nextKind}`); + continue; + } + + if ((policy === "ordered-list" || policy === "set-like") + && (previousKind === "array" || nextKind === "array")) { + blockReasons.add(`field-type-flip:${key}:${previousKind}->${nextKind}`); + } + } + + return { + blockReasons: Array.from(blockReasons), + warnReasons: Array.from(warnReasons), + }; +} + +function normalizeFieldName(fieldName: string): string { + return fieldName.trim().toLowerCase(); +} + +function isPlainObject(value: unknown): value is Record { + return typeof value === "object" + && value !== null + && !Array.isArray(value) + && !(value instanceof Date); +} + +function getValueKind(value: unknown): ValueKind { + if (value === null) return "null"; + if (Array.isArray(value)) return "array"; + if (value instanceof Date) 
return "scalar"; + if (typeof value === "object") return "object"; + return "scalar"; +} + +function hasDuplicateNormalizedValues(values: unknown[]): boolean { + const seen = new Set(); + for (const value of values) { + const normalized = normalizeValue(value); + if (seen.has(normalized)) return true; + seen.add(normalized); + } + return false; +} + +function normalizeValue(value: unknown): string { + if (value instanceof Date) { + return `date:${value.toISOString()}`; + } + if (Array.isArray(value) || isPlainObject(value)) { + return JSON.stringify(value); + } + return `${typeof value}:${String(value)}`; +} + +function addReasons(target: Set, reasons: string[]): void { + for (const reason of reasons) { + target.add(reason); + } +} + +function findLineEnd(content: string, start: number): number { + const newline = content.indexOf("\n", start); + if (newline === -1) return content.length; + return content.charCodeAt(newline - 1) === 13 ? newline - 1 : newline; +} + +function advancePastLineBreak(content: string, lineEnd: number): number { + if (lineEnd >= content.length) return content.length; + if (content.charCodeAt(lineEnd) === 13 && content.charCodeAt(lineEnd + 1) === 10) { + return lineEnd + 2; + } + if (content.charCodeAt(lineEnd) === 10) { + return lineEnd + 1; + } + if (content.charCodeAt(lineEnd) === 13) { + return lineEnd + 1; + } + return lineEnd; +} diff --git a/src/sync/frontmatterQuarantine.ts b/src/sync/frontmatterQuarantine.ts new file mode 100644 index 0000000..a7e1a90 --- /dev/null +++ b/src/sync/frontmatterQuarantine.ts @@ -0,0 +1,111 @@ +export interface FrontmatterQuarantineEntry { + path: string; + firstSeenAt: number; + lastSeenAt: number; + direction: "disk-to-crdt" | "crdt-to-disk"; + reasons: string[]; + prevHash?: string; + nextHash?: string; + count: number; +} + +export const MAX_FRONTMATTER_QUARANTINE_ENTRIES = 128; + +export function readPersistedFrontmatterQuarantine(value: unknown): FrontmatterQuarantineEntry[] { + if 
(!Array.isArray(value)) return []; + + return value + .map((entry) => sanitizeEntry(entry)) + .filter((entry): entry is FrontmatterQuarantineEntry => entry !== null) + .sort((left, right) => right.lastSeenAt - left.lastSeenAt) + .slice(0, MAX_FRONTMATTER_QUARANTINE_ENTRIES); +} + +export function upsertFrontmatterQuarantineEntry( + entries: FrontmatterQuarantineEntry[], + entry: FrontmatterQuarantineEntry, + limit = MAX_FRONTMATTER_QUARANTINE_ENTRIES, +): FrontmatterQuarantineEntry[] { + const normalized = { + ...entry, + reasons: normalizeReasons(entry.reasons), + }; + const existingIndex = entries.findIndex((candidate) => candidate.path === normalized.path); + const nextEntries = [...entries]; + + if (existingIndex >= 0) { + const existing = nextEntries[existingIndex]; + if (!existing) { + return nextEntries.slice(0, limit); + } + nextEntries[existingIndex] = { + path: existing.path, + firstSeenAt: existing.firstSeenAt, + lastSeenAt: normalized.lastSeenAt, + direction: normalized.direction, + reasons: normalized.reasons, + prevHash: normalized.prevHash, + nextHash: normalized.nextHash, + count: existing.count + 1, + }; + } else { + nextEntries.push(normalized); + } + + nextEntries.sort((left, right) => right.lastSeenAt - left.lastSeenAt); + return nextEntries.slice(0, limit); +} + +export function clearFrontmatterQuarantinePath( + entries: FrontmatterQuarantineEntry[], + path: string, +): FrontmatterQuarantineEntry[] { + return entries.filter((entry) => entry.path !== path); +} + +export function buildFrontmatterQuarantineDebugLines( + entries: FrontmatterQuarantineEntry[], + limit = 3, +): string[] { + const visibleEntries = entries.slice(0, limit); + const lines = [`Frontmatter quarantines: ${entries.length}`]; + for (const entry of visibleEntries) { + lines.push( + `Frontmatter quarantine: ${entry.path} [${entry.direction}] x${entry.count} ${entry.reasons.join(", ")}`, + ); + } + return lines; +} + +function sanitizeEntry(value: unknown): 
FrontmatterQuarantineEntry | null { + if (typeof value !== "object" || value === null) return null; + const candidate = value as Partial<FrontmatterQuarantineEntry>; + if ( + typeof candidate.path !== "string" + || typeof candidate.firstSeenAt !== "number" + || typeof candidate.lastSeenAt !== "number" + || (candidate.direction !== "disk-to-crdt" && candidate.direction !== "crdt-to-disk") + || !Array.isArray(candidate.reasons) + || typeof candidate.count !== "number" + ) { + return null; + } + + const reasons = normalizeReasons( + candidate.reasons.filter((reason): reason is string => typeof reason === "string"), + ); + return { + path: candidate.path, + firstSeenAt: candidate.firstSeenAt, + lastSeenAt: candidate.lastSeenAt, + direction: candidate.direction, + reasons, + prevHash: typeof candidate.prevHash === "string" ? candidate.prevHash : undefined, + nextHash: typeof candidate.nextHash === "string" ? candidate.nextHash : undefined, + count: candidate.count, + }; +} + +function normalizeReasons(reasons: string[]): string[] { + return Array.from(new Set(reasons)).sort(); +} diff --git a/tests/bound-recovery-regressions.mjs b/tests/bound-recovery-regressions.mjs new file mode 100644 index 0000000..8249cfd --- /dev/null +++ b/tests/bound-recovery-regressions.mjs @@ -0,0 +1,112 @@ +import * as Y from "yjs"; + +const diffModule = await import("../src/sync/diff.ts"); +const { applyDiffToYText } = diffModule.default; + +let passed = 0; +let failed = 0; + +function assert(condition, name) { + if (condition) { + console.log(` PASS ${name}`); + passed++; + } else { + console.error(` FAIL ${name}`); + failed++; + } +} + +function makeText(content) { + const doc = new Y.Doc(); + const ytext = doc.getText("content"); + ytext.insert(0, content); + return { doc, ytext }; +} + +console.log("\n--- Test 1: bound-file recovery applies one content authority ---"); +{ + const crdt = [ + "---", + "timeEstimate: 2", + "kind: op", + "---", + "", + ].join("\n"); + const disk = [ + "---", + "timeEstimate: 20", +
"kind: op", + "---", + "", + ].join("\n"); + const staleEditor = [ + "---", + "timeEstimate: 200", + "kind: op", + "---", + "", + ].join("\n"); + + const fixed = makeText(crdt); + applyDiffToYText(fixed.ytext, crdt, disk, "disk-sync-recover-bound"); + assert( + fixed.ytext.toString() === disk, + "fixed recovery leaves CRDT at the chosen disk content", + ); + fixed.doc.destroy(); + + const oldAmplifier = makeText(crdt); + applyDiffToYText(oldAmplifier.ytext, crdt, disk, "disk-sync-recover-bound"); + applyDiffToYText(oldAmplifier.ytext, disk, staleEditor, "editor-health-heal"); + assert( + oldAmplifier.ytext.toString() === staleEditor, + "old disk-then-heal sequence can reapply stale editor content", + ); + assert( + oldAmplifier.ytext.toString() !== disk, + "old disk-then-heal sequence does not preserve the chosen disk authority", + ); + oldAmplifier.doc.destroy(); +} + +console.log("\n--- Test 2: repeated disk-authority recovery does not amplify stale editor state ---"); +{ + const crdt = [ + "---", + "timeEstimate: 2", + "kind: op", + "---", + "", + ].join("\n"); + const disk = [ + "---", + "timeEstimate: 20", + "kind: op", + "---", + "", + ].join("\n"); + const staleEditor = [ + "---", + "timeEstimate: 200", + "kind: op", + "---", + "", + ].join("\n"); + + const state = makeText(crdt); + for (let i = 0; i < 5; i++) { + const before = state.ytext.toString(); + applyDiffToYText(state.ytext, before, disk, "disk-sync-recover-bound"); + } + + assert(state.ytext.toString() === disk, "repeated disk-authority recovery stays at disk content"); + assert(state.ytext.toString() !== staleEditor, "stale editor content is not reapplied during repair-only recovery"); + assert(state.ytext.toString().length === disk.length, "repeated repair-only recovery does not grow content"); + state.doc.destroy(); +} + +console.log(`\n${"-".repeat(50)}`); +console.log(`Results: ${passed} passed, ${failed} failed`); +console.log(`${"-".repeat(50)}\n`); + +process.exit(failed > 0 ? 
1 : 0); diff --git a/tests/frontmatter-guard-regressions.mjs b/tests/frontmatter-guard-regressions.mjs new file mode 100644 index 0000000..5102ee6 --- /dev/null +++ b/tests/frontmatter-guard-regressions.mjs @@ -0,0 +1,400 @@ +const guardModule = await import("../src/sync/frontmatterGuard.ts"); +const guard = guardModule.default ?? guardModule; +const { + extractFrontmatter, + getFieldPolicy, + validateFrontmatterTransition, + isFrontmatterBlocked, +} = guard; + +let passed = 0; +let failed = 0; + +function assert(condition, name) { + if (condition) { + console.log(` PASS ${name}`); + passed++; + } else { + console.error(` FAIL ${name}`); + failed++; + } +} + +class FrontmatterBridgeHarness { + constructor({ guardEnabled = true } = {}) { + this.guardEnabled = guardEnabled; + this.disk = new Map(); + this.crdt = new Map(); + this.blocked = []; + this.ingestCount = 0; + this.writeCount = 0; + } + + inbound(path) { + const next = this.disk.get(path); + if (typeof next !== "string") throw new Error(`Missing disk content for ${path}`); + const previous = this.crdt.get(path) ?? null; + const validation = validateFrontmatterTransition(previous, next); + if (this.guardEnabled && isFrontmatterBlocked(validation)) { + this.blocked.push({ path, direction: "disk-to-crdt", validation }); + return false; + } + this.crdt.set(path, next); + this.ingestCount++; + return true; + } + + outbound(path) { + const next = this.crdt.get(path); + if (typeof next !== "string") throw new Error(`Missing CRDT content for ${path}`); + const previous = this.disk.get(path) ?? 
null; + const validation = validateFrontmatterTransition(previous, next); + if (this.guardEnabled && isFrontmatterBlocked(validation)) { + this.blocked.push({ path, direction: "crdt-to-disk", validation }); + return false; + } + this.disk.set(path, next); + this.writeCount++; + return true; + } +} + +console.log("\n--- Test 1: body-only markdown bypasses frontmatter guard ---"); +{ + const result = validateFrontmatterTransition( + "body before\n", + "body after\n", + ); + assert(result.risk === "ok", "body-only edit is ok"); + assert(result.frontmatterLength === null, "body-only edit has no frontmatter length"); +} + +console.log("\n--- Test 2: duplicate frontmatter keys are blocked ---"); +{ + const next = [ + "---", + "taskSourceType: taskNotes", + "taskSourceType: taskNotes", + "---", + "body", + ].join("\n"); + const result = validateFrontmatterTransition(null, next); + assert(isFrontmatterBlocked(result), "duplicate key is blocked"); + assert(result.reasons.includes("duplicate-key:taskSourceType"), "duplicate key reason is reported"); +} + +console.log("\n--- Test 3: repeated bare key bursts are blocked ---"); +{ + const next = [ + "---", + "taskSourceType", + "taskSourceType", + "taskSourceType", + "---", + "body", + ].join("\n"); + const result = validateFrontmatterTransition(null, next); + assert(isFrontmatterBlocked(result), "repeated bare key burst is blocked"); + assert( + result.reasons.includes("repeated-bare-key-burst:taskSourceType"), + "bare key burst reason is reported", + ); +} + +console.log("\n--- Test 4: quoted duplicate frontmatter keys are blocked ---"); +{ + const next = [ + "---", + "\"task source\": taskNotes", + "\"task source\": taskNotes", + "---", + "body", + ].join("\n"); + const result = validateFrontmatterTransition(null, next); + assert(isFrontmatterBlocked(result), "quoted duplicate key is blocked"); + assert(result.reasons.includes("duplicate-key:task source"), "quoted duplicate key reason is reported"); +} + +console.log("\n--- 
Test 5: unknown top-level YAML warns instead of blocking ---"); +{ + const next = [ + "---", + "? complex", + ": value", + "---", + "body", + ].join("\n"); + const result = validateFrontmatterTransition(null, next); + assert(result.risk === "warn", "unknown top-level YAML is a warning"); + assert(!isFrontmatterBlocked(result), "unknown top-level YAML is not blocked"); +} + +console.log("\n--- Test 6: malformed frontmatter fence is blocked ---"); +{ + const next = [ + "---", + "title: Broken", + "body that never closed", + ].join("\n"); + const result = validateFrontmatterTransition(null, next); + assert(isFrontmatterBlocked(result), "missing closing fence is blocked"); + assert( + result.reasons.includes("malformed-frontmatter:missing-closing-fence"), + "malformed fence reason is reported", + ); +} + +console.log("\n--- Test 7: frontmatter growth burst is blocked ---"); +{ + const previous = [ + "---", + "title: Short", + "---", + "body", + ].join("\n"); + const next = [ + "---", + "title: Short", + `notes: ${"x".repeat(300)}`, + "---", + "body", + ].join("\n"); + const result = validateFrontmatterTransition(previous, next); + assert(isFrontmatterBlocked(result), "large frontmatter-only growth burst is blocked"); + assert(result.reasons.includes("frontmatter-growth-burst"), "growth burst reason is reported"); +} + +console.log("\n--- Test 8: extractor separates frontmatter and body ---"); +{ + const markdown = [ + "---", + "title: Clean", + "---", + "", + "body", + ].join("\n"); + const block = extractFrontmatter(markdown); + assert(block.kind === "present", "frontmatter block is detected"); + assert(block.kind === "present" && block.frontmatterText.includes("title: Clean"), "frontmatter text is extracted"); + assert(block.kind === "present" && block.bodyText === "\nbody", "body text is extracted"); +} + +console.log("\n--- Test 9: parser-backed validation blocks invalid YAML ---"); +{ + const next = [ + "---", + "title: [broken", + "---", + "body", + ].join("\n"); 
+ const result = validateFrontmatterTransition(null, next); + assert(isFrontmatterBlocked(result), "parser error is blocked"); + assert(result.reasons.includes("yaml-parse-error"), "parser error reason is reported"); +} + +console.log("\n--- Test 10: schema-lite register fields block scalar/list flips ---"); +{ + const previous = [ + "---", + "tags:", + " - home", + "timeEstimate: 20", + "---", + "body", + ].join("\n"); + const next = [ + "---", + "tags: home", + "timeEstimate:", + " - 20", + "---", + "body", + ].join("\n"); + const result = validateFrontmatterTransition(previous, next); + assert(isFrontmatterBlocked(result), "known field type flips are blocked"); + assert( + result.reasons.includes("field-type-flip:tags:array->scalar"), + "list field type flip reason is reported", + ); + assert( + result.reasons.includes("field-type-flip:timeEstimate:scalar->array"), + "register field type flip reason is reported", + ); +} + +console.log("\n--- Test 11: set-like duplicates warn instead of rewriting ---"); +{ + const next = [ + "---", + "tags:", + " - home", + " - home", + "---", + "body", + ].join("\n"); + const result = validateFrontmatterTransition(null, next); + assert(result.risk === "warn", "set-like duplicate values warn"); + assert(result.reasons.includes("set-like-duplicates:tags"), "set-like duplicate reason is reported"); +} + +console.log("\n--- Test 12: field policy registry stays schema-lite ---"); +{ + assert(getFieldPolicy("timeEstimate") === "register", "timeEstimate is treated as register"); + assert(getFieldPolicy("tags") === "set-like", "tags is treated as set-like"); + assert(getFieldPolicy("complete_instances") === "opaque", "unknown plugin fields stay opaque"); +} + +console.log("\n--- Test 13: inbound blocked frontmatter does not poison CRDT ---"); +{ + const path = "Bathroom floor clean.md"; + const clean = [ + "---", + "timeEstimate: 20", + "---", + "body", + ].join("\n"); + const corrupt = [ + "---", + "timeEstimate: 20", + "timeEstimate: 
200", + "---", + "body", + ].join("\n"); + const bridge = new FrontmatterBridgeHarness(); + bridge.crdt.set(path, clean); + bridge.disk.set(path, corrupt); + + assert(!bridge.inbound(path), "inbound corrupt frontmatter is blocked"); + assert(bridge.crdt.get(path) === clean, "blocked inbound content does not update CRDT"); + assert(bridge.blocked[0]?.direction === "disk-to-crdt", "inbound block records direction"); +} + +console.log("\n--- Test 14: outbound blocked frontmatter does not mutate disk ---"); +{ + const path = "Bathroom floor clean.md"; + const clean = [ + "---", + "timeEstimate: 20", + "---", + "body", + ].join("\n"); + const corrupt = [ + "---", + "timeEstimate: 20", + "timeEstimate: 200", + "---", + "body", + ].join("\n"); + const bridge = new FrontmatterBridgeHarness(); + bridge.disk.set(path, clean); + bridge.crdt.set(path, corrupt); + + assert(!bridge.outbound(path), "outbound corrupt frontmatter is blocked"); + assert(bridge.disk.get(path) === clean, "blocked outbound content does not update disk"); + assert(bridge.blocked[0]?.direction === "crdt-to-disk", "outbound block records direction"); +} + +console.log("\n--- Test 15: repeated blocked retries do not loop writes ---"); +{ + const path = "Bathroom floor clean.md"; + const clean = [ + "---", + "timeEstimate: 20", + "---", + "body", + ].join("\n"); + const corrupt = [ + "---", + "timeEstimate: 20", + "timeEstimate: 200", + "---", + "body", + ].join("\n"); + const bridge = new FrontmatterBridgeHarness(); + bridge.disk.set(path, clean); + bridge.crdt.set(path, corrupt); + + for (let i = 0; i < 3; i++) { + assert(!bridge.outbound(path), `blocked retry ${i + 1} remains blocked`); + } + assert(bridge.disk.get(path) === clean, "repeated blocked retries leave disk unchanged"); + assert(bridge.writeCount === 0, "repeated blocked retries do not perform writes"); +} + +console.log("\n--- Test 16: body-only edits still flow through the guard harness ---"); +{ + const path = "Body only.md"; + const bridge 
= new FrontmatterBridgeHarness();
+    bridge.crdt.set(path, "body before\n");
+    bridge.disk.set(path, "body after\n");
+
+    assert(bridge.inbound(path), "body-only inbound edit is imported");
+    assert(bridge.crdt.get(path) === "body after\n", "body-only inbound edit updates CRDT");
+
+    bridge.crdt.set(path, "body after again\n");
+    assert(bridge.outbound(path), "body-only outbound edit is written");
+    assert(bridge.disk.get(path) === "body after again\n", "body-only outbound edit updates disk");
+}
+
+console.log("\n--- Test 17: incident-shaped frontmatter corruption is blocked without spread ---");
+{
+    const path = "Bathroom floor clean.md";
+    const clean = [
+        "---",
+        "timeEstimate: 20",
+        "taskSourceType: taskNotes",
+        "complete_instances:",
+        " - 2026-04-09",
+        "---",
+        "body",
+    ].join("\n");
+    const corrupt = [
+        "---",
+        "timeEstimate: 20",
+        "taskSourceType: taskNotes",
+        "taskSourceType: taskNotes",
+        "complete_instances:",
+        " - 2026-04-09",
+        " - 2026-04-09",
+        "---",
+        "body",
+    ].join("\n");
+    const bridge = new FrontmatterBridgeHarness();
+    bridge.disk.set(path, clean);
+    bridge.crdt.set(path, clean);
+
+    bridge.crdt.set(path, corrupt);
+    assert(!bridge.outbound(path), "incident-shaped outbound corruption is blocked");
+    assert(bridge.disk.get(path) === clean, "blocked incident-shaped corruption does not reach disk");
+}
+
+console.log("\n--- Test 18: disabled guard allows suspicious frontmatter for troubleshooting ---");
+{
+    const path = "Bathroom floor clean.md";
+    const clean = [
+        "---",
+        "timeEstimate: 20",
+        "---",
+        "body",
+    ].join("\n");
+    const corrupt = [
+        "---",
+        "timeEstimate: 20",
+        "timeEstimate: 200",
+        "---",
+        "body",
+    ].join("\n");
+    const bridge = new FrontmatterBridgeHarness({ guardEnabled: false });
+    bridge.disk.set(path, clean);
+    bridge.crdt.set(path, corrupt);
+
+    assert(bridge.outbound(path), "disabled guard allows outbound write");
+    assert(bridge.disk.get(path) === corrupt, "disabled guard writes the suspicious state");
+    assert(bridge.blocked.length === 0, "disabled guard records no block");
+}
+
+console.log(`\n${"-".repeat(50)}`);
+console.log(`Results: ${passed} passed, ${failed} failed`);
+console.log(`${"-".repeat(50)}\n`);
+
+process.exit(failed > 0 ? 1 : 0);
diff --git a/tests/frontmatter-quarantine-regressions.mjs b/tests/frontmatter-quarantine-regressions.mjs
new file mode 100644
index 0000000..8b9213a
--- /dev/null
+++ b/tests/frontmatter-quarantine-regressions.mjs
@@ -0,0 +1,139 @@
+const quarantineModule = await import("../src/sync/frontmatterQuarantine.ts");
+const quarantine = quarantineModule.default ?? quarantineModule;
+const {
+    buildFrontmatterQuarantineDebugLines,
+    clearFrontmatterQuarantinePath,
+    readPersistedFrontmatterQuarantine,
+    upsertFrontmatterQuarantineEntry,
+} = quarantine;
+
+let passed = 0;
+let failed = 0;
+
+function assert(condition, name) {
+    if (condition) {
+        console.log(` PASS ${name}`);
+        passed++;
+    } else {
+        console.error(` FAIL ${name}`);
+        failed++;
+    }
+}
+
+console.log("\n--- Test 1: quarantine upsert is per-path and strictly diagnostic ---");
+{
+    let entries = [];
+    entries = upsertFrontmatterQuarantineEntry(entries, {
+        path: "Bathroom floor clean.md",
+        firstSeenAt: 10,
+        lastSeenAt: 10,
+        direction: "disk-to-crdt",
+        reasons: ["duplicate-key:taskSourceType", "duplicate-key:taskSourceType"],
+        prevHash: "prev-a",
+        nextHash: "next-a",
+        count: 1,
+    });
+    entries = upsertFrontmatterQuarantineEntry(entries, {
+        path: "Bathroom floor clean.md",
+        firstSeenAt: 20,
+        lastSeenAt: 20,
+        direction: "crdt-to-disk",
+        reasons: ["yaml-parse-error"],
+        prevHash: "prev-b",
+        nextHash: "next-b",
+        count: 1,
+    });
+
+    assert(entries.length === 1, "same path collapses into one diagnostic entry");
+    assert(entries[0]?.count === 2, "same path increments count");
+    assert(entries[0]?.direction === "crdt-to-disk", "same path keeps the latest direction");
+    assert(entries[0]?.reasons.length === 1 && entries[0]?.reasons[0] === "yaml-parse-error", "same path keeps normalized latest reasons");
+}
+
+console.log("\n--- Test 2: quarantine stays bounded and newest-first ---");
+{
+    let entries = [];
+    for (let i = 0; i < 5; i++) {
+        entries = upsertFrontmatterQuarantineEntry(entries, {
+            path: `note-${i}.md`,
+            firstSeenAt: i,
+            lastSeenAt: i,
+            direction: "disk-to-crdt",
+            reasons: [`reason-${i}`],
+            count: 1,
+        }, 3);
+    }
+
+    assert(entries.length === 3, "quarantine entry list is capped");
+    assert(entries[0]?.path === "note-4.md", "newest entry stays first");
+    assert(entries[2]?.path === "note-2.md", "oldest retained entry is the cutoff");
+}
+
+console.log("\n--- Test 3: quarantine clears by path on clean convergence ---");
+{
+    const entries = [
+        {
+            path: "keep.md",
+            firstSeenAt: 1,
+            lastSeenAt: 2,
+            direction: "disk-to-crdt",
+            reasons: ["a"],
+            count: 1,
+        },
+        {
+            path: "clear.md",
+            firstSeenAt: 3,
+            lastSeenAt: 4,
+            direction: "crdt-to-disk",
+            reasons: ["b"],
+            count: 2,
+        },
+    ];
+    const next = clearFrontmatterQuarantinePath(entries, "clear.md");
+    assert(next.length === 1, "clear removes only the target path");
+    assert(next[0]?.path === "keep.md", "clear keeps unrelated paths");
+}
+
+console.log("\n--- Test 4: persisted quarantine state is sanitized ---");
+{
+    const entries = readPersistedFrontmatterQuarantine([
+        {
+            path: "Bathroom floor clean.md",
+            firstSeenAt: 10,
+            lastSeenAt: 20,
+            direction: "disk-to-crdt",
+            reasons: ["z", "a", "z"],
+            prevHash: "prev",
+            nextHash: "next",
+            count: 3,
+        },
+        { nope: true },
+    ]);
+
+    assert(entries.length === 1, "invalid persisted entries are dropped");
+    assert(entries[0]?.reasons.join(",") === "a,z", "persisted reasons are normalized");
+}
+
+console.log("\n--- Test 5: debug lines summarize quarantined paths without content ---");
+{
+    const lines = buildFrontmatterQuarantineDebugLines([
+        {
+            path: "Bathroom floor clean.md",
+            firstSeenAt: 10,
+            lastSeenAt: 20,
+            direction: "disk-to-crdt",
+            reasons: ["yaml-parse-error"],
+            count: 2,
+        },
+    ]);
+
+    assert(lines[0] === "Frontmatter quarantines: 1", "debug header includes entry count");
+    assert(lines[1]?.includes("Bathroom floor clean.md"), "debug summary includes path");
+    assert(!lines[1]?.includes("prevHash"), "debug summary does not expose hashes or content by default");
+}
+
+console.log(`\n${"-".repeat(50)}`);
+console.log(`Results: ${passed} passed, ${failed} failed`);
+console.log(`${"-".repeat(50)}\n`);
+
+process.exit(failed > 0 ? 1 : 0);