Step 1: AI-Output sediment system (writeback + stale sweep) + bilingual user guide by 2233admin · Pull Request #5 · 2233admin/obsidian-llm-wiki

2233admin · 2026-04-20T20:01:02Z

What this adds

The "persona outputs survive the session" design — Step 1 of the dependency chain ③ writeback → ① provenance → ② active-push. Without this, every /vault-architect / /vault-gardener / etc. analysis evaporates at session end; the vault only sediments the human's notes, not the collaboration history.

Two new MCP ops

`vault.writeAIOutput`

Persona calls this to persist a meaningful analysis. Lands at:

{vault}/00-Inbox/AI-Output/{persona}/YYYY-MM-DD-{slug}.md

With a 6-field frontmatter that a future sweep (or human) can reason about:

---
generated-by: vault-architect
generated-at: 2026-04-21T14:32:00.000Z
agent: claude-opus-4-7
parent-query: "refactor authentication module"
source-nodes:
  - "[[auth-architecture]]"
  - "[[session-tokens]]"
status: draft
---

Each field solves a distinct future-failure class: who / when / what-quality / why / from-where / still-valid. Every persona's skill file got a per-persona invocation template appended.
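The path layout and the six frontmatter fields can be sketched as follows. This is a minimal illustration, not the PR's implementation: the helper names (`slugify`, `relativePath`, `frontmatter`) and the `AIOutputMeta` shape are hypothetical, and quote/newline escaping of `parent-query` is left out.

```typescript
// Hypothetical helpers; only the path layout and the six field names come from the PR text.
interface AIOutputMeta {
  persona: string; // e.g. "vault-architect"
  agent: string;
  parentQuery: string;
  sourceNodes: string[]; // wikilinks such as "[[auth-architecture]]"
  generatedAt: Date;
}

// First six words, lowercased, kebab-case, restricted to [a-z0-9].
function slugify(query: string): string {
  return query
    .toLowerCase()
    .split(/\s+/)
    .map((word) => word.replace(/[^a-z0-9]/g, ""))
    .filter((word) => word.length > 0)
    .slice(0, 6)
    .join("-");
}

// Path relative to the vault root: 00-Inbox/AI-Output/{persona}/YYYY-MM-DD-{slug}.md
function relativePath(meta: AIOutputMeta): string {
  const day = meta.generatedAt.toISOString().slice(0, 10); // UTC date
  return `00-Inbox/AI-Output/${meta.persona}/${day}-${slugify(meta.parentQuery)}.md`;
}

// Serializes the six fields; status is always "draft" on write.
function frontmatter(meta: AIOutputMeta): string {
  const lines = [
    "---",
    `generated-by: ${meta.persona}`,
    `generated-at: ${meta.generatedAt.toISOString()}`,
    `agent: ${meta.agent}`,
    `parent-query: "${meta.parentQuery}"`,
  ];
  if (meta.sourceNodes.length === 0) {
    lines.push("source-nodes: []");
  } else {
    lines.push("source-nodes:");
    for (const node of meta.sourceNodes) lines.push(`  - "${node}"`);
  }
  lines.push("status: draft", "---");
  return lines.join("\n");
}

const exampleMeta: AIOutputMeta = {
  persona: "vault-architect",
  agent: "claude-opus-4-7",
  parentQuery: "refactor authentication module",
  sourceNodes: ["[[auth-architecture]]", "[[session-tokens]]"],
  generatedAt: new Date("2026-04-21T14:32:00.000Z"),
};
console.log(relativePath(exampleMeta));
console.log(frontmatter(exampleMeta));
```

A query with no ASCII letters or digits yields an empty slug in this sketch; how the real op handles that case is not stated in the PR.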

`vault.sweepAIOutput`

Gardener calls this to manage the sediment lifecycle. Two report types:

Stale candidates — drafts past per-persona age threshold (architect=45d, gardener=30d, historian=180d, others=60d) AND with zero backlinks from non-AI-Output files. AI-Output → AI-Output references are self-anchoring hallucination chains and do not count.
Supersede candidates — reviewed pairs with source-nodes Jaccard ≥ 0.6 and a newer generated-at. Reported only; never auto-applied.
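The stale rule above reduces to a small predicate. The sketch below assumes the backlink count from non-AI-Output files has already been computed; the `DraftInfo` shape, the full persona names used as map keys, and the strict `>` comparison against the threshold are assumptions, since the PR only says "past" the threshold.

```typescript
// Thresholds in days as listed in the PR; keying by full persona name is an assumption.
const STALE_DAYS = new Map<string, number>([
  ["vault-architect", 45],
  ["vault-gardener", 30],
  ["vault-historian", 180],
]);
const DEFAULT_STALE_DAYS = 60; // "others=60d"
const MS_PER_DAY = 86400000;

interface DraftInfo {
  persona: string;
  status: string;
  generatedAt: string; // ISO timestamp from frontmatter
  humanBacklinks: number; // backlinks from files outside AI-Output
}

// A draft is a stale candidate only when BOTH conditions hold:
// it is older than its persona's threshold AND no human-written note links to it.
function isStaleCandidate(draft: DraftInfo, now: Date): boolean {
  if (draft.status !== "draft") return false;
  const ageDays = (now.getTime() - Date.parse(draft.generatedAt)) / MS_PER_DAY;
  const limit = STALE_DAYS.get(draft.persona) ?? DEFAULT_STALE_DAYS;
  return ageDays > limit && draft.humanBacklinks === 0;
}
```

Passing `now` explicitly mirrors the injection seam the tests rely on, so the predicate stays deterministic.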

dry_run: true (default) returns the report without writing. dry_run: false flips status: draft → stale in-place via narrow regex substitution (body + other fm fields preserved).
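One way to keep the substitution narrow is to restrict it to the leading frontmatter block, so a `status: draft` line in the body is never touched. The function name and the exact regexes below are illustrative assumptions, not the code in this PR.

```typescript
// Hypothetical sketch: flip `status: draft` to `status: stale` inside the
// leading frontmatter block only; body and other fields pass through unchanged.
function flipDraftToStale(content: string): string {
  // Leading block delimited by `---` lines (LF or CRLF).
  const match = /^---\r?\n[\s\S]*?\r?\n---(?:\r?\n|$)/.exec(content);
  if (!match) return content; // no frontmatter: leave the file alone
  const block = match[0];
  // Replace the first whole-line `status: draft` within the block.
  const flipped = block.replace(/^status:[ \t]*draft[ \t]*$/m, "status: stale");
  return flipped + content.slice(block.length);
}
```

Because only the matched block is rewritten and the remainder is appended verbatim, the body is preserved byte for byte.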

Design rationale (condensed)

B → A lifecycle (gardener auto-stale + human manual-review) beats the reverse. Maintenance cost of "default trust + auto-stale" is O(misjudgment rate); cost of "default suspect + human promotes" is O(total volume). At 300 AI-Output files in 6 months you'll only promote ~5.
Backlink filter excludes AI-Output sources because two AI-Output files can happen to cite each other without either being grounded in anything a human cared about. Forces at least one "this was worth keeping" vote from a human-written note.
Supersede stays human-confirmed — auto-applying "this new one replaces that old one" would be AI grading its own successors, a second hallucination layer. Gardener proposes, human decides.
Per-persona subdirs for MVP simplicity; schema carries enough info to graduate to flat + yaml-lookup without code changes if per-content-type differentiation proves more important than per-persona later.

Install + try it

New comprehensive bilingual guide lands in this PR:

docs/GUIDE.md — English user journey walkthrough, 268 lines
docs/GUIDE.zh-CN.md — Simplified Chinese mirror, 268 lines

Minimum viable try after merge:

git clone --depth 1 https://github.com/2233admin/obsidian-llm-wiki.git ~/obsidian-llm-wiki-src
cd ~/obsidian-llm-wiki-src && git checkout feat/ai-output-sediment && ./setup

Then in Claude Code: /vault-architect suggest 3 refactors for my vault structure. Watch the file appear at 00-Inbox/AI-Output/vault-architect/YYYY-MM-DD-suggest-3-refactors.md.

Verification

npm run build — tsc 0 errors
node.exe --test dist/**/*.test.js — 98 tests pass, 0 fail (88 baseline + 10 new)
npm run generate-tools-doc — deterministic output; drift guard green

New tests cover the full contract:

| Scenario | Covered |
| --- | --- |
| dryRun=false writes all 6 frontmatter fields + `status: draft` | ✅ |
| dryRun default returns plan without writing | ✅ |
| Collision appends `-2` suffix same day/slug | ✅ |
| Invalid persona rejected with code -32602 | ✅ |
| Empty sourceNodes serializes as `[]` | ✅ |
| Stale detection uses injected `now` for determinism | ✅ |
| Drafts with non-AI-Output backlinks are exempted | ✅ |
| AI-Output → AI-Output backlinks do NOT anchor | ✅ |
| `dry_run: false` flips status in-place, preserves body | ✅ |
| Supersede candidates fire on Jaccard ≥ 0.6 | ✅ |
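For reference, the Jaccard gate in the last scenario compares the two `source-nodes` sets as |A ∩ B| / |A ∪ B|. The sketch below is an assumption-laden illustration: the `ReviewedEntry` shape and the `supersedes` name are hypothetical, and returning 0 for two empty sets is a convention chosen here, not something the PR specifies.

```typescript
// Jaccard similarity of two string lists, treated as sets.
function jaccard(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  if (setA.size === 0 && setB.size === 0) return 0; // convention for this sketch
  const shared = Array.from(setA).filter((x) => setB.has(x)).length;
  return shared / (setA.size + setB.size - shared);
}

interface ReviewedEntry {
  persona: string;
  status: string;
  generatedAt: string; // ISO timestamp
  sourceNodes: string[];
}

// Reports a candidate pair only; nothing is rewritten, a human decides.
function supersedes(newer: ReviewedEntry, older: ReviewedEntry, threshold = 0.6): boolean {
  return (
    newer.status === "reviewed" &&
    older.status === "reviewed" &&
    newer.persona === older.persona &&
    Date.parse(newer.generatedAt) > Date.parse(older.generatedAt) &&
    jaccard(newer.sourceNodes, older.sourceNodes) >= threshold
  );
}
```

With three shared nodes out of four distinct ones the similarity is 0.75 and the pair is reported; with two shared out of four it is 0.5 and the pair is skipped.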

Incidental fixes included

class VaultFs is now exported from mcp-server/src/index.ts (was module-private).
main() now guarded by import.meta.url === process.argv[1]. Without this, importing VaultFs from tests would spin up StdioServerTransport and hold stdin open, blocking all subsequent test files. One-liner, preserves runnable-as-entrypoint behavior.
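In Node, `import.meta.url` is a `file:` URL while `process.argv[1]` is a filesystem path, so an entrypoint guard typically converts one to the other before comparing. The sketch below shows one common form using `pathToFileURL` from `node:url`; the helper name `isEntrypoint` is hypothetical, and whether the merged code performs the conversion this way is an assumption.

```typescript
import { pathToFileURL } from "node:url";

// Hypothetical helper: true when the module at `moduleUrl` is the script
// Node was launched with (`argv1` is process.argv[1], which may be undefined).
function isEntrypoint(moduleUrl: string, argv1: string | undefined): boolean {
  if (!argv1) return false;
  return moduleUrl === pathToFileURL(argv1).href;
}

// Usage inside an ES module (kept as a comment so this sketch has no side effects):
//   if (isEntrypoint(import.meta.url, process.argv[1])) {
//     void main();
//   }
```

When a test file imports the module, `process.argv[1]` points at the test runner's script rather than the module, so `main()` is skipped and no stdio transport is opened.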

Files modified

mcp-server/src/core/operations.ts — 2 new Operation entries (+27)
mcp-server/src/index.ts — 2 dispatch cases + export + main guard (+190)
mcp-server/src/ai-output.test.ts — NEW, 10 tests (+297)
skills/vault-{architect,curator,gardener,historian,janitor,librarian,teacher}.md — +17 each (gardener +23)
docs/ai-output-convention.md — NEW schema + lifecycle + FAQ (~120)
docs/mcp-tools-reference.md — auto-regenerated (+29)
docs/GUIDE.md + docs/GUIDE.zh-CN.md — NEW bilingual guide (268 each)
README.md — language-switch blockquote above description
progress.txt — Step 1 shipping summary + handoff

Explicitly deferred

Auto reviewed → superseded flip (currently reports only)
YAML-config stale policy (currently hardcoded)
user-verdict frontmatter field
Provenance enrichment of other MCP op responses (Step 2 in the dependency chain)

Ready for review or merge. Base is v2-staging so this layers on top of PR #4 — merge order: #4 first, then this.

Persona outputs now sediment into {vault}/00-Inbox/AI-Output/{persona}/ with a 6-field provenance frontmatter (generated-by / generated-at / agent / parent-query / source-nodes / status). Gardener can subsequently flip aged drafts to stale with a backlink-source test.

vault.writeAIOutput
- Typed params with persona regex ^vault-[a-z]+$ validation
- Auto-slug from parentQuery (first 6 words, kebab-case, fs-safe); YYYY-MM-DD-slug.md under per-persona subdir
- Collision loop appends -2 through -99
- YAML-subset serialization round-trips through the existing parseFrontmatter (no reformatting drift)
- dryRun default true (matches project convention); on dry run returns the computed path + frontmatter without writing
- Status hardcoded `draft` on write

vault.sweepAIOutput
- Hardcoded per-persona thresholds: architect=45d, gardener=30d, historian=180d, librarian=60d, catch-all=60d (MVP; graduate to yaml config in Step 1.5 once usage justifies)
- Stale rule = age-past-threshold AND zero non-AI-Output backlinks (source files whose own frontmatter carries `generated-by` do NOT anchor -- AI->AI references would be self-anchoring hallucination chains)
- Supersede candidates reported on same-persona reviewed pairs with source-nodes Jaccard >= 0.6 and newer generated-at; human confirms, never auto-applies -- AI self-grading is a second-layer hallucination deliberately avoided
- dry_run=false rewrites frontmatter `status: draft` to `stale` in-place via narrow regex substitution (body + other fm fields untouched)
- `now` param provides ISO timestamp injection seam for tests

Tests: 10 unit tests in mcp-server/src/ai-output.test.ts covering the write contract (all 6 fields, dryRun default, collision, persona validation, empty sourceNodes serialization) and the sweep contract (threshold detection with injected now, backlink filter, AI-Output->AI-Output non-anchoring, in-place status flip preserving body, supersede candidate Jaccard gate). node:test + tmpdir isolation, zero new deps.

Incidental fix: added an `import.meta.url === process.argv[1]` guard around main() in index.ts so importing VaultFs from tests does not start the stdio MCP server. Without the guard, StdioServerTransport keeps the process alive and subsequent test files never run. Also exported `class VaultFs` (was previously private to the module) so ai-output.test.ts can construct a fresh instance per test against a tmpdir vault.

Total: 98 tests pass (88 baseline + 10 new). tsc 0 errors. See docs/ai-output-convention.md for the human-readable schema contract + FAQ. Closes Step 1 of the AI-Output sediment system.
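The persona validation and collision loop from the commit message can be sketched like this. The `RpcError` class, the function names, and the injected `exists` callback are hypothetical; the behaviour once `-99` is exhausted is not described in the PR, so the error thrown at that point is purely an assumption.

```typescript
const PERSONA_RE = /^vault-[a-z]+$/;

// JSON-RPC style error carrying a numeric code (hypothetical class name).
class RpcError extends Error {
  code: number;
  constructor(code: number, message: string) {
    super(message);
    this.code = code;
  }
}

// -32602 is the JSON-RPC 2.0 "Invalid params" code.
function assertPersona(persona: string): void {
  if (!PERSONA_RE.test(persona)) {
    throw new RpcError(-32602, `invalid persona: ${persona}`);
  }
}

// `base` is the path without extension; `exists` abstracts the filesystem check.
function resolveCollision(base: string, exists: (path: string) => boolean): string {
  const first = `${base}.md`;
  if (!exists(first)) return first;
  for (let n = 2; n <= 99; n++) {
    const candidate = `${base}-${n}.md`;
    if (!exists(candidate)) return candidate;
  }
  // Assumption: the PR does not say what happens after -99.
  throw new RpcError(-32603, `too many collisions for ${base}`);
}
```

Injecting `exists` keeps the loop testable without touching a real vault.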

Every persona now instructs its runtime to persist meaningful analyses via vault.writeAIOutput, so the vault sediments both the human's notes and the AI collaboration history.

- skills/vault-{architect,curator,teacher,historian,janitor,librarian,gardener}.md: appended `## Sediment convention` block with a per-persona vault.writeAIOutput invocation template + status lifecycle pointer.
- skills/vault-gardener.md: additionally appended `## Sweep convention (gardener-only responsibility)` -- dry-run first, require explicit user confirmation before dry_run:false flips, never auto-apply supersede candidates.
- docs/ai-output-convention.md (NEW, ~120 lines): schema table, legal status transitions, per-persona thresholds, FAQ covering "who flips reviewed", "what counts as a backlink", "why exclude AI->AI references", "what happens to stale entries".

Non-trivial design reasoning (why B->A beats A->B for the draft->stale/reviewed split, why the 6-field schema is minimal, why supersede stays semi-automatic) lives in planning notes, not in-tree.

Auto-generated via `npm run generate-tools-doc` after adding vault.writeAIOutput + vault.sweepAIOutput. Drift guard test now passes (mcp-server/src/scripts/generate-tools-doc.test.ts). Purely mechanical diff: 2 new ops appear under the `vault.*` namespace section with their param tables; no hand-edits.

…ess handoff

- docs/GUIDE.md (NEW, 268 lines): user-journey oriented walkthrough -- pitch, 30-second install, first useful 5-minute session with 5 real prompt examples, 7-persona table, AI-Output sediment plain-English explanation, common prompts cheat sheet, troubleshooting, FAQ. Links to INSTALL.md for install-depth, ai-output-convention.md for sediment schema, mcp-tools-reference.md for tool catalog.
- docs/GUIDE.zh-CN.md (NEW, 268 lines): structure-mirrored Chinese translation. Idiomatic (not literal) rendering of the same content for the Chinese dev community. Top-of-page language switch link.
- README.md: single-line language-switch header pointing at both GUIDE files. Keeps README lean; GUIDE.md carries the depth.
- progress.txt: Step 1 (AI-Output sediment) shipping summary, engineering notes for next editor, updated known-gaps roster.

gemini-code-assist

Code Review

This pull request introduces the 'AI-Output sediment' feature, which allows persona-authored analyses to be persisted in the vault. It adds two new MCP tools: vault.writeAIOutput for saving analyses with structured frontmatter and vault.sweepAIOutput for managing the lifecycle of these files (staling old drafts and identifying supersede candidates). The PR also includes comprehensive documentation, unit tests, and updates to persona skill files to utilize the new writeback capability. Feedback focuses on optimizing the sweep operation's performance and ensuring YAML frontmatter integrity by sanitizing newlines in user-provided strings.

gemini-code-assist · 2026-04-20T20:03:25Z

+        const hasRealBacklink = (targetRel: string): boolean => {
+          const targetBase = basename(targetRel, ".md");
+          let found = false;
+          this.walkMd((relPath, content) => {
+            if (found) return;
+            if (relPath === targetRel) return;
+            if (aiOutputPaths.has(relPath)) return; // AI-Output -> AI-Output doesn't anchor
+            for (const l of this.parseWikilinks(content)) {
+              const linkPath = l.link.split("#")[0];
+              if (!linkPath) continue;
+              if (linkPath === targetRel || linkPath === targetBase || linkPath + ".md" === targetRel) {
+                found = true;
+                return;
+              }
+            }
+          });
+          return found;
+        };


The current implementation of hasRealBacklink is highly inefficient as it performs a full vault scan (this.walkMd) for every AI-Output draft found. In a vault with $N$ total files and $M$ AI-Output drafts, this results in $O(M \times N)$ file read operations. For large vaults, this will cause significant performance degradation during the sweep operation.

It is recommended to build a set of all human-authored wikilinks by walking the vault once before iterating through the AI-Output entries.

// Pre-compute all human-authored links to avoid O(M*N) vault scans
const humanLinks = new Set<string>();
this.walkMd((relPath, content) => {
  if (aiOutputPaths.has(relPath)) return;
  for (const l of this.parseWikilinks(content)) {
    const link = l.link.split("#")[0];
    if (link) humanLinks.add(link);
  }
});

const hasRealBacklink = (targetRel: string): boolean => {
  const targetBase = basename(targetRel, ".md");
  const targetNoExt = targetRel.replace(/\.md$/, "");
  return humanLinks.has(targetRel) || humanLinks.has(targetBase) || humanLinks.has(targetNoExt);
};

gemini-code-assist · 2026-04-20T20:03:25Z

+        if (typeof body !== "string") throw err(-32602, "body required");
+
+        // Sanitize parent-query: truncate to 200 chars, replace " with right-double-quote
+        const parentQuery = parentQueryRaw.slice(0, 200).replace(/"/g, "\u201D");


The parentQuery string is embedded directly into the YAML frontmatter. If the user's query contains newlines, it will break the simple line-based frontmatter parser implemented in parseFrontmatter (which splits by \n and expects key: value pairs). Sanitizing newlines by replacing them with spaces ensures the generated file remains valid and parseable.

Suggested change

- const parentQuery = parentQueryRaw.slice(0, 200).replace(/"/g, "\u201D");
+ const parentQuery = parentQueryRaw.slice(0, 200).replace(/"/g, "\u201D").replace(/\n/g, " ");

gemini-code-assist · 2026-04-20T20:03:25Z

+        } else {
+          yamlLines.push(`source-nodes:`);
+          for (const node of sourceNodes) {
+            const escaped = String(node).replace(/"/g, "\u201D");


Similar to parentQuery, items in sourceNodes should be sanitized for newlines to prevent breaking the YAML frontmatter structure during serialization, especially since these values are provided by the LLM and might contain unexpected formatting.

Suggested change

- const escaped = String(node).replace(/"/g, "\u201D");
+ const escaped = String(node).replace(/"/g, "\u201D").replace(/\n/g, " ");
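Both suggestions share one shape, so a single helper could cover `parentQuery` and each `sourceNodes` item. The sketch below is a variant rather than the reviewer's exact change: the name `sanitizeScalar` is hypothetical, it also collapses `\r\n` and bare `\r` (an extension beyond the suggested `\n`), and it truncates after replacing instead of before.

```typescript
// Hypothetical helper: make a user- or LLM-provided string safe to embed as a
// single-line, double-quoted value in a line-based frontmatter block.
function sanitizeScalar(raw: string, maxLen = 200): string {
  return raw
    .replace(/\r\n|\r|\n/g, " ") // any line break would split the key: value line
    .replace(/"/g, "\u201D") // same quote substitution as the code under review
    .slice(0, maxLen);
}

const sample = sanitizeScalar('refactor "auth"\r\nmodule');
console.log(sample.includes("\n")); // false: the value now fits on one line
```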

…dless MCP)

Consolidates PRs #4 → #5 → #6 → #7 → #8 into one commit.

## Headline
- 7 vault-* persona skills (architect/curator/gardener/historian/janitor/librarian/teacher)
- AI-Output sediment pipeline (writeAIOutput + sweepAIOutput + review-status + scope + quarantine-state + history audit trail)
- Step 2.5 input gate with warning emission
- Step 2.6-2.8: tag migration + axis sub-key + sweep metrics trend log
- Bilingual user guide (EN + CN)
- Auto-generated tools reference with drift guard
- End-to-end stdio smoke test
- Paste-install UX (setup / setup.ps1)
- Graph viewer (static HTML)
- Brand: LLM Wiki Bridge (display) / obsidian-llm-wiki (slug)

## Merge-prep (c8651f5)
- loadConfig precedence flipped to env > ./yaml > ../yaml (fixes silent vault redirect)
- pglite vector.tar.gz bundle path fix via esbuild externalize (5ee746a)
- compiler ruff clean (unblocks lint-python CI)
- docs/ICEBOX.md: 2026-04-20 persona+MCP audit 12 items deferred to v3
- CHANGELOG.md v2.0.0 entry

121/121 tests green.

2233admin added 4 commits April 21, 2026 03:49

gemini-code-assist Bot reviewed Apr 20, 2026


2233admin merged commit 32f22a3 into v2-staging Apr 21, 2026

2233admin deleted the feat/ai-output-sediment branch April 21, 2026 08:20
