Step 1: AI-Output sediment system (writeback + stale sweep) + bilingual user guide #5

Merged
2233admin merged 4 commits into v2-staging from feat/ai-output-sediment
Apr 21, 2026

Conversation

@2233admin
Owner

What this adds

The "persona outputs survive the session" design — Step 1 of the dependency chain ③ writeback → ① provenance → ② active-push. Without this, every /vault-architect / /vault-gardener / etc. analysis evaporates at session end; the vault only sediments the human's notes, not the collaboration history.

Two new MCP ops

vault.writeAIOutput

Persona calls this to persist a meaningful analysis. Lands at:

{vault}/00-Inbox/AI-Output/{persona}/YYYY-MM-DD-{slug}.md

With a 6-field frontmatter that a future sweep (or human) can reason about:

---
generated-by: vault-architect
generated-at: 2026-04-21T14:32:00.000Z
agent: claude-opus-4-7
parent-query: "refactor authentication module"
source-nodes:
  - "[[auth-architecture]]"
  - "[[session-tokens]]"
status: draft
---

Each field solves a distinct future-failure class: who / when / what-quality / why / from-where / still-valid. Every persona's skill file got a per-persona invocation template appended.
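
A minimal sketch of how the six fields could be serialized, assuming a hypothetical `AIOutputMeta` shape and `buildFrontmatter` helper (neither name comes from the PR). The quote replacement mirrors the `"` to U+201D substitution visible in the review snippets; the rest is illustrative, not the shipped implementation.

```typescript
// Hypothetical input shape; keys mirror the six frontmatter fields above.
interface AIOutputMeta {
  generatedBy: string;   // persona, e.g. "vault-architect"
  generatedAt: string;   // ISO-8601 timestamp
  agent: string;         // model identifier
  parentQuery: string;   // the user request that triggered the analysis
  sourceNodes: string[]; // wikilinks such as "[[auth-architecture]]"
}

// Swap straight double quotes for U+201D so a value cannot close its own quoted scalar.
const quote = (s: string): string => `"${s.replace(/"/g, "\u201D")}"`;

function buildFrontmatter(meta: AIOutputMeta): string {
  const lines: string[] = [
    "---",
    `generated-by: ${meta.generatedBy}`,
    `generated-at: ${meta.generatedAt}`,
    `agent: ${meta.agent}`,
    `parent-query: ${quote(meta.parentQuery)}`,
  ];
  if (meta.sourceNodes.length === 0) {
    // An empty list is written inline as [] (one of the tested scenarios).
    lines.push("source-nodes: []");
  } else {
    lines.push("source-nodes:");
    for (const node of meta.sourceNodes) lines.push(`  - ${quote(node)}`);
  }
  // Status is always `draft` at write time.
  lines.push("status: draft", "---");
  return lines.join("\n") + "\n";
}
```

Keeping `status` as the last key and always writing `draft` matches the lifecycle described here: only the sweep or a human changes it afterwards.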

vault.sweepAIOutput

Gardener calls this to manage the sediment lifecycle. Two report types:

  • Stale candidates — drafts past per-persona age threshold (architect=45d, gardener=30d, historian=180d, others=60d) AND with zero backlinks from non-AI-Output files. AI-Output → AI-Output references are self-anchoring hallucination chains and do not count.
  • Supersede candidates — reviewed pairs with source-nodes Jaccard ≥ 0.6 and a newer generated-at. Reported only; never auto-applied.
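
The supersede gate can be sketched as a Jaccard comparison over `source-nodes`. The `ReviewedEntry` shape and the function names below are assumptions for illustration; the same-persona restriction follows the commit notes further down, and treating two empty lists as a non-match is a choice made here, not something the PR states.

```typescript
// Jaccard similarity of two lists treated as sets: |A ∩ B| / |A ∪ B|.
function jaccard(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  if (setA.size === 0 && setB.size === 0) return 0; // assumption: nothing to compare
  const shared = Array.from(setA).filter((x) => setB.has(x)).length;
  return shared / (setA.size + setB.size - shared);
}

// Hypothetical view of a reviewed AI-Output file.
interface ReviewedEntry {
  path: string;
  persona: string;
  generatedAt: string; // ISO-8601
  sourceNodes: string[];
}

// Report (never apply) pairs where a newer entry overlaps an older one.
function supersedeCandidates(entries: ReviewedEntry[], threshold = 0.6) {
  const out: { older: string; newer: string; score: number }[] = [];
  for (let i = 0; i < entries.length; i++) {
    for (let j = i + 1; j < entries.length; j++) {
      const a = entries[i];
      const b = entries[j];
      if (a.persona !== b.persona) continue;
      const score = jaccard(a.sourceNodes, b.sourceNodes);
      if (score < threshold) continue;
      const ta = Date.parse(a.generatedAt);
      const tb = Date.parse(b.generatedAt);
      if (ta === tb) continue; // no "newer" side to propose
      const older = ta < tb ? a : b;
      const newer = ta < tb ? b : a;
      out.push({ older: older.path, newer: newer.path, score });
    }
  }
  return out;
}
```

For example, source-node lists `[a, b, c]` and `[a, b, c, d]` share three of four distinct nodes, giving 0.75, which clears the 0.6 gate.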

dry_run: true (default) returns the report without writing. dry_run: false flips status: draft → stale in-place via narrow regex substitution (body + other fm fields preserved).
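
One way to keep the substitution narrow is to restrict it to the leading frontmatter block, so a `status: draft` line in the body is never touched. The helper below is a sketch under that assumption; `flipDraftToStale` is a hypothetical name and the PR's actual regex is not shown on this page.

```typescript
// Rewrite `status: draft` to `status: stale` inside the leading frontmatter only.
function flipDraftToStale(content: string): string {
  // Match from the opening `---` line to the first closing `---` line.
  const match = /^---\r?\n[\s\S]*?\r?\n---(?:\r?\n|$)/.exec(content);
  if (!match) return content; // no frontmatter: leave the file untouched
  const fm = match[0];
  // Only an exact `status: draft` line is replaced; a trailing \r is preserved.
  const flipped = fm.replace(/^status:[ \t]*draft[ \t]*(\r?)$/m, "status: stale$1");
  return flipped + content.slice(fm.length);
}
```

Files whose status is already `reviewed` or `stale` pass through unchanged, because the pattern matches only the literal value `draft`.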

Design rationale (condensed)

  • B → A lifecycle (gardener auto-stale + human manual-review) beats the reverse. Maintenance cost of "default trust + auto-stale" is O(misjudgment rate); cost of "default suspect + human promotes" is O(total volume). At 300 AI-Output files in 6 months you'll only promote ~5.
  • Backlink filter excludes AI-Output sources because two AI-Output files can happen to cite each other without either being grounded in anything a human cared about. Forces at least one "this was worth keeping" vote from a human-written note.
  • Supersede stays human-confirmed — auto-applying "this new one replaces that old one" would be AI grading its own successors, a second hallucination layer. Gardener proposes, human decides.
  • Per-persona subdirs for MVP simplicity; schema carries enough info to graduate to flat + yaml-lookup without code changes if per-content-type differentiation proves more important than per-persona later.

Install + try it

New comprehensive bilingual guide lands in this PR: docs/GUIDE.md (English) and docs/GUIDE.zh-CN.md (Chinese).

Minimum viable try after merge:

git clone --depth 1 https://github.com/2233admin/obsidian-llm-wiki.git ~/obsidian-llm-wiki-src
cd ~/obsidian-llm-wiki-src && git checkout feat/ai-output-sediment && ./setup

Then in Claude Code: /vault-architect suggest 3 refactors for my vault structure. Watch the file appear at 00-Inbox/AI-Output/vault-architect/YYYY-MM-DD-suggest-3-refactors.md.

Verification

  • npm run build — tsc 0 errors
  • node.exe --test dist/**/*.test.js: 98 tests pass, 0 fail (88 baseline + 10 new)
  • npm run generate-tools-doc — deterministic output; drift guard green

New tests cover the full contract:

Scenarios covered:

  • dryRun=false writes all 6 frontmatter fields + status: draft
  • dryRun default returns plan without writing
  • Collision appends -2 suffix same day/slug
  • Invalid persona rejected with code -32602
  • Empty sourceNodes serializes as []
  • Stale detection uses injected now for determinism
  • Drafts with non-AI-Output backlinks are exempted
  • AI-Output → AI-Output backlinks do NOT anchor
  • dry_run: false flips status in-place, preserves body
  • Supersede candidates fire on Jaccard ≥ 0.6

Incidental fixes included

  • class VaultFs is now exported from mcp-server/src/index.ts (was module-private).
  • main() now guarded by import.meta.url === process.argv[1]. Without this, importing VaultFs from tests would spin up StdioServerTransport and hold stdin open, blocking all subsequent test files. One-liner, preserves runnable-as-entrypoint behavior.
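
A sketch of such an entrypoint guard for an ES module. In Node.js, `import.meta.url` is a `file:` URL string while `process.argv[1]` is a filesystem path, so the comparison written above in shorthand usually needs a conversion step; `pathToFileURL` from `node:url` is one way to do it. The `isEntrypoint` helper is a hypothetical name, and the exact guard used in index.ts is not shown on this page.

```typescript
import { pathToFileURL } from "node:url";

// True when the module at `metaUrl` is the script Node was launched with.
function isEntrypoint(metaUrl: string, argv1: string | undefined): boolean {
  // argv1 is undefined when Node runs without a script (for example the REPL).
  return argv1 !== undefined && metaUrl === pathToFileURL(argv1).href;
}

// Assumed usage inside index.ts, where main() starts the stdio server:
//   if (isEntrypoint(import.meta.url, process.argv[1])) main();
```

With a guard of this kind, a test file that imports `VaultFs` evaluates the module without starting `StdioServerTransport`, while `node dist/index.js` still runs `main()`.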

Files modified

  • mcp-server/src/core/operations.ts — 2 new Operation entries (+27)
  • mcp-server/src/index.ts — 2 dispatch cases + export + main guard (+190)
  • mcp-server/src/ai-output.test.ts — NEW, 10 tests (+297)
  • skills/vault-{architect,curator,gardener,historian,janitor,librarian,teacher}.md — +17 each (gardener +23)
  • docs/ai-output-convention.md — NEW schema + lifecycle + FAQ (~120)
  • docs/mcp-tools-reference.md — auto-regenerated (+29)
  • docs/GUIDE.md + docs/GUIDE.zh-CN.md — NEW bilingual guide (268 each)
  • README.md — language-switch blockquote above description
  • progress.txt — Step 1 shipping summary + handoff

Explicitly deferred

  • Auto reviewed → superseded flip (currently reports only)
  • YAML-config stale policy (currently hardcoded)
  • user-verdict frontmatter field
  • Provenance enrichment of other MCP op responses (Step 2 in the dependency chain)

Ready for review or merge. Base is v2-staging so this layers on top of PR #4 — merge order: #4 first, then this.

Persona outputs now sediment into {vault}/00-Inbox/AI-Output/{persona}/
with a 6-field provenance frontmatter (generated-by / generated-at /
agent / parent-query / source-nodes / status). Gardener can
subsequently flip aged drafts to stale with a backlink-source test.

vault.writeAIOutput
- Typed params with persona regex ^vault-[a-z]+$ validation
- Auto-slug from parentQuery (first 6 words, kebab-case, fs-safe);
  YYYY-MM-DD-slug.md under per-persona subdir
- Collision loop appends -2 through -99
- YAML-subset serialization round-trips through the existing
  parseFrontmatter (no reformatting drift)
- dryRun default true (matches project convention); on dry run
  returns the computed path + frontmatter without writing
- Status hardcoded `draft` on write
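
The naming rules above can be sketched as follows. `isValidPersona`, `slugify`, and `pickFileName` are hypothetical names, the ASCII-only word split and the `untitled` fallback are assumptions, and the `exists` callback stands in for whatever filesystem check the server really uses.

```typescript
// Persona names must look like "vault-architect".
const PERSONA_RE = /^vault-[a-z]+$/;
const isValidPersona = (p: string): boolean => PERSONA_RE.test(p);

// First six words of the query, lowercased and joined with hyphens.
function slugify(query: string, maxWords = 6): string {
  const words = query
    .toLowerCase()
    .split(/[^a-z0-9]+/) // assumption: anything outside a-z0-9 is a separator
    .filter((w) => w.length > 0);
  return words.slice(0, maxWords).join("-") || "untitled"; // fallback is an assumption
}

// Pick a free file name, appending -2 through -99 on collision.
function pickFileName(
  date: string, // YYYY-MM-DD
  slug: string,
  exists: (name: string) => boolean,
): string {
  const base = `${date}-${slug}`;
  if (!exists(`${base}.md`)) return `${base}.md`;
  for (let n = 2; n <= 99; n++) {
    const candidate = `${base}-${n}.md`;
    if (!exists(candidate)) return candidate;
  }
  throw new Error(`no free file name for ${base} after 99 attempts`);
}
```

Passing `exists` as a parameter keeps the collision loop testable without touching the disk, in the same spirit as the injected `now` used by the sweep tests.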

vault.sweepAIOutput
- Hardcoded per-persona thresholds: architect=45d, gardener=30d,
  historian=180d, librarian=60d, catch-all=60d (MVP; graduate to
  yaml config in Step 1.5 once usage justifies)
- Stale rule = age-past-threshold AND zero non-AI-Output backlinks
  (source files whose own frontmatter carries `generated-by` do
  NOT anchor -- AI->AI references would be self-anchoring
  hallucination chains)
- Supersede candidates reported on same-persona reviewed pairs with
  source-nodes Jaccard >= 0.6 and newer generated-at; human
  confirms, never auto-applies -- AI self-grading is a second-layer
  hallucination deliberately avoided
- dry_run=false rewrites frontmatter `status: draft` to `stale`
  in-place via narrow regex substitution (body + other fm fields
  untouched)
- `now` param provides ISO timestamp injection seam for tests
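
The stale rule and the `now` seam can be sketched as a pure predicate. The threshold values come from the list above; the map keys, the `DraftEntry` shape, and the strict "older than" comparison are assumptions made for illustration.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Per-persona thresholds in days; keying by the full persona name is an assumption.
const STALE_DAYS: Record<string, number> = {
  "vault-architect": 45,
  "vault-gardener": 30,
  "vault-historian": 180,
  "vault-librarian": 60,
};
const DEFAULT_STALE_DAYS = 60; // catch-all

// Hypothetical view of one AI-Output file after the backlink scan has run.
interface DraftEntry {
  persona: string;
  generatedAt: string;       // ISO-8601
  status: string;            // "draft", "reviewed", "stale", ...
  hasHumanBacklink: boolean; // linked from at least one non-AI-Output note
}

// Stale = draft AND older than the persona threshold AND no human backlink.
function isStaleCandidate(entry: DraftEntry, now: string): boolean {
  if (entry.status !== "draft") return false;
  const limit = STALE_DAYS[entry.persona] ?? DEFAULT_STALE_DAYS;
  const ageDays = (Date.parse(now) - Date.parse(entry.generatedAt)) / DAY_MS;
  return ageDays > limit && !entry.hasHumanBacklink;
}
```

Taking `now` as an argument instead of reading the clock is what makes threshold tests deterministic: the same entry and the same timestamp always produce the same verdict.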

Tests: 10 unit tests in mcp-server/src/ai-output.test.ts covering
the write contract (all 6 fields, dryRun default, collision,
persona validation, empty sourceNodes serialization) and the sweep
contract (threshold detection with injected now, backlink filter,
AI-Output->AI-Output non-anchoring, in-place status flip preserving
body, supersede candidate Jaccard gate). node:test + tmpdir
isolation, zero new deps.

Incidental fix: added an `import.meta.url === process.argv[1]` guard
around main() in index.ts so importing VaultFs from tests does not
start the stdio MCP server. Without the guard, StdioServerTransport
keeps the process alive and subsequent test files never run.

Also exported `class VaultFs` (was previously private to the module)
so ai-output.test.ts can construct a fresh instance per test against
a tmpdir vault.

Total: 98 tests pass (88 baseline + 10 new). tsc 0 errors.

See docs/ai-output-convention.md for the human-readable schema
contract + FAQ. Closes Step 1 of the AI-Output sediment system.

Every persona now instructs its runtime to persist meaningful
analyses via vault.writeAIOutput, so the vault sediments both the
human's notes and the AI collaboration history.

- skills/vault-{architect,curator,teacher,historian,janitor,
  librarian,gardener}.md: appended `## Sediment convention` block
  with a per-persona vault.writeAIOutput invocation template +
  status lifecycle pointer.
- skills/vault-gardener.md: additionally appended `## Sweep
  convention (gardener-only responsibility)` -- dry-run first,
  require explicit user confirmation before dry_run:false flips,
  never auto-apply supersede candidates.
- docs/ai-output-convention.md (NEW, ~120 lines): schema table,
  legal status transitions, per-persona thresholds, FAQ covering
  "who flips reviewed", "what counts as a backlink", "why exclude
  AI->AI references", "what happens to stale entries".

Non-trivial design reasoning (why B->A beats A->B for the
draft->stale/reviewed split, why the 6-field schema is minimal,
why supersede stays semi-automatic) lives in planning notes, not
in-tree.

Auto-generated via `npm run generate-tools-doc` after adding
vault.writeAIOutput + vault.sweepAIOutput. Drift guard test now
passes (mcp-server/src/scripts/generate-tools-doc.test.ts).

Purely mechanical diff: 2 new ops appear under the `vault.*`
namespace section with their param tables; no hand-edits.
…ess handoff

- docs/GUIDE.md (NEW, 268 lines): user-journey oriented walkthrough
  -- pitch, 30-second install, first useful 5-minute session with 5
  real prompt examples, 7-persona table, AI-Output sediment plain-
  English explanation, common prompts cheat sheet, troubleshooting,
  FAQ. Links to INSTALL.md for install-depth, ai-output-convention.md
  for sediment schema, mcp-tools-reference.md for tool catalog.
- docs/GUIDE.zh-CN.md (NEW, 268 lines): structure-mirrored Chinese
  translation. Idiomatic (not literal) rendering of the same content
  for the Chinese dev community. Top-of-page language switch link.
- README.md: single-line language-switch header pointing at both
  GUIDE files. Keeps README lean; GUIDE.md carries the depth.
- progress.txt: Step 1 (AI-Output sediment) shipping summary,
  engineering notes for next editor, updated known-gaps roster.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces the 'AI-Output sediment' feature, which allows persona-authored analyses to be persisted in the vault. It adds two new MCP tools: vault.writeAIOutput for saving analyses with structured frontmatter and vault.sweepAIOutput for managing the lifecycle of these files (staling old drafts and identifying supersede candidates). The PR also includes comprehensive documentation, unit tests, and updates to persona skill files to utilize the new writeback capability. Feedback focuses on optimizing the sweep operation's performance and ensuring YAML frontmatter integrity by sanitizing newlines in user-provided strings.

Comment thread mcp-server/src/index.ts
Comment on lines +848 to +865
const hasRealBacklink = (targetRel: string): boolean => {
  const targetBase = basename(targetRel, ".md");
  let found = false;
  this.walkMd((relPath, content) => {
    if (found) return;
    if (relPath === targetRel) return;
    if (aiOutputPaths.has(relPath)) return; // AI-Output -> AI-Output doesn't anchor
    for (const l of this.parseWikilinks(content)) {
      const linkPath = l.link.split("#")[0];
      if (!linkPath) continue;
      if (linkPath === targetRel || linkPath === targetBase || linkPath + ".md" === targetRel) {
        found = true;
        return;
      }
    }
  });
  return found;
};

high

The current implementation of hasRealBacklink is highly inefficient as it performs a full vault scan (this.walkMd) for every AI-Output draft found. In a vault with $N$ total files and $M$ AI-Output drafts, this results in $O(M \times N)$ file read operations. For large vaults, this will cause significant performance degradation during the sweep operation.

It is recommended to build a set of all human-authored wikilinks by walking the vault once before iterating through the AI-Output entries.

        // Pre-compute all human-authored links to avoid O(M*N) vault scans
        const humanLinks = new Set<string>();
        this.walkMd((relPath, content) => {
          if (aiOutputPaths.has(relPath)) return;
          for (const l of this.parseWikilinks(content)) {
            const link = l.link.split("#")[0];
            if (link) humanLinks.add(link);
          }
        });

        const hasRealBacklink = (targetRel: string): boolean => {
          const targetBase = basename(targetRel, ".md");
          const targetNoExt = targetRel.replace(/\.md$/, "");
          return humanLinks.has(targetRel) || humanLinks.has(targetBase) || humanLinks.has(targetNoExt);
        };

Comment thread mcp-server/src/index.ts
if (typeof body !== "string") throw err(-32602, "body required");

// Sanitize parent-query: truncate to 200 chars, replace " with right-double-quote
const parentQuery = parentQueryRaw.slice(0, 200).replace(/"/g, "\u201D");

medium

The parentQuery string is embedded directly into the YAML frontmatter. If the user's query contains newlines, it will break the simple line-based frontmatter parser implemented in parseFrontmatter (which splits by \n and expects key: value pairs). Sanitizing newlines by replacing them with spaces ensures the generated file remains valid and parseable.

Suggested change
- const parentQuery = parentQueryRaw.slice(0, 200).replace(/"/g, "\u201D");
+ const parentQuery = parentQueryRaw.slice(0, 200).replace(/"/g, "\u201D").replace(/\n/g, " ");

Comment thread mcp-server/src/index.ts
} else {
  yamlLines.push(`source-nodes:`);
  for (const node of sourceNodes) {
    const escaped = String(node).replace(/"/g, "\u201D");

medium

Similar to parentQuery, items in sourceNodes should be sanitized for newlines to prevent breaking the YAML frontmatter structure during serialization, especially since these values are provided by the LLM and might contain unexpected formatting.

Suggested change
- const escaped = String(node).replace(/"/g, "\u201D");
+ const escaped = String(node).replace(/"/g, "\u201D").replace(/\n/g, " ");

@2233admin 2233admin merged commit 32f22a3 into v2-staging Apr 21, 2026
2233admin added a commit that referenced this pull request Apr 21, 2026
…dless MCP)

Consolidates PRs #4, #5, #6, #7, #8 into one commit.

## Headline

- 7 vault-* persona skills (architect/curator/gardener/historian/janitor/librarian/teacher)
- AI-Output sediment pipeline (writeAIOutput + sweepAIOutput + review-status + scope + quarantine-state + history audit trail)
- Step 2.5 input gate with warning emission
- Step 2.6-2.8: tag migration + axis sub-key + sweep metrics trend log
- Bilingual user guide (EN + CN)
- Auto-generated tools reference with drift guard
- End-to-end stdio smoke test
- Paste-install UX (setup / setup.ps1)
- Graph viewer (static HTML)
- Brand: LLM Wiki Bridge (display) / obsidian-llm-wiki (slug)

## Merge-prep (c8651f5)

- loadConfig precedence flipped to env > ./yaml > ../yaml (fixes silent vault redirect)
- pglite vector.tar.gz bundle path fix via esbuild externalize (5ee746a)
- compiler ruff clean (unblocks lint-python CI)
- docs/ICEBOX.md: 2026-04-20 persona+MCP audit 12 items deferred to v3
- CHANGELOG.md v2.0.0 entry

121/121 tests green.
@2233admin 2233admin deleted the feat/ai-output-sediment branch April 21, 2026 08:20