feat(mcp): Step 2.5 — input gate + sweep metrics#7
2233admin wants to merge 1 commit into feat/step2-governance
Code Review
This pull request introduces an 'Input gate' to filter low-signal AI outputs and adds detailed metrics to the sweep process. The input gate enforces quality by rejecting entries with short bodies, shell commands as queries, or missing anchors. The sweep metrics provide insights into persona distribution, status, and backlink hit rates. A performance bottleneck was identified in the metrics calculation where repeated vault scans could cause significant latency in large vaults; a pre-indexed backlink lookup was suggested to optimize this to linear complexity.
```ts
let withRealBacklink = 0;
for (const e of entries) {
  metrics.byPersona[e.persona] = (metrics.byPersona[e.persona] ?? 0) + 1;
  metrics.byStatus[e.status || "(none)"] = (metrics.byStatus[e.status || "(none)"] ?? 0) + 1;
  const qs = typeof e.fm["quarantine-state"] === "string"
    ? (e.fm["quarantine-state"] as string) : "(none)";
  metrics.byQuarantineState[qs] = (metrics.byQuarantineState[qs] ?? 0) + 1;
  if (hasRealBacklink(e.relPath)) withRealBacklink++;
}
```
The current implementation of the metrics calculation has a significant performance bottleneck. Calling `hasRealBacklink(e.relPath)` inside a loop over all entries results in O(N × M) work, where N is the number of entries and M is the number of files in the vault: `hasRealBacklink` performs a full vault scan via `walkMd` and reads every file, so this will lead to severe latency in larger vaults.
I suggest building a backlink index once by walking the vault a single time before the loop. This reduces the complexity to O(N + M).
Suggested change:

```diff
-let withRealBacklink = 0;
-for (const e of entries) {
-  metrics.byPersona[e.persona] = (metrics.byPersona[e.persona] ?? 0) + 1;
-  metrics.byStatus[e.status || "(none)"] = (metrics.byStatus[e.status || "(none)"] ?? 0) + 1;
-  const qs = typeof e.fm["quarantine-state"] === "string"
-    ? (e.fm["quarantine-state"] as string) : "(none)";
-  metrics.byQuarantineState[qs] = (metrics.byQuarantineState[qs] ?? 0) + 1;
-  if (hasRealBacklink(e.relPath)) withRealBacklink++;
-}
+const backlinked = new Set<string>();
+this.walkMd((relPath, content) => {
+  if (aiOutputPaths.has(relPath)) return;
+  for (const l of this.parseWikilinks(content)) {
+    const link = l.link.split("#")[0];
+    if (link) {
+      backlinked.add(link);
+      if (!link.endsWith(".md")) backlinked.add(link + ".md");
+    }
+  }
+});
+let withRealBacklink = 0;
+for (const e of entries) {
+  metrics.byPersona[e.persona] = (metrics.byPersona[e.persona] ?? 0) + 1;
+  metrics.byStatus[e.status || "(none)"] = (metrics.byStatus[e.status || "(none)"] ?? 0) + 1;
+  const qs = (typeof e.fm["quarantine-state"] === "string" && e.fm["quarantine-state"]) || "(none)";
+  metrics.byQuarantineState[qs] = (metrics.byQuarantineState[qs] ?? 0) + 1;
+  if (backlinked.has(e.relPath) || backlinked.has(basename(e.relPath, ".md"))) withRealBacklink++;
+}
```
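The reviewer's indexing idea can be checked in isolation. The sketch below is a hypothetical, self-contained version that runs over an in-memory vault (a `Map` of relPath to content) instead of the repo's `walkMd`/`parseWikilinks`; the `Entry` shape, the `basenameNoExt` helper, and the simplified wikilink regex are assumptions. It builds the backlink set in one pass over the vault (O(M)), then does one set lookup per entry (O(N)), and reports the same five fields the PR lists for the sweep response.

```typescript
// Hypothetical in-memory vault: relPath -> file content.
type Vault = Map<string, string>;

// Assumed entry shape, modeled on the fields the review snippet reads.
interface Entry {
  relPath: string;
  persona: string;
  status?: string;
  fm: Record<string, unknown>;
}

interface SweepMetrics {
  totalEntries: number;
  byPersona: Record<string, number>;
  byStatus: Record<string, number>;
  byQuarantineState: Record<string, number>;
  realBacklinkHitRate: number;
}

// Simplified matcher for [[target]], [[target#heading]], [[target|alias]] (assumption).
const WIKILINK = /\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]/g;

// "AI-Output/foo.md" -> "foo"
function basenameNoExt(relPath: string): string {
  const file = relPath.slice(relPath.lastIndexOf("/") + 1);
  return file.endsWith(".md") ? file.slice(0, -3) : file;
}

// One pass over the vault: collect every link target written by a non-AI-Output note.
function buildBacklinkIndex(vault: Vault, aiOutputPaths: Set<string>): Set<string> {
  const backlinked = new Set<string>();
  for (const [relPath, content] of vault) {
    if (aiOutputPaths.has(relPath)) continue; // AI-Output notes are not "real" anchors
    for (const m of content.matchAll(WIKILINK)) {
      const link = m[1].trim();
      backlinked.add(link);
      if (!link.endsWith(".md")) backlinked.add(link + ".md");
    }
  }
  return backlinked;
}

function sweepMetrics(vault: Vault, entries: Entry[]): SweepMetrics {
  const aiOutputPaths = new Set(entries.map((e) => e.relPath));
  const backlinked = buildBacklinkIndex(vault, aiOutputPaths); // O(M), once
  const bump = (r: Record<string, number>, k: string) => {
    r[k] = (r[k] ?? 0) + 1;
  };
  const metrics: SweepMetrics = {
    totalEntries: entries.length,
    byPersona: {},
    byStatus: {},
    byQuarantineState: {},
    realBacklinkHitRate: 0,
  };
  let withRealBacklink = 0;
  for (const e of entries) { // O(N) set lookups
    bump(metrics.byPersona, e.persona);
    bump(metrics.byStatus, e.status || "(none)");
    const qs = e.fm["quarantine-state"];
    bump(metrics.byQuarantineState, typeof qs === "string" && qs ? qs : "(none)");
    if (backlinked.has(e.relPath) || backlinked.has(basenameNoExt(e.relPath))) withRealBacklink++;
  }
  metrics.realBacklinkHitRate = entries.length ? withRealBacklink / entries.length : 0;
  return metrics;
}
```

Because AI-Output files are skipped while building the index, an entry cited only by another AI-Output note does not count toward `realBacklinkHitRate`.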
Informed by the Eridanus117/agent-memory-runtime external review: borrow the default-reject rules (decision-table §7) and the promotion-ready metrics (governance-layer §P0.5), adapted to a vault-level base. Skipped: the capture/observation layer and the durable-memory+injection path (wrong base: our recall is a librarian skill, not a Claude Code hook).

Input gate in vault.writeAIOutput (runs after schema/enum validation, so those errors still surface first):
- body >= 50 chars required
- parent-query must not be a single shell command (pwd/ls/cd/cat/rg/grep/echo/git status|diff)
- at least one of parent-query or sourceNodes must be non-empty

Sweep metrics on every vault.sweepAIOutput response:
- totalEntries, byPersona, byStatus, byQuarantineState
- realBacklinkHitRate (fraction of entries anchored by a wikilink from a non-AI-Output note)

Purpose: future threshold tuning needs evidence. A vault where <10% of entries are ever cited is a persona-prompt problem, not a staleness-threshold problem.

Tests: 22/22 ai-output pass (6 new: 3 gate rejections, 1 gate allow, 2 metrics cases). Full mcp-server suite 110/110 green.

Convention doc gains two new sections (input gate, sweep metrics) with rationale grounded in the external governance plan.
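The three gate rules can be sketched as a pure function that returns a list of violations. This is a hypothetical TypeScript sketch, not the code in vault.writeAIOutput: the `AIOutputInput` shape, the camelCase `parentQuery` field, and the first-token heuristic for "single shell command" are assumptions.

```typescript
// Hypothetical input shape: parentQuery / sourceNodes mirror the PR's
// "parent-query" and "sourceNodes" fields; the exact types are assumptions.
interface AIOutputInput {
  body: string;
  parentQuery?: string;
  sourceNodes?: string[];
}

const MIN_BODY_CHARS = 50;
// Commands the PR lists as "single shell command" queries.
const SHELL_COMMANDS = new Set(["pwd", "ls", "cd", "cat", "rg", "grep", "echo"]);
const GIT_SUBCOMMANDS = new Set(["status", "diff"]);

// Crude heuristic (assumption): the first token decides. A natural-language
// query that happens to start with e.g. "cat" would be a false positive.
function isSingleShellCommand(query: string): boolean {
  const [cmd, sub] = query.trim().split(/\s+/);
  if (SHELL_COMMANDS.has(cmd)) return true;
  return cmd === "git" && sub !== undefined && GIT_SUBCOMMANDS.has(sub);
}

// Returns every rule the input breaks; an empty array means it passes.
function inputGateViolations(input: AIOutputInput): string[] {
  const problems: string[] = [];
  if (input.body.length < MIN_BODY_CHARS) {
    problems.push(`body must be at least ${MIN_BODY_CHARS} chars`);
  }
  const query = input.parentQuery?.trim() ?? "";
  if (query !== "" && isSingleShellCommand(query)) {
    problems.push("parent-query must not be a single shell command");
  }
  if (query === "" && (input.sourceNodes ?? []).length === 0) {
    problems.push("need parent-query or sourceNodes");
  }
  return problems;
}

// Default-reject behaviour: any violation aborts the write.
function enforceInputGate(input: AIOutputInput): void {
  const problems = inputGateViolations(input);
  if (problems.length > 0) {
    throw new Error(`input gate rejected: ${problems.join("; ")}`);
  }
}
```

Keeping the rule checks separate from the throw is one way to support a later warnings-only mode: a caller could surface `inputGateViolations(input)` in the response instead of calling `enforceInputGate`.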
Branch updated from 09d0af8 to 4374ce3.
…w handoff

Records the two-PR outcome (#6 governance + #7 Step 2.5 input gate/metrics), the external-review absorption from agent-memory-runtime, and an explicit self-audited backlog for the next session:

P1:
- review-status cache has no sync code (drift lurking)
- history from/to axis is ambiguous (needs an axis: sub-key)

P2:
- review-status should be an Obsidian tag, not a fm field (violates the adapt-over-build rule)
- input gate thresholds are guesses (should downgrade to warnings)
- metrics is a snapshot with no trend log
- no e2e MCP smoke test

P3:
- MUST-append-history enforcement is undocumented
- borrow-3/skip-3 was inference, not measurement
- should have been a single PR

Next-session priority: migrate review-status frontmatter -> Obsidian tag (~-30 lines, kills two gaps at once), then downgrade the input gate to warnings (~10 lines), then the history axis sub-key (~20 lines), then the metrics trend log (~15 lines). Total ~70 lines closes all P1 + half of P2.
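The P1 item about the ambiguous from/to history axis can be illustrated with a small data-structure sketch. Everything here is hypothetical: the `HistoryEntry` field names, the `Axis` union (borrowed from the review-status and quarantine-state fields named in this PR), and the example values are assumptions, not the repository's schema.

```typescript
// Hypothetical axis names, taken from fields mentioned in this PR.
type Axis = "review-status" | "quarantine-state";

// Before: which field moved is implicit, so {from: "draft", to: "reviewed"}
// is ambiguous once more than one frontmatter field is audited.
interface LegacyHistoryEntry {
  at: string; // ISO-8601 timestamp
  from: string;
  to: string;
}

// After: an explicit axis names the field each transition belongs to.
interface HistoryEntry extends LegacyHistoryEntry {
  axis: Axis;
}

// Append-only: returns a new array rather than mutating the old one.
function recordTransition(
  history: HistoryEntry[], axis: Axis, from: string, to: string, at: Date,
): HistoryEntry[] {
  return [...history, { at: at.toISOString(), axis, from, to }];
}

// The latest value per axis can be recovered without guessing.
function currentValue(history: HistoryEntry[], axis: Axis): string | undefined {
  for (let i = history.length - 1; i >= 0; i--) {
    if (history[i].axis === axis) return history[i].to;
  }
  return undefined;
}
```

With the axis key present, transitions on different fields can be interleaved in one audit trail and still be filtered per field.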
…dless MCP)

Consolidates PRs #4 → #5 → #6 → #7 → #8 into one commit.

## Headline

- 7 vault-* persona skills (architect/curator/gardener/historian/janitor/librarian/teacher)
- AI-Output sediment pipeline (writeAIOutput + sweepAIOutput + review-status + scope + quarantine-state + history audit trail)
- Step 2.5 input gate with warning emission
- Step 2.6-2.8: tag migration + axis sub-key + sweep metrics trend log
- Bilingual user guide (EN + CN)
- Auto-generated tools reference with drift guard
- End-to-end stdio smoke test
- Paste-install UX (setup / setup.ps1)
- Graph viewer (static HTML)
- Brand: LLM Wiki Bridge (display) / obsidian-llm-wiki (slug)

## Merge-prep (c8651f5)

- loadConfig precedence flipped to env > ./yaml > ../yaml (fixes silent vault redirect)
- pglite vector.tar.gz bundle path fix via esbuild externalize (5ee746a)
- compiler ruff clean (unblocks lint-python CI)
- docs/ICEBOX.md: 2026-04-20 persona+MCP audit, 12 items deferred to v3
- CHANGELOG.md v2.0.0 entry

121/121 tests green.
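The merge-prep precedence change (env > ./yaml > ../yaml) amounts to a first-match-wins lookup over an ordered list of sources. A hypothetical sketch, assuming each source can be reduced to an optional vault path; the `ConfigSource` shape and source names are illustrative, and the real loadConfig's environment variable and YAML key names are not given here.

```typescript
// Hypothetical sketch: the first source that sets a non-empty vault path wins.
interface ConfigSource {
  name: string;       // e.g. "env", "./config.yaml", "../config.yaml"
  vaultPath?: string; // undefined when the source does not set a vault
}

interface ResolvedVault {
  vaultPath: string;
  from: string; // which source won, so a redirect is visible rather than silent
}

function resolveVaultPath(sourcesInPrecedenceOrder: ConfigSource[]): ResolvedVault {
  for (const source of sourcesInPrecedenceOrder) {
    const value = source.vaultPath?.trim();
    if (value) return { vaultPath: value, from: source.name };
  }
  const tried = sourcesInPrecedenceOrder.map((s) => s.name).join(", ");
  throw new Error(`no vault path set in: ${tried}`);
}
```

Called with sources ordered [env, ./yaml, ../yaml], an explicit environment value always wins, and a YAML file in the parent directory is consulted only when neither the environment nor the local YAML sets a path. Returning `from` alongside the path is one way to make the chosen source loggable.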
Layers on top of PR #6. Implements the three items borrowed from the external agent-memory-runtime governance review (governance-layer.md + decision-table.md):

- Input gate in vault.writeAIOutput (default-reject rules for short bodies, shell-command queries, and missing anchors).
- Sweep metrics on every vault.sweepAIOutput response: {totalEntries, byPersona, byStatus, byQuarantineState, realBacklinkHitRate}. Signals future threshold-tuning needs: a vault where <10% of entries are cited by human notes is a persona-prompt problem, not a staleness-threshold problem.
- Two new sections (input gate, sweep metrics) in docs/ai-output-convention.md.

## Skipped from the external review (and why)

- Capture/observation layer and the durable-memory + injection path: wrong base, since our recall is a librarian skill, not a Claude Code hook.

## Verification

- npm run build: 0 tsc errors

## Stats

~180 insertions across 3 files (code + tests + docs). All existing short-body tests were updated to use a shared LONG_BODY constant that clears the 50-char gate.

## Merge order

PR #6 (feat/step2-governance) first, then this. Both merge back through feat/ai-output-sediment → v2-staging → main.