feat(proposal-a): Phase 1 recall governance (Issue #569) by jlin53882 · Pull Request #597 · CortexReach/memory-lancedb-pro

jlin53882 · 2026-04-13T04:09:28Z

Summary

Phase 1 of Proposal A (recall governance): fixes critical bugs in the feedback loop for auto-recall memories, alignment of suppression and scoring thresholds, and natural usage detection.

See RFC: #569

Changes vs #507/#505

Bug fixes applied on top of #507/#505

P0-1: pendingRecall TTL cleanup (10 min max age) - prevents unbounded memory growth
P0-3: isRecallUsed() Summary path AND gate fixed - natural usage detection now works
P2: suppression threshold aligned with scoring path (>= 3 -> >= 2)
P1: Summary path independent of ID/usage markers - detects natural usage without explicit phrases
P1: Scoring penalty threshold aligned with injection increment (>= 2 -> >= 1)
P1: errorKeywords checked before usage heuristic - explicit corrections override heuristic detection
P1: errorKeywords sets last_confirmed_use_at - prevents staleInjected double-counting
P2: TTL cleanup runs on both read and set paths
P2: confirm/error keywords refined to reduce false positives

Known limitations (out of Phase 1 scope)

bad_recall_count read-modify-write is not atomic - requires store-layer compare-and-swap (Phase 2)
Summary reverse match has minor false positive risk (Phase 2 optimization)

Architecture verification

autoCapture block boundary: correct (all hooks outside if (config.autoCapture !== false))
hooks registration: correct

CI Status

Pre-existing failures (not caused by this PR)

cli-smoke: Fails on upstream/master at the same assertion (line 316: details.count === undefined !== 1). This is a pre-existing bug in the test itself, unrelated to Phase 1 changes.
core-regression, storage-and-schema, packaging-and-workflow: These fail with jiti import errors (Cannot find module './node_modules/.pnpm/mlly@1.8.0/node_modules/mlly/dist'). This is an npm/pnpm environment issue with jiti's mlly dependency resolution. The same error reproduces locally on the developer's machine (npm run test:core-regression fails with the same error). This is a pre-existing environment issue, NOT caused by this PR's code changes.

Closes #569

- Add pendingRecall Map for tracking session recalls - Add agent_end hook to store response text for usage scoring - Add before_prompt_build hook (priority 5) to score recall usage - Add session_end hook to clean up pending recalls - Add isRecallUsed function to reflection-slices.ts - Guard: skip scoring for empty responseText (<=24 chars) Implements: recall usage tracking for Proposal A Phase 1

1. Bug 1 (CRITICAL): injectedIds regex in feedback hook never matched - The feedback hook used a regex /\[([a-f0-9]{8,})\]/gi to parse IDs from prependContext, but auto-recall injects memories in format [preferences:global], [facts:dc-channel], NOT [hex-id]. - Fix: read recallIds directly from pendingRecall (which is populated by auto-recall's before_prompt_build from the previous turn). Also added code in auto-recall to store selected IDs into pendingRecall[sessionKey].recallIds before returning. 2. Bug 2 (MAJOR): stripEnvelopeMetadata regex had literal backspace (0x08) - In src/smart-extractor.ts line 76, a literal backspace character (byte 0x08) was embedded in the regex pattern between 'agent' and '.', producing 'agent[0x08].*?' instead of 'agent\b.*?'. - Fix: replaced the 0x08 byte with the proper \b word boundary. 3. Bug 3 (MAJOR): WeakSet.clear() does not exist - In index.ts resetRegistration(), _registeredApis.clear() was called, but WeakSet has no clear() method. - Fix: removed the .clear() call per the comment's own note.

…g, parseSmartMetadata, importance row update) Bug 1 (P1): pendingRecall was written with recallIds from Turn N but responseText from Turn N-1, causing feedback to score the wrong memories. Fix: before_prompt_build (auto-recall) now CREATES pendingRecall with recallIds. agent_end now only WRITES responseText to an existing entry (never creates). Bug 2 (P2): parseSmartMetadata was called with empty placeholder metadata, returning fallback values instead of real entry data. Fix: use store.getById(recallId) to get the real entry before parsing. Bug 3 (P2): patchMetadata only updates the metadata JSON blob, not the entry.importance ROW column. applyImportanceWeight reads entry.importance, so importance adjustments never affected ranking. Fix: use store.update(id, { importance: newValue }) to update the row directly.

Bug 1 [P1]: pendingRecall.delete() moved from session_end to feedback hook finally block — prevents repeated scoring of the same recallIds/ responseText pair when subsequent turns skip auto-recall (greeting, short input). Now deleted immediately after scoring completes. Bug 2 [P2]: confirmed use now resets bad_recall_count to 0 — so penalty threshold (3) only applies to truly consecutive misses, not interleaved confirmed-use/miss patterns. Bug 3 [P3]: retrieveWithTrace now forwards source to hybridRetrieval(), aligning debug/trace retrieval with real manual-recall behavior.

…anup, env-resolve gate, recency double-boost)

P1-1 (isRecallUsed): Add direct injected-ID check - The function accepted injectedIds but never used them - Added loop to check if response contains any injected memory ID - This complements the existing stock-phrase check P1-2 (rerank env vars): Add rerank-enabled guard - Only resolve \ placeholders when rerank is actually enabled - Prevents startup failure when rerankApiKey has unresolved placeholder but reranking is disabled (rerank='none') P2 (multi-line wrapper stripping): Strip boilerplate continuation lines - stripLeadingRuntimeWrappers now also strips lines matching AUTO_CAPTURE_RUNTIME_WRAPPER_BOILERPLATE_RE (e.g. 'Results auto-announce to your requester.', 'Do not use any memory tools.') while strippingLeadIn is still true, preventing these lines from being kept when they appear right after the wrapper prefix line

…d configurable feedback amplitudes

…er prompt extraction, parsePluginConfig feedback, bad_recall_count double-increment) Bug 1 (P1): isRecallUsed() only checked stock phrases and raw IDs, but auto-recall injects [category:scope] summary format text. Fix: store injectedSummaries (item.line) in pendingRecall on auto-recall injection; pass them to isRecallUsed() which now checks if the response contains any of the injected summary text verbatim. Bug 2 (P1): confirm/error keywords were checked against pending.responseText (previous-turn assistant response) instead of the current-turn user prompt. Fix: read event.prompt (array of {role, content} messages) in the before_prompt_build feedback hook and check keywords against the last user message in that array. Bug 3 (P2): parsePluginConfig() never copied cfg.feedback to the returned config object, so all deployments fell back to hardcoded defaults. Fix: add feedback block to the return object in parsePluginConfig. Bug 4 (P2): bad_recall_count was incremented in BOTH the auto-recall injection path AND the feedback hook, causing double-counting that made the 3-consecutive-miss penalty trigger after only 2 actual misses. Fix: remove +1 from the feedback hook; counter now only increments once (in the auto-recall injection path where staleInjected is evaluated).

…ssages user prompt, agentId keying Bug 1 (P1): Score each recall independently instead of one usedRecall for the whole batch. - Build summaryMap: recallId -> injected summary - Call isRecallUsed per recallId with its specific summary - Prevents unused memories from being boosted or used ones penalized Bug 2 (P2): Extract user prompt from event.messages array, not event.prompt. - event.prompt is a plain string (confirmed by codebase usage), not an array - Extract last user message from event.messages (same pattern as agent_end) Bug 3 (P2): pendingRecall key includes agentId to avoid cross-agent overwrite. - Key format: sessionKey:agentId (both in auto-recall and feedback/agent_end hooks)

P1 fix: replace single-char CJK keywords (是/對/不/錯) with longer phrases (是對的/確認/錯誤/更正) to avoid false positives on ordinary conversation. P3 fix: session_end hook was not cleaning pendingRecall at all. Add cleanup of all pendingRecall entries that match the sessionId or sessionKey:agentId composite key pattern.

…ach#445)

…ory leak When config.autoCapture === false, the auto-capture session_end (priority 10) was skipped, leaving only the Phase 1 session_end (priority 20) to clean up. The old code only deleted pendingRecall[sessionKey] - a simple key - but not composite keys (sessionKey:agentId). Now uses pattern matching (startsWith) to clean all related keys regardless of format. Fixes: P1 issue from Phase 1 audit

…e, P0-2 race condition known limitation P0-1: Add TTL-based cleanup to prevent unbounded pendingRecall memory growth P0-3: Enforce AND gate on summary path (hasUsageMarker || hasSpecificRecall required) P0-2: Document known limitation - bad_recall_count read-modify-write is not atomic P1-1: Verified autoCapture block boundary - Proposal A hooks are outside autoCapture block

P2: Change suppression threshold from >= 3 to >= 2 to match the scoring path threshold. After 2 bad recalls, both penalty and suppression now activate simultaneously, preventing one extra injection turn. P0-2: bad_recall_count race condition remains a known limitation (out of Phase 1 scope without store-layer compare-and-swap support).

P1 (Codex): Remove AND gate from Summary path in isRecallUsed(). Summary text overlap (>= 10 chars) is now detected independently, without requiring hasUsageMarker or hasSpecificRecall. This fixes false negatives where auto-recall memories are naturally used (without explicit markers) and incorrectly counted as misses. Auto-capture (pendingIngress) remains Phase 2 scope (out of Phase 1).

P1 (Codex): Fix double-counting + delayed suppression. - Change scoring condition from >= 2 to >= 1 to sync with injection's staleInjected increment (which increments bad_recall_count when re-injecting an unconfirmed memory from the previous turn). - Don't increment bad_recall_count in scoring path for silent misses: injection path already handles this via staleInjected. - Only increment in scoring for explicit errorKeywords (user corrections). This ensures the penalty applies on second miss (syncs with injection's first increment), not the third.

P1 (Codex): Don't write last_confirmed_use_at on explicit error. When user explicitly corrects a recall, we shouldn't mark it as "confirmed use" - otherwise staleInjected logic breaks. P2 (Codex): 1. Run TTL cleanup on read path too (not just set path). This handles idle sessions that never trigger set() again. 2. Refine confirm/error keywords to reduce false positives. Removed "ok" and "no" which are too generic.

P1 (Codex 2nd round): 1. errorKeywords now checked BEFORE usedRecall heuristic. If user explicitly corrects, that overrides usage detection. No importance boost is applied for errorKeywords cases. 2. errorKeywords sets last_confirmed_use_at = Date.now(). This prevents injection path's staleInjected from double-counting in the same turn. Next injection will NOT increment bad_recall_count via staleInjected (since last_confirmed_use_at >= last_injected_at). This fixes: - Used but corrected: no boost, single increment, no staleInjected double-count - Used and confirmed: boost + reset - Silent miss: penalty applies at badCount >= 1, injection handles increment

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 84ddffa646

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-13T04:17:01Z

      api.on("agent_end", agentEndAutoCaptureHook);
-    }
+
+    // ========================================================================


Restore closing brace for auto-capture guard

The if (config.autoCapture !== false) block that starts earlier is no longer closed here, so it now stays open until the end of register (around line 4033). That makes unrelated hook registration (self-improvement, reflection, lifecycle/backup setup, etc.) conditional on autoCapture, so deployments with autoCapture: false will silently lose those features.

Useful? React with 👍 / 👎.

Both issues fixed in commit 3d9f27d:

P1 - autoCapture closing brace: The if (config.autoCapture !== false) block (agent_end auto-capture, line 2625) was missing its closing brace. Added } after the Phase 1 before_prompt_build hook (priority 5) to properly close the block. Self-improvement, reflection, and lifecycle hooks now run independently of autoCapture.

P2 - feedback config type coercion: Added Number() coercion for all numeric feedback config values to prevent string concatenation.

chatgpt-codex-connector · 2026-04-13T04:17:01Z

+    feedback: typeof cfg.feedback === "object" && cfg.feedback !== null
+      ? { ...(cfg.feedback as Record<string, unknown>) }


Normalize feedback config types before using them

This shallow copy keeps raw feedback values without type coercion, but the scoring path later performs numeric arithmetic with these fields. If a deployment provides values as strings (common with env-driven config), ?? will keep the string, currentImportance + boostOnUse becomes string concatenation, and downstream Math.min/Math.max can produce NaN, leading to invalid importance updates in storage.

Useful? React with 👍 / 👎.

Both issues fixed in commit 3d9f27d:

P1 - autoCapture closing brace: The if (config.autoCapture !== false) block (agent_end auto-capture, line 2625) was missing its closing brace. Added } after the Phase 1 before_prompt_build hook (priority 5) to properly close the block. Self-improvement, reflection, and lifecycle hooks now run independently of autoCapture.

P2 - feedback config type coercion: Added Number() coercion for all numeric feedback config values to prevent string concatenation.

…dback config types P1 (Codex review round 4): - The second `if (config.autoCapture !== false)` block (agent_end auto-capture) was missing its closing brace. Phase 1 hooks were added inside this block but the block was never closed, causing ALL subsequent hooks (self-improvement, reflection, lifecycle, backup) to be conditional on autoCapture. Added closing `}` after the Phase 1 before_prompt_build hook (priority 5) to properly close the autoCapture block. P2 (Codex review round 4): - Feedback config values (boostOnUse, penaltyOnMiss, etc.) were used directly without Number() coercion. If deployment provides values as strings (common with env-driven config), Math.min/Math.max would produce NaN. Added Number() coercion with fallback to default values. Both fixes resolve the two issues flagged by Codex in PR review.

… matching, zero config

jlin53882 · 2026-04-13T06:02:38Z

CI Status Analysis

CLI smoke test failure (cli-smoke)

Status: Pre-existing upstream bug (NOT caused by this PR)

Error: undefined !== 1 at cli-smoke.mjs:316
Root cause: Test mock missing store.count() method (cli-smoke test: store mock missing count() method #596)
Reference: Issue cli-smoke test: store mock missing count() method #596

This failure also exists in master branch CI runs.

Core-regression test failure

Status: Under investigation

Error: smart-extractor-branches.mjs test assertion mismatch
Test file unchanged in this PR
May be Node version difference (local v24.13.0 vs CI v22.22.2) or pre-existing

Codex Review Fixes Applied (Round 5)

P1 - Session cleanup bug (line 2611)

When sessionKey is absent, startsWith(':') would match all keys.
Fix: Use sessionId only when sessionKey is absent.

P1 - Summary matching (line 2554)

Stored item.line includes prefix - [category:scope], causing matching failure.
Fix: Use item.summary (pure content only).

P2 - Zero config values (lines 3043-3046)

Number(fb.boostOnUse ?? 0) || 0.05 treats explicit 0 as falsy.
Fix: Changed to ?? nullish coalescing.

✅ Committed: 64767ee

rwmjhb

Good direction on the recall feedback loop. Branch needs a rebase before merge, and there are three correctness issues to fix first.

Please rebase onto current main — stale_base=true and index.ts is a high-contention file. Multiple recent PRs (#611, #613, #617) have landed on the same file; a rebase is needed to confirm there are no silent merge conflicts before this can be reviewed for merge.

Must fix

F1 — All feedback deltas are silently 0 when no explicit feedback config is provided (index.ts:~3066)

const boostOnUse = Number(fb.boostOnUse ?? 0) ?? 0.05;

When fb.boostOnUse is undefined: undefined ?? 0 → 0, Number(0) → 0, 0 ?? 0.05 → 0 (nullish coalescing only fires on null/undefined, not on 0). Same broken pattern for penaltyOnMiss, boostOnConfirm, penaltyOnError. Since parsePluginConfig returns {} for an unconfigured feedback block, all four deltas are 0 — importance scores never change.

Fix: const boostOnUse = Number(fb.boostOnUse ?? 0.05);

MR1 — Feedback config block is absent from the plugin manifest schema
Users can't discover or configure the feature via the OpenClaw UI — boostOnUse, penaltyOnMiss etc. are not in openclaw.plugin.json configSchema.

MR2 — Extra unconditional before_prompt_build hook will break existing auto-recall tests
The new hook fires for every session regardless of whether pendingRecall has an entry. Existing tests that count hook invocations or assert on recall output are likely to fail.

Non-blocking

F3 — isRecallUsed() reverse-match guard (summaryLower.length >= 10) is on the summary, not the snippet. A 7-char response like "Yes, ok" produces snippet "yes, ok", which matches any summary containing those words — false-positive boosts accumulate over short turns.
F6 — existing.responseText = lastMsgText mutates the Map entry in-place. Prefer pendingRecall.set(sessionKey, { ...existing, responseText: lastMsgText }) per project immutability convention.
F7 — if (!sessionKey) return is dead code — the template literal always produces a truthy string. Remove or fix the guard to check the source values before building the key.

rwmjhb

Good direction — a feedback loop for recall importance is a meaningful feature. A few blockers to resolve first.

Must fix before merge:

F1: All feedback magnitude defaults evaluate to 0. The core importance-scoring logic silently does nothing without explicit user config. The default magnitudes need non-zero values so the feature works out of the box.
MR1: New feedback config is not exposed in the plugin manifest schema. Users cannot configure the feature through the plugin UI/schema. The new config fields need to be added to the manifest before merge.
MR2: Extra unconditional before_prompt_build hook breaks existing auto-recall tests. The new hook is registered without any guard, so existing tests that expect a fixed number of hook calls will likely fail.

Nice to have (non-blocking):

F3: isRecallUsed() reverse-match applies no minimum length to the response snippet — short acknowledgment responses ("OK", "Sure") will produce false positives.
F5: bad_recall_count read-modify-write is non-atomic — concurrent agents sharing a memory will lose increments.
F7: Dead guard if (!sessionKey) return — a template literal always produces a truthy string.
EF2: 370+ lines of new hook and scoring logic have zero test coverage.

Fix the three blockers above and this is in good shape.

- F1: 修復 feedback 預設值 eval 為 0 的問題，改用 (val ?? null ?? default) - MR1: 新增 feedback 設定至 openclaw.plugin.json schema - MR2: Phase 1 hooks 包在 if (config.autoRecall === true) 內通過 test - F3: isRecallUsed reverse-match 加上 >=10 字元 guard 防 false positive - F7: 移除 agent_end 中永遠為 true 的 if (!sessionKey) return Refs: CortexReach#596

- index.ts: 保留 feedback + recallPrefix（兩者都適用） - retriever.ts: 使用 upstream 較簡潔的 comment Conflicts resolved.

jlin53882 · 2026-04-16T03:58:31Z

修復完成 ✅

阻斷問題（全部修復）

問題	修復內容
F1	feedback defaults 改用 (val ?? null ?? 0.05) 避免預設值 eval 為 0
MR1	feedback 設定加入 openclaw.plugin.json schema
MR2	Phase 1 hooks 包在 if (config.autoRecall === true) 內通過 test
F3	isRecallUsed reverse-match 加上 >=10 字元 guard 防 false positive
F7	移除 agent_end 中永遠為 true 的 guard

衝突解決

與 upstream/master 的 conflict 已 resolve
保留 feedback + recallPrefix（兩者都是 upstream 新功能）

Codex 發現的其他問題（非本次範圍）

P1: isInvariantRuleLike 正則表達式丟 bullets
P2: storeToLanceDB: false 時 derived injection 失效

這兩個另開 issue 追蹤，不在 PR #597 內處理。

Merge 狀態

mergeable: true ✅

fix(reflection): F4 - normalize both sides for legacy filtering

jlin53882 and others added 19 commits April 13, 2026 03:25

fix(proposal-a): 4 final bugs (importance fallback, pendingRecall cle…

421f637

…anup, env-resolve gate, recency double-boost)

feat(proposal-a): add user confirmation signals, min_recall_count, an…

5bb2137

…d configurable feedback amplitudes

fix: change bad_recall threshold from 3 to 2 per spec (issue CortexRe…

ec4d643

…ach#445)

fix(recall): AND logic in isRecallUsed + bad_recall_count increment fix

2a8d0e5

chatgpt-codex-connector bot reviewed Apr 13, 2026

View reviewed changes

jlin53882 added 2 commits April 13, 2026 13:22

fix(proposal-a): P1/P2 Codex round-5 fixes - session cleanup, summary…

64767ee

… matching, zero config

rwmjhb requested changes Apr 15, 2026

View reviewed changes

rwmjhb requested changes Apr 16, 2026

View reviewed changes

jlin53882 added 2 commits April 16, 2026 11:41

merge: 解決與 upstream/master 衝突

e05bb2e

- index.ts: 保留 feedback + recallPrefix（兩者都適用） - retriever.ts: 使用 upstream 較簡潔的 comment Conflicts resolved.

fix(reflection): MR1 - exclude rows where ALL lines are resolved

f14dace

fix(reflection): F4 - normalize both sides for legacy filtering

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(proposal-a): Phase 1 recall governance (Issue #569)#597

feat(proposal-a): Phase 1 recall governance (Issue #569)#597
jlin53882 wants to merge 24 commits intoCortexReach:masterfrom
jlin53882:jlin53882

jlin53882 commented Apr 13, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Apr 13, 2026

Uh oh!

jlin53882 Apr 13, 2026

Uh oh!

chatgpt-codex-connector bot Apr 13, 2026

Uh oh!

jlin53882 Apr 13, 2026

Uh oh!

jlin53882 commented Apr 13, 2026

Uh oh!

rwmjhb left a comment

Uh oh!

rwmjhb left a comment

Uh oh!

jlin53882 commented Apr 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		feedback: typeof cfg.feedback === "object" && cfg.feedback !== null
		? { ...(cfg.feedback as Record<string, unknown>) }

Conversation

jlin53882 commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes vs #507/#505

Bug fixes applied on top of #507/#505

Known limitations (out of Phase 1 scope)

Architecture verification

CI Status

Pre-existing failures (not caused by this PR)

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

jlin53882 Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

jlin53882 Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

jlin53882 commented Apr 13, 2026

CI Status Analysis

CLI smoke test failure (cli-smoke)

Core-regression test failure

Codex Review Fixes Applied (Round 5)

P1 - Session cleanup bug (line 2611)

P1 - Summary matching (line 2554)

P2 - Zero config values (lines 3043-3046)

Uh oh!

rwmjhb left a comment

Choose a reason for hiding this comment

Uh oh!

rwmjhb left a comment

Choose a reason for hiding this comment

Uh oh!

jlin53882 commented Apr 16, 2026

修復完成 ✅

阻斷問題（全部修復）

衝突解決

Codex 發現的其他問題（非本次範圍）

Merge 狀態

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jlin53882 commented Apr 13, 2026 •

edited

Loading