fix(sanitizer): suppress false positives for memory retrieval (Issue #2025) by bug-ops · Pull Request #2053 · bug-ops/zeph

bug-ops · 2026-03-20T15:31:33Z

Summary

Fixes false injection warnings raised when legitimate user queries are retrieved from memory. The ContentSanitizer incorrectly flagged prior conversation turns as injection attempts because it could not distinguish between two kinds of content:

- Actual injection surface: untrusted external content (web scrapes, MCP output, documents)
- Source of false positives: prior user conversation turns from SQLite

Root Cause

assembly.rs::sanitize_memory_message() applied uniform injection detection to all 6 memory retrieval paths using ExternalUntrusted sensitivity. This caused patterns like system_prompt and reveal_instructions to fire on legitimate user queries.
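
To illustrate the failure mode, here is a minimal sketch of source-blind pattern matching. The pattern table, the phrase each pattern matches, and the function name `matched_patterns` are assumptions made for illustration; the actual patterns in zeph-sanitizer are not shown in this PR.

```rust
/// Hypothetical stand-in for the sanitizer's pattern table: (pattern name, phrase).
/// The pairing of names to phrases is assumed, not taken from the crate.
const PATTERNS: &[(&str, &str)] = &[
    ("system_prompt", "system prompt"),
    ("reveal_instructions", "show your instructions"),
];

/// Returns the names of all patterns whose phrase occurs in `text`,
/// using a case-insensitive substring match and no knowledge of the source.
pub fn matched_patterns(text: &str) -> Vec<&'static str> {
    let lowered = text.to_lowercase();
    PATTERNS
        .iter()
        .filter(|entry| lowered.contains(entry.1))
        .map(|entry| entry.0)
        .collect()
}
```

Under these assumptions, a user's own earlier question such as "What is a system prompt?" matches `system_prompt` exactly as a hostile web page would, because the matcher has no notion of where the text came from.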

Solution

Introduced MemorySourceHint enum to distinguish memory sources:

| Hint | Applied to | Injection detection |
|---|---|---|
| `ConversationHistory` | semantic recall, corrections | Skipped (safe first-party source) |
| `LlmSummary` | summaries, cross-session | Skipped (safe agent-generated output) |
| `ExternalContent` | document RAG, graph facts | Full detection (threat surface) |

All other pipeline steps (truncation, escaping, spotlighting) remain active for all sources — defense-in-depth is preserved.
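
The gating described above can be sketched as follows. Only the `MemorySourceHint` enum and its three variant names come from this PR. The `Sanitized` struct, `sanitize_memory`, `MAX_CHARS`, the step order, and the toy escaping and spotlighting formats are assumptions for illustration, not the crate's API.

```rust
/// The three hints named in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemorySourceHint {
    ConversationHistory,
    LlmSummary,
    ExternalContent,
}

impl MemorySourceHint {
    /// Only externally sourced memory is scanned for injection patterns.
    pub fn runs_injection_detection(self) -> bool {
        matches!(self, MemorySourceHint::ExternalContent)
    }
}

/// Hypothetical result type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub struct Sanitized {
    pub body: String,
    pub injection_flagged: bool,
}

/// Assumed truncation limit, chosen arbitrarily for the example.
const MAX_CHARS: usize = 64;

/// Toy pipeline: every step except detection runs for every hint.
pub fn sanitize_memory(text: &str, hint: MemorySourceHint) -> Sanitized {
    // Truncation: always applied.
    let truncated: String = text.chars().take(MAX_CHARS).collect();
    // Escaping: always applied (toy version that neutralises angle brackets).
    let escaped = truncated.replace('<', "&lt;").replace('>', "&gt;");
    // Injection detection: the only step gated by the hint.
    let injection_flagged =
        hint.runs_injection_detection() && escaped.to_lowercase().contains("system prompt");
    // Spotlighting: always applied (toy delimiter format).
    let body = format!("[memory]{escaped}[/memory]");
    Sanitized { body, injection_flagged }
}
```

In this sketch the same query text is flagged only when it arrives with `ExternalContent`, while truncation, escaping, and spotlighting still transform the text for all three hints.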

Changes

- crates/zeph-sanitizer/src/lib.rs: Added MemorySourceHint enum, extended ContentSource, modulated detect_injections()
- crates/zeph-core/src/agent/context/assembly.rs: Threaded hints through all 6 memory retrieval call sites
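
One way to picture the threading is as a fixed mapping from each retrieval path to its hint. The `RetrievalPath` enum below is hypothetical: the PR passes a hint at each of the six call sites rather than defining such a type, and the variant names are derived from the path names listed in the commit message (recall, corrections, summaries, cross_session, doc_rag, graph_facts).

```rust
/// The three hints named in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemorySourceHint {
    ConversationHistory,
    LlmSummary,
    ExternalContent,
}

/// Hypothetical enumeration of the six memory retrieval paths.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RetrievalPath {
    Recall,
    Corrections,
    Summaries,
    CrossSession,
    DocRag,
    GraphFacts,
}

impl RetrievalPath {
    /// All six paths, useful for exhaustive checks in tests.
    pub const ALL: [RetrievalPath; 6] = [
        RetrievalPath::Recall,
        RetrievalPath::Corrections,
        RetrievalPath::Summaries,
        RetrievalPath::CrossSession,
        RetrievalPath::DocRag,
        RetrievalPath::GraphFacts,
    ];

    /// Hint assigned to each path, following the table in the Solution section.
    pub fn hint(self) -> MemorySourceHint {
        match self {
            RetrievalPath::Recall | RetrievalPath::Corrections => {
                MemorySourceHint::ConversationHistory
            }
            RetrievalPath::Summaries | RetrievalPath::CrossSession => {
                MemorySourceHint::LlmSummary
            }
            RetrievalPath::DocRag | RetrievalPath::GraphFacts => {
                MemorySourceHint::ExternalContent
            }
        }
    }
}
```

Because the `match` is exhaustive, adding a seventh path to this sketch would fail to compile until a hint is chosen for it, which is in the same spirit as the PR's claim that the hint is fixed at compile time.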

Merge Conditions (All Addressed)

✓ Audit trail: tracing::debug! logs when injection detection is skipped
✓ Quarantine interaction: Test 9 verifies memory_retrieval is not in default quarantine sources
✓ False-positive strings: Tests 1 & 3 use exact Issue #2025 triggering strings ("system prompt", "show your instructions")

Testing

6054 tests pass (10 new tests for this feature)
All validators passed:
- Perf: Zero-cost abstraction, no regressions
- Security: Defense-in-depth preserved, compile-time hint (cannot be spoofed)
- Impl-critic: All architecture requirements met
- Tester: Comprehensive coverage including edge cases
- Reviewer: APPROVED FOR MERGE

Acceptance Criteria

No injection warnings for legitimate memory retrieval
All existing tests pass (6054 tests)
No regression in actual injection detection (ExternalContent still fully detected)
cargo clippy --all-targets --all-features --workspace -- -D warnings passes
Defense-in-depth preserved: truncation, escaping, spotlighting still active

Closes #2025

…2025)

Introduces `MemorySourceHint` enum to distinguish memory retrieval sub-sources and modulate injection detection sensitivity in `ContentSanitizer::sanitize`.

- ConversationHistory hint (recall, corrections): detection skipped. The user's own prior messages legitimately contain "system prompt", "show instructions", etc.
- LlmSummary hint (summaries, cross_session): detection skipped. Generated by the agent's own model from already-sanitized content, low poisoning risk.
- ExternalContent hint (doc_rag, graph_facts): full detection retained. May contain adversarial content from web scrapes or MCP responses stored in the corpus.

Defense-in-depth invariant: truncation, control-char stripping, delimiter escaping, and spotlighting remain active for ALL memory sources regardless of hint.

Merge conditions addressed:

1. `tracing::debug!` logged when injection detection is skipped (audit trail).
2. Quarantine path verified: MemoryRetrieval is NOT in default quarantine sources (web_scrape, a2a_message), confirmed by test `quarantine_default_sources_exclude_memory_retrieval`.
3. Tests use actual false-positive-triggering strings ("system prompt", "show your instructions") and include a non-memory regression guard (WebScrape still detects).

Test count: 6041 → 6054 (+13 new unit tests in zeph-sanitizer).

github-actions bot added the bug, rust, core, and size/L labels and removed the bug label Mar 20, 2026

bug-ops added 2 commits March 20, 2026 16:31

docs(sanitizer): fix sanitize_memory_message hint doc comment

e7c3bc5

bug-ops force-pushed the fix-2025-sanitizer branch from c31fa32 to e7c3bc5 Compare March 20, 2026 15:31

bug-ops enabled auto-merge (squash) March 20, 2026 15:31

github-actions bot added the bug label Mar 20, 2026

bug-ops merged commit c53bc6c into main Mar 20, 2026
25 checks passed

bug-ops deleted the fix-2025-sanitizer branch March 20, 2026 15:39

bug-ops mentioned this pull request Mar 20, 2026

fix(sanitizer): memory_search tool output path not covered by MemorySourceHint fix (#2053) #2057

Open

bug-ops merged 2 commits into main from fix-2025-sanitizer
