
fix(sanitizer): suppress false positives for memory retrieval (Issue #2025) #2053

Merged
bug-ops merged 2 commits into main from fix-2025-sanitizer
Mar 20, 2026

Conversation

bug-ops (Owner) commented Mar 20, 2026

Summary

Fixes false injection warnings when retrieving legitimate user queries from memory. The ContentSanitizer was incorrectly flagging prior conversation turns as injection attempts because it couldn't distinguish between:

  • Actual injection: untrusted external content (web scrapes, MCP output, documents)
  • False positives: prior user conversation turns from SQLite

Root Cause

assembly.rs::sanitize_memory_message() applied uniform injection detection to all 6 memory retrieval paths using ExternalUntrusted sensitivity. This caused patterns like system_prompt and reveal_instructions to fire on legitimate user queries.
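
To make the failure mode concrete, here is a minimal sketch of source-blind detection. The pattern names `system_prompt` and `reveal_instructions` come from the description above; the trigger phrases, the `PATTERNS` table, and the `detect_uniform` function are illustrative assumptions, not the crate's actual matcher.

```rust
/// Assumed (pattern name, trigger phrase) pairs. Only the pattern names
/// appear in the PR; the phrases are hypothetical stand-ins.
const PATTERNS: &[(&str, &str)] = &[
    ("system_prompt", "system prompt"),
    ("reveal_instructions", "show your instructions"),
];

/// Returns the name of every pattern whose phrase occurs in `text`.
/// Matching is case-insensitive and takes no account of where the text
/// came from, so a user's own stored question is treated exactly like
/// scraped web content.
pub fn detect_uniform(text: &str) -> Vec<&'static str> {
    let lower = text.to_lowercase();
    PATTERNS
        .iter()
        .filter(|(_, phrase)| lower.contains(*phrase))
        .map(|(name, _)| *name)
        .collect()
}
```

Under these assumptions, a stored user turn such as "What is a system prompt?" is flagged with `system_prompt` even though nothing about it is adversarial.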

Solution

Introduced MemorySourceHint enum to distinguish memory sources:

| Hint | Applied to | Injection detection |
|---|---|---|
| ConversationHistory | semantic recall, corrections | Skipped (safe first-party source) |
| LlmSummary | summaries, cross-session | Skipped (safe agent-generated output) |
| ExternalContent | document RAG, graph facts | Full detection (threat surface) |

All other pipeline steps (truncation, escaping, spotlighting) remain active for all sources — defense-in-depth is preserved.
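
The behavior described above can be sketched roughly as follows. The `MemorySourceHint` variants are taken from this PR. Everything else is an assumption made for illustration: the `Sanitized` struct, `sanitize_memory`, `MAX_CHARS`, the phrase table, the `<memory>` wrapper, the step order, and this simplified `detect_injections` signature. The real code lives in `ContentSanitizer::sanitize` and may differ in all of these.

```rust
/// Origin of a message retrieved from memory (variant names from the PR).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemorySourceHint {
    /// Prior user turns: semantic recall, corrections.
    ConversationHistory,
    /// Agent-generated text: summaries, cross-session context.
    LlmSummary,
    /// Stored third-party material: document RAG, graph facts.
    ExternalContent,
}

impl MemorySourceHint {
    /// Only external content is treated as a threat surface.
    pub fn runs_injection_detection(self) -> bool {
        matches!(self, MemorySourceHint::ExternalContent)
    }
}

/// Hypothetical result: the wrapped text plus the pattern names that fired.
#[derive(Debug, PartialEq, Eq)]
pub struct Sanitized {
    pub body: String,
    pub injection_flags: Vec<&'static str>,
}

/// Assumed length cap, in characters.
const MAX_CHARS: usize = 4096;

/// Assumed (pattern name, trigger phrase) pairs.
const PATTERNS: &[(&str, &str)] = &[
    ("system_prompt", "system prompt"),
    ("reveal_instructions", "show your instructions"),
];

/// Simplified stand-in for the crate's detector.
fn detect_injections(text: &str) -> Vec<&'static str> {
    let lower = text.to_lowercase();
    PATTERNS
        .iter()
        .filter(|(_, phrase)| lower.contains(*phrase))
        .map(|(name, _)| *name)
        .collect()
}

pub fn sanitize_memory(text: &str, hint: MemorySourceHint) -> Sanitized {
    // 1. Truncation: always applied.
    let truncated: String = text.chars().take(MAX_CHARS).collect();
    // 2. Control-character stripping (newline and tab kept): always applied.
    let stripped: String = truncated
        .chars()
        .filter(|c| !c.is_control() || *c == '\n' || *c == '\t')
        .collect();
    // 3. Injection detection: the only step the hint can switch off.
    //    The PR logs the skip via `tracing::debug!`; omitted here to stay std-only.
    let injection_flags = if hint.runs_injection_detection() {
        detect_injections(&stripped)
    } else {
        Vec::new()
    };
    // 4. Delimiter escaping: always applied, so content cannot close the wrapper.
    let escaped = stripped.replace('<', "&lt;").replace('>', "&gt;");
    // 5. Spotlighting: always applied.
    let body = format!("<memory>\n{}\n</memory>", escaped);
    Sanitized { body, injection_flags }
}
```

In this sketch the hint gates only step 3, so a message tagged `ConversationHistory` is still truncated, stripped, escaped, and wrapped like any other.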

Changes

  • crates/zeph-sanitizer/src/lib.rs: Added MemorySourceHint enum, extended ContentSource, modulated detect_injections()
  • crates/zeph-core/src/agent/context/assembly.rs: Threaded hints through all 6 memory retrieval call sites
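
One way to picture the call-site change is a fixed mapping from each retrieval path to its hint. The `RetrievalPath` enum below is hypothetical and exists only for this sketch; the PR passes hints directly at the six call sites in `assembly.rs`. The path-to-hint assignments follow the table in the Solution section.

```rust
/// Hint variants as introduced by the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemorySourceHint {
    ConversationHistory,
    LlmSummary,
    ExternalContent,
}

/// Hypothetical enumeration of the six memory retrieval paths.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RetrievalPath {
    SemanticRecall,
    Corrections,
    Summaries,
    CrossSession,
    DocumentRag,
    GraphFacts,
}

impl RetrievalPath {
    /// Every path, for iteration in tests.
    pub const ALL: [RetrievalPath; 6] = [
        RetrievalPath::SemanticRecall,
        RetrievalPath::Corrections,
        RetrievalPath::Summaries,
        RetrievalPath::CrossSession,
        RetrievalPath::DocumentRag,
        RetrievalPath::GraphFacts,
    ];

    /// The hint is chosen in code per path, never derived from the
    /// retrieved content, so stored text cannot influence it.
    pub fn hint(self) -> MemorySourceHint {
        match self {
            RetrievalPath::SemanticRecall | RetrievalPath::Corrections => {
                MemorySourceHint::ConversationHistory
            }
            RetrievalPath::Summaries | RetrievalPath::CrossSession => {
                MemorySourceHint::LlmSummary
            }
            RetrievalPath::DocumentRag | RetrievalPath::GraphFacts => {
                MemorySourceHint::ExternalContent
            }
        }
    }
}
```

Because the `match` is exhaustive, adding a seventh path in this sketch would fail to compile until someone decides which hint it gets.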

Merge Conditions (All Addressed)

✓ Audit trail: tracing::debug! logs when injection detection is skipped
✓ Quarantine interaction: Test 9 verifies memory_retrieval is not in default quarantine sources
✓ False-positive strings: Tests 1 & 3 use exact Issue #2025 triggering strings ("system prompt", "show your instructions")
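
The quarantine condition can be illustrated with a small sketch. The commit message states that the default quarantine sources are `web_scrape` and `a2a_message`; the `ContentSourceKind` enum and both functions below are hypothetical names used only to show the shape of such a check.

```rust
/// Hypothetical subset of content source kinds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContentSourceKind {
    WebScrape,
    A2aMessage,
    MemoryRetrieval,
}

/// Default quarantine sources as listed in the commit message.
pub fn default_quarantine_sources() -> Vec<ContentSourceKind> {
    vec![ContentSourceKind::WebScrape, ContentSourceKind::A2aMessage]
}

/// True if content of this kind is quarantined under the defaults.
pub fn is_quarantined_by_default(kind: ContentSourceKind) -> bool {
    default_quarantine_sources().contains(&kind)
}
```

A test in the spirit of `quarantine_default_sources_exclude_memory_retrieval` would then assert that `is_quarantined_by_default(ContentSourceKind::MemoryRetrieval)` is `false`.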

Testing

  • 6054 tests pass (10 new tests for this feature)
  • All validators passed:
    • Perf: Zero-cost abstraction, no regressions
    • Security: Defense-in-depth preserved, compile-time hint (cannot be spoofed)
    • Impl-critic: All architecture requirements met
    • Tester: Comprehensive coverage including edge cases
    • Reviewer: APPROVED FOR MERGE

Acceptance Criteria

  • No injection warnings for legitimate memory retrieval
  • All existing tests pass (6054 tests)
  • No regression in actual injection detection (ExternalContent still fully detected)
  • cargo clippy --all-targets --all-features --workspace -- -D warnings passes
  • Defense-in-depth preserved: truncation, escaping, spotlighting still active

Closes #2025

github-actions bot added the bug (Something isn't working), rust (Rust code changes), core (zeph-core crate), and size/L (Large PR, 201-500 lines) labels and removed the bug label on Mar 20, 2026
bug-ops added 2 commits March 20, 2026 16:31
…2025)

Introduces `MemorySourceHint` enum to distinguish memory retrieval sub-sources
and modulate injection detection sensitivity in `ContentSanitizer::sanitize`.

- ConversationHistory hint (recall, corrections): detection skipped — user's
  own prior messages legitimately contain "system prompt", "show instructions", etc.
- LlmSummary hint (summaries, cross_session): detection skipped — generated by
  the agent's own model from already-sanitized content, low poisoning risk.
- ExternalContent hint (doc_rag, graph_facts): full detection retained — may
  contain adversarial content from web scrapes or MCP responses stored in the corpus.

Defense-in-depth invariant: truncation, control-char stripping, delimiter escaping,
and spotlighting remain active for ALL memory sources regardless of hint.

Merge conditions addressed:
1. `tracing::debug!` logged when injection detection is skipped (audit trail).
2. Quarantine path verified: MemoryRetrieval is NOT in default quarantine sources
   (web_scrape, a2a_message), confirmed by test `quarantine_default_sources_exclude_memory_retrieval`.
3. Tests use actual false-positive-triggering strings ("system prompt", "show your
   instructions") and include a non-memory regression guard (WebScrape still detects).

Test count: 6041 → 6054 (+13 new unit tests in zeph-sanitizer).
bug-ops force-pushed the fix-2025-sanitizer branch from c31fa32 to e7c3bc5 on March 20, 2026 15:31
bug-ops enabled auto-merge (squash) on March 20, 2026 15:31
github-actions bot added the bug (Something isn't working) label on Mar 20, 2026
bug-ops merged commit c53bc6c into main on Mar 20, 2026
25 checks passed
bug-ops deleted the fix-2025-sanitizer branch on March 20, 2026 15:39

Labels

  • bug: Something isn't working
  • core: zeph-core crate
  • rust: Rust code changes
  • size/L: Large PR (201-500 lines)


Development

Successfully merging this pull request may close these issues.

fix(sanitizer): injection false positives from legitimate user queries in memory retrieval
