Audit pack: repo-level prompt injection scanner #1
Context
While building the adversarial input sanitizer for navi-bootstrap's rendering pipeline, we realized the defense tooling doubles as an offensive scanner: point it at any repo's committed files and it detects prompt injection attacks targeting AI coding assistants (Copilot, Claude Code, Cursor, etc.).
navi-os already has production-grade detection in src/navi/security/:
- ContentScanner: 18 injection patterns, 14 self-replication patterns, 7 hidden instruction patterns, entropy analysis, multi-layer decode
- UnicodeNormalizer: 4-pass pipeline (zero-width strip → fullwidth-to-ASCII → NFKC → homoglyph replacement, 42 pairs)
- prompt_sanitizer: iterative decode (URL → HTML entities → hex → base64), pattern replacement
- Full adversarial test suite + atheris fuzz corpus
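To make the 4-pass normalization order concrete, here is a minimal sketch. It is not the navi-os code: the function names, the zero-width set, and the six-entry homoglyph table are assumptions for illustration (the real table is described as having 42 pairs).

```python
import unicodedata

# Assumed set of invisible characters used to split keywords.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

# Hypothetical, tiny homoglyph table (Cyrillic look-alikes -> ASCII).
# Illustrative only; not the 42-pair table mentioned above.
HOMOGLYPHS = {
    "\u0430": "a",
    "\u0435": "e",
    "\u043e": "o",
    "\u0440": "p",
    "\u0441": "c",
    "\u0456": "i",
}


def strip_zero_width(text: str) -> str:
    """Pass 1: drop zero-width characters."""
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)


def fullwidth_to_ascii(text: str) -> str:
    """Pass 2: map fullwidth forms U+FF01..U+FF5E onto ASCII by fixed offset."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xFF01 <= cp <= 0xFF5E:
            out.append(chr(cp - 0xFEE0))
        elif cp == 0x3000:  # ideographic space
            out.append(" ")
        else:
            out.append(ch)
    return "".join(out)


def replace_homoglyphs(text: str) -> str:
    """Pass 4: replace known look-alike characters with their ASCII twin."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)


def normalize(text: str) -> str:
    """Run the four passes in the order listed above."""
    text = strip_zero_width(text)
    text = fullwidth_to_ascii(text)
    text = unicodedata.normalize("NFKC", text)  # Pass 3
    return replace_homoglyphs(text)
```

Running the passes before pattern matching means a single keyword pattern can cover the zero-width, fullwidth, and homoglyph variants of the same directive.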
Proposal
The audit pack (8th pack from the design doc, currently undesigned) becomes a repo-level prompt injection scanner:
- Scan committed files (README, CONTRIBUTING, issue templates, PR templates, code comments, docstrings, CI configs) for hostile patterns
- Detect: homoglyphs, zero-width chars, encoded payloads, template injection, prompt injection directives, self-replication patterns
- Output: structured report (findings, severity, location, risk score)
- Ships as an nboot pack: nboot apply --pack audit scans the target repo
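One possible shape for the structured report, as a sketch only. The pack is undesigned, so every name here (Finding, AuditReport, the severity labels, and the weight-sum risk score capped at 100) is an assumption, not a decided format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

# Hypothetical severity weights; the real scoring scheme is undecided.
SEVERITY_WEIGHT = {"low": 1, "medium": 3, "high": 7, "critical": 10}


@dataclass
class Finding:
    rule: str      # which detector fired, e.g. "zero-width-split"
    severity: str  # one of the SEVERITY_WEIGHT keys
    path: str      # file path relative to the repo root
    line: int      # 1-based line number
    excerpt: str   # short snippet of the offending text


@dataclass
class AuditReport:
    repo: str
    findings: List[Finding] = field(default_factory=list)

    def risk_score(self) -> int:
        # Assumed scoring: sum of severity weights, capped at 100.
        return min(100, sum(SEVERITY_WEIGHT[f.severity] for f in self.findings))

    def to_json(self) -> str:
        payload = asdict(self)
        payload["risk_score"] = self.risk_score()
        return json.dumps(payload, indent=2)
```

A JSON report like this could be attached to CI output or diffed between runs, but the output format is still an open design question.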
Attack surface
Every repo that an AI coding assistant reads is a prompt injection surface. Hostile content in committed files can:
- Override agent instructions via embedded directives
- Hide payloads in visually-identical homoglyph text
- Split detection keywords with zero-width characters
- Encode instructions in base64/HTML entities to evade pattern matching
- Embed self-replication patterns (Morris II style)
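The encoding evasion above is why pattern matching alone is not enough. This sketch peels layers in the URL → HTML entities → hex → base64 order until the text stops changing, then matches. The single directive pattern, the 16-character base64 threshold, and the function names are assumptions for illustration; they are not the prompt_sanitizer implementation.

```python
import base64
import binascii
import html
import re
from urllib.parse import unquote

# Hypothetical single directive pattern; a real scanner would use many.
DIRECTIVE = re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE)
HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")
# Assumed heuristic: runs of 16+ base64 characters are candidate payloads.
B64_TOKEN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def _decode_base64_tokens(text: str) -> str:
    """Replace tokens that decode cleanly to printable UTF-8; keep the rest."""
    def repl(match):
        token = match.group(0)
        try:
            decoded = base64.b64decode(token, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError, ValueError):
            return token
        return decoded if decoded.isprintable() else token

    return B64_TOKEN.sub(repl, text)


def decode_layers(text: str, max_rounds: int = 5) -> str:
    """Apply URL, HTML-entity, hex, and base64 decoding until a fixed point."""
    for _ in range(max_rounds):
        decoded = unquote(text)
        decoded = html.unescape(decoded)
        decoded = HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), decoded)
        decoded = _decode_base64_tokens(decoded)
        if decoded == text:
            break
        text = decoded
    return text


def contains_directive(text: str) -> bool:
    """True if the directive pattern appears after all layers are decoded."""
    return bool(DIRECTIVE.search(decode_layers(text)))
```

The round cap bounds the work on adversarial input that nests encodings deeply. A real scanner would also need to report which layer the payload was hidden in.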
Status
Parking this as an issue. The sanitizer module (defensive side) is in progress. The audit pack (offensive/scanning side) is future work.
🤖 Generated with Claude Code