Skip to content

Audit pack: repo-level prompt injection scanner #1

@Fieldnote-Echo

Description

Context

While building the adversarial input sanitizer for navi-bootstrap's rendering pipeline, we realized the defense tooling doubles as an offensive scanner: point it at any repo's committed files and it detects prompt injection attacks targeting AI coding assistants (Copilot, Claude Code, Cursor, etc.).

navi-os already has production-grade detection in src/navi/security/:

  • ContentScanner — 18 injection patterns, 14 self-replication patterns, 7 hidden instruction patterns, entropy analysis, multi-layer decode
  • UnicodeNormalizer — 4-pass pipeline: zero-width strip → fullwidth-to-ASCII → NFKC → homoglyph replacement (42 pairs)
  • prompt_sanitizer — iterative decode (URL → HTML entities → hex → base64), pattern replacement
  • Full adversarial test suite + atheris fuzz corpus

Proposal

The audit pack (8th pack from the design doc, currently undesigned) becomes a repo-level prompt injection scanner:

  • Scan committed files (README, CONTRIBUTING, issue templates, PR templates, code comments, docstrings, CI configs) for hostile patterns
  • Detect: homoglyphs, zero-width chars, encoded payloads, template injection, prompt injection directives, self-replication patterns
  • Output: structured report (findings, severity, location, risk score)
  • Ships as an nboot pack — nboot apply --pack audit scans the target repo

Attack surface

Every repo that an AI coding assistant reads is a prompt injection surface. Hostile content in committed files can:

  • Override agent instructions via embedded directives
  • Hide payloads in visually-identical homoglyph text
  • Split detection keywords with zero-width characters
  • Encode instructions in base64/HTML entities to evade pattern matching
  • Embed self-replication patterns (Morris II style)

Status

Parking this as an issue. The sanitizer module (defensive side) is in progress. The audit pack (offensive/scanning side) is future work.

🤖 Generated with Claude Code

Metadata

Metadata

Assignees

No one assigned

    Labels

    roadmapFuture work, not immediately actionable

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions