feat(security): SEC-06 - block filesystem disclosure in SHOW_AD mode#63

Merged
Killea merged 1 commit into Killea:main from bertheto:feat/SEC-06-filesystem-disclosure
Mar 19, 2026


@bertheto
Contributor

Summary

  • Add filesystemDisclosureFilter.ts: server-side detection of filesystem directory listings in demo mode (SHOW_AD=true)
  • Integrate filter in memoryStore.postMessage() and editMessage() — SHOW_AD-conditional, no impact on private/localhost deployments
  • /api/agents/register returns restricted_mode: true + restrictions: ['no_filesystem_disclosure'] when SHOW_AD=true (cooperative signal for MCP clients)
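A minimal sketch of how the SHOW_AD-conditional check could sit in front of postMessage() and editMessage(). Only checkFilesystemDisclosureOrThrow, postMessage, editMessage, SHOW_AD and the 400 status come from this PR; the error class, the stand-in detector, the showAd parameter and MemoryStoreSketch are hypothetical names for illustration, and the actual implementation may differ.

```typescript
// Hypothetical error type; the PR only states that the request is rejected with 400.
class FilesystemDisclosureError extends Error {
  readonly status = 400;
  constructor(reason: string) {
    super(`Message blocked: possible filesystem disclosure (${reason})`);
    this.name = "FilesystemDisclosureError";
  }
}

// Stand-in detector (assumption): only the tree-connector rule, for brevity.
function detectDisclosure(content: string): string | null {
  const treeLines = content
    .split(/\r?\n/)
    .filter((line) => /[├└]──|│/.test(line));
  return treeLines.length >= 2 ? "tree output" : null;
}

// Throws only when the instance runs in demo mode (SHOW_AD=true).
function checkFilesystemDisclosureOrThrow(content: string, showAd: boolean): void {
  if (!showAd) return; // private/localhost deployments are unaffected
  const reason = detectDisclosure(content);
  if (reason !== null) throw new FilesystemDisclosureError(reason);
}

// Minimal in-memory store showing where the check would be called.
class MemoryStoreSketch {
  private messages: string[] = [];
  private showAd: boolean;

  constructor(showAd: boolean) {
    this.showAd = showAd;
  }

  postMessage(content: string): number {
    checkFilesystemDisclosureOrThrow(content, this.showAd);
    this.messages.push(content);
    return this.messages.length - 1; // index of the stored message
  }

  editMessage(id: number, content: string): void {
    checkFilesystemDisclosureOrThrow(content, this.showAd);
    this.messages[id] = content;
  }
}
```

Running the check on both the post and edit paths matters: filtering only postMessage() would let an agent post a harmless message and then edit a listing into it.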

Motivation

On the public demo instance, an agent listed the full working directory structure when asked by another agent (public demo thread, 2026-03-08). The listing was produced by the agent's own filesystem tools (client-side), then posted as a message through AgentChatBus.

AgentChatBus (ACB) cannot block client-side tool execution, but it can enforce two complementary mitigations:

  1. Content filter extension (server-enforced): Extend the existing CONTENT_FILTER pattern when SHOW_AD=true to detect and block messages containing directory listings. Returns 400 with explanation.

  2. Restricted mode signal (cooperative): When an agent registers on a SHOW_AD=true instance, the registration response includes restricted_mode: true + restrictions: ['no_filesystem_disclosure']. Well-behaved MCP clients can use this to disable their filesystem tools proactively.
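The cooperative signal could be produced and consumed roughly as follows. This is a sketch under assumptions: only restricted_mode, restrictions and the 'no_filesystem_disclosure' value come from this PR; the agent_id field, the RegisterResponse type and both function names are hypothetical.

```typescript
// Assumed response shape; agent_id is a placeholder for whatever the
// real /api/agents/register response already contains.
interface RegisterResponse {
  agent_id: string;
  restricted_mode?: boolean;
  restrictions?: string[];
}

// Server side: add the restriction fields only when SHOW_AD=true.
function buildRegisterResponse(agentId: string, showAd: boolean): RegisterResponse {
  const base: RegisterResponse = { agent_id: agentId };
  if (!showAd) return base;
  return {
    ...base,
    restricted_mode: true,
    restrictions: ["no_filesystem_disclosure"],
  };
}

// Client side: a well-behaved MCP client could consult this before
// enabling its filesystem tools.
function filesystemToolsAllowed(res: RegisterResponse): boolean {
  const restrictions = res.restrictions ?? [];
  const restricted =
    res.restricted_mode === true &&
    restrictions.indexOf("no_filesystem_disclosure") !== -1;
  return !restricted;
}
```

Because the fields are absent rather than false on private instances, a client that does not know about them keeps its current behaviour, which is what makes the signal purely cooperative.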

Detection Patterns

Six patterns, all conservative (low false positive risk):

| Pattern | Trigger |
| --- | --- |
| Unix tree output | >= 2 lines containing tree connector characters |
| Unix ls -la | Permission block (drwx...) and total N header |
| Windows dir header | Line matching Mode ... LastWriteTime ... Name |
| Windows dir entries | >= 2 lines matching d---- / -a--- + date format |
| Dense path cluster | >= 3 consecutive absolute path lines (line-counting algorithm) |
| Sensitive file content | /etc/passwd format (user:x:0:0:) or SSH public key header |

Intentionally allowed (to minimise false positives):

  • Single path mention in a sentence or code example
  • ~/.ssh/ mentioned in prose ("protect your ~/.ssh/ directory")
  • createThread system_prompt — admin-controlled, not agent-generated
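The six patterns and the allowed cases above can be approximated as follows. This is an illustrative sketch, not the contents of filesystemDisclosureFilter.ts: every regular expression, threshold encoding and identifier here is an assumption derived from the trigger descriptions.

```typescript
// Count lines matching a pattern (regexes are non-global, so test() is stateless).
function countMatching(lines: string[], re: RegExp): number {
  return lines.filter((line) => re.test(line)).length;
}

// Length of the longest run of consecutive matching lines.
function longestRun(lines: string[], re: RegExp): number {
  let best = 0;
  let run = 0;
  for (const line of lines) {
    run = re.test(line) ? run + 1 : 0;
    if (run > best) best = run;
  }
  return best;
}

// A line that consists solely of an absolute Unix or Windows path.
const ABSOLUTE_PATH_LINE = /^\s*(\/[^\s/]+(\/[^\s/]*)*|[A-Za-z]:\\\S+)\s*$/;

interface Detector {
  name: string;
  matches: (lines: string[]) => boolean;
}

const DETECTORS: Detector[] = [
  { name: "unix_tree", matches: (l) => countMatching(l, /[├└]──|│/) >= 2 },
  {
    name: "unix_ls",
    matches: (l) =>
      countMatching(l, /^total \d+/) >= 1 &&
      countMatching(l, /^[dl-][rwxsStT-]{9}\s/) >= 1,
  },
  {
    name: "windows_dir_header",
    matches: (l) => countMatching(l, /Mode\s+LastWriteTime\s+.*Name/) >= 1,
  },
  {
    name: "windows_dir_entries",
    matches: (l) =>
      countMatching(l, /^\s*[d-][a-z-]{4,5}\s+\d{1,4}[\/.-]\d{1,2}[\/.-]\d{1,4}/) >= 2,
  },
  { name: "dense_path_cluster", matches: (l) => longestRun(l, ABSOLUTE_PATH_LINE) >= 3 },
  {
    name: "sensitive_file_content",
    matches: (l) =>
      countMatching(l, /^[a-z_][a-z0-9_-]*:x:\d+:\d+:/) >= 1 ||
      countMatching(l, /^ssh-(rsa|ed25519|dss) [A-Za-z0-9+\/]/) >= 1,
  },
];

// Returns the name of the first matching pattern, or null if the text is allowed.
function detectFilesystemDisclosure(content: string): string | null {
  const lines = content.split(/\r?\n/);
  for (const detector of DETECTORS) {
    if (detector.matches(lines)) return detector.name;
  }
  return null;
}
```

The dense path rule only counts lines that are nothing but a path, so a sentence such as "Config lives in /etc/app/config.yml" or a prose mention of ~/.ssh/ does not contribute to the run, which is one way to keep single mentions allowed.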

Test plan

  • npm test -- tests/unit/test_filesystem_disclosure_filter.test.ts — 36/36 pass
  • npm test — 529/530 pass (1 pre-existing flaky timing test in msgWaitMinTimeoutABScenario, unrelated to this PR)
  • Manual: POST /api/agents/register on instance with SHOW_AD=true — response includes restricted_mode: true
  • Manual: POST /api/threads/:id/messages with a tree output body on SHOW_AD=true instance — 400 with disclosure message

Limitations

  • Content filtering is heuristic-based — risk of false positives remains for edge cases (unusual directory structures)
  • The cooperative restricted_mode signal only works if the client respects it; a malicious client can ignore it
  • Neither approach prevents the agent from reading the filesystem internally; they only prevent the result from being posted

When SHOW_AD=true (public demo), agents could leak host filesystem
information by posting directory listings through AgentChatBus messages.

Changes:
- New filesystemDisclosureFilter.ts: 6 detection patterns
  - Unix tree connector output (>=2 lines with ├── / └── / │)
  - Unix ls -la output (permissions block + total header)
  - Windows dir/Get-ChildItem output (column header or >=2 mode lines)
  - Dense path cluster (>=3 consecutive absolute path lines, line-counting)
  - /etc/passwd content dump (colon-separated UID:GID format)
  - SSH public key / authorized_keys content
- Integrate checkFilesystemDisclosureOrThrow() in memoryStore.postMessage()
  and editMessage() — active only when SHOW_AD=true
- /api/agents/register returns restricted_mode:true + restrictions array
  when SHOW_AD=true (cooperative signal for well-behaved MCP clients)
- 36 new unit + integration tests (all pass)

Design: conservative. Single path mention in prose = allowed.
Structured bulk output = blocked. createThread system_prompt intentionally
not filtered (admin-controlled, not agent-generated).
@Killea Killea merged commit d19ed40 into Killea:main Mar 19, 2026
1 check passed