HMA cannot distinguish mcp/buggy/ibm-mcp-clone from mcp/benign/readonly-fs-mcp (2026-04-21 incident in repro form) #129

@thebenignhacker

Symptom

Two MCP fixtures with deliberately different intent labels produce identical HMA secure --json output:

| Fixture | Intent | HMA score | HMA finding count |
| --- | --- | --- | --- |
| mcp/benign/readonly-fs-mcp | benign | 98 | 51 (all noise floor) |
| mcp/buggy/ibm-mcp-clone | buggy | 98 | 51 (same set) |

Same score, same 51 finding IDs, same severity histogram. HMA gives them indistinguishable verdicts.
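The three-way comparison above (score, failed finding IDs, severity histogram) can be sketched as a small Python helper. This is an illustration, not part of HMA or the corpus: `score`, `findings`, and `passed` appear in the repro's jq filter, but the `id` and `severity` field names are assumptions about the report shape, and the two sample reports are made up.

```python
from collections import Counter


def summarize(report):
    """Reduce one parsed `secure --json` report to score, failed IDs, and severity counts."""
    # Same filter as the jq expression in the repro: keep findings with passed == false.
    failed = [f for f in report.get("findings", []) if f.get("passed") is False]
    return {
        "score": report.get("score"),
        # ASSUMPTION: "id" and "severity" are hypothetical field names.
        "failed_ids": frozenset(f.get("id") for f in failed),
        "severity_histogram": Counter(f.get("severity") for f in failed),
    }


def indistinguishable(report_a, report_b):
    """True when two reports agree on score, failed finding IDs, and severity histogram."""
    return summarize(report_a) == summarize(report_b)


# Tiny made-up reports for illustration only; real ones come from `secure --json`.
benign = {"score": 98, "findings": [
    {"id": "X-1", "severity": "low", "passed": False},
    {"id": "X-2", "severity": "low", "passed": True},
]}
buggy = {"score": 98, "findings": [
    {"id": "X-1", "severity": "low", "passed": False},
    {"id": "X-2", "severity": "low", "passed": True},
]}

print(indistinguishable(benign, buggy))  # True
```

Feeding it the two parsed reports from the repro would make the "same score, same IDs, same histogram" claim mechanically checkable, provided the assumed field names match the real output.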

This reproduces the 2026-04-21 incident, in which a community-published MCP that was structurally similar to a known-good Anthropic MCP scored identically to it, masking real risk signal.

Source

Captured locally during opena2a-corpus Phase 2 ground-truth pinning. Both fixtures live at ~/.opena2a/corpus/mcp/{benign,buggy}/<name>/. The fixture manifest at mcp/buggy/ibm-mcp-clone/manifest.yaml declares a buggy intent; HMA disagrees.

The corpus's expected.hma.score.{min,max} for both fixtures was pinned at [95, 100] per "honest, not aspirational" ground truth — the corpus correctly captures HMA's current behavior; the bug is in HMA.

Repro

git clone https://github.com/opena2a-org/opena2a-corpus.git ~/.opena2a/corpus
cd ~/path/to/hackmyagent
node dist/cli.js secure ~/.opena2a/corpus/mcp/benign/readonly-fs-mcp --json | jq '{score, finding_count: (.findings | map(select(.passed==false)) | length)}'
node dist/cli.js secure ~/.opena2a/corpus/mcp/buggy/ibm-mcp-clone --json | jq '{score, finding_count: (.findings | map(select(.passed==false)) | length)}'

Both currently print {"score": 98, "finding_count": 51} (jq pretty-prints the object across several lines).

What "buggy" looks like in this fixture

mcp/buggy/ibm-mcp-clone/ is a structural clone of an unverified-publisher MCP. manifest.yaml enumerates the distinguishing attributes that HMA should pick up on. Today it picks up none of them.

Why this matters

ai-trust's --deep mode (NanoMind semantic) does distinguish these (51 vs 14 vs 85 score); see the corpus's expected.aiTrust.score bands. So the semantic layer can tell them apart, while the static layer (secure --json without --deep) cannot.
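One way to flag this kind of collision automatically is to look for fixture pairs whose intent labels differ but whose scores are equal. The sketch below is an assumption about how such a check could look, not something the corpus is known to ship; `intent_collisions` is a hypothetical name, and the only data used is the static HMA score of 98 reported in this issue.

```python
from itertools import combinations


def intent_collisions(fixtures):
    """Return pairs of fixture names whose intents differ but whose scores are equal."""
    collisions = []
    for (name_a, a), (name_b, b) in combinations(sorted(fixtures.items()), 2):
        if a["intent"] != b["intent"] and a["score"] == b["score"]:
            collisions.append((name_a, name_b))
    return collisions


# Static-layer scores as reported in this issue.
hma_static = {
    "mcp/benign/readonly-fs-mcp": {"intent": "benign", "score": 98},
    "mcp/buggy/ibm-mcp-clone": {"intent": "buggy", "score": 98},
}

print(intent_collisions(hma_static))
# [('mcp/benign/readonly-fs-mcp', 'mcp/buggy/ibm-mcp-clone')]
```

Exact equality is a deliberately narrow definition of "cannot distinguish"; a tolerance could be added if near-identical scores should also count. The same function could be pointed at the ai-trust scores to confirm that the semantic layer reports no collision.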

This is exactly the kind of inter-classifier drift the corpus exists to surface. Now it's surfaced; over to HMA to close the gap.

Suggested next step

Add a static check that targets the buggy-MCP signal in the manifest, validate that it does not produce false positives on the benign fixture, then re-pin the corpus's expected.hma.score.max for mcp/buggy/ibm-mcp-clone downward to enforce the new band.
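As a rough sketch of what re-pinning buys, the band check below shows how today's score of 98 passes the current [95, 100] band for both fixtures but would fail once the buggy fixture's band is lowered. The function name `band_violation` and the re-pinned band of [0, 80] are placeholders chosen for illustration; the real values would come from whatever the new static check actually produces.

```python
def band_violation(fixture, score, band):
    """Return a failure message if score falls outside [min, max], else None."""
    lo, hi = band
    if lo <= score <= hi:
        return None
    return f"{fixture}: score {score} outside pinned band [{lo}, {hi}]"


CURRENT_BAND = (95, 100)  # pinned for both fixtures, per this issue
REPINNED_BUGGY_BAND = (0, 80)  # HYPOTHETICAL placeholder, not a proposed value

# Today: both fixtures score 98 and both pass, so the corpus stays green.
print(band_violation("mcp/benign/readonly-fs-mcp", 98, CURRENT_BAND))
print(band_violation("mcp/buggy/ibm-mcp-clone", 98, CURRENT_BAND))

# After re-pinning: an unchanged score of 98 on the buggy fixture is reported.
print(band_violation("mcp/buggy/ibm-mcp-clone", 98, REPINNED_BUGGY_BAND))
```

The benign fixture keeps its [95, 100] band, so a new check that produced a false positive there and dragged the score below 95 would be caught by the same function.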

Refs
