fix(credential-analyzer): gate AST-CRED-001 on credential-format substring (#164) #167
`.claude/settings.json` was firing AST-CRED-001 MEDIUM on every project that uses Claude Code's hook protection. The upstream compiler tags any artifact whose lowercased content includes the substring "credential" or "session" with a `dataType: 'credentials'` access pattern, so a `.claude/settings.json` containing "Read(.aws/credentials)" deny-rules and `$VAR` placeholder env values was being flagged despite holding no credential value.

Changes:
- Content-format gate at `checkCredentialsInNonEnvContext`. The finding only fires when the artifact content contains a credential-format substring per a curated regex (vendor prefixes `sk-`, `sk_live_`, `ghp_`, `gho_`, `github_pat_`, `xox[abprs]-`, `eyJ` JWT, `AKIA…`, `AIza…`, plus a high-entropy 40+ word-character fallback). The regex is shared with AST-CRED-003's evidence-text gate.
- File:line population from the credential-format match position, plus an `evidence` field that is **masked at the analyzer layer** so the raw token never reaches finding output, JSON, telemetry, or Registry sync. Mask preserves recognizable prefixes (`sk-ant-api03-` → `sk-ant-api03-****************`).
- Governance-amplifier prose ("This artifact has no inline governance constraints, amplifying the risk of this finding") is now suppressed on a curated host-tool config allowlist (`.claude/settings.json`, `.vscode/settings.json`, `package.json`, `tsconfig.json`, `.eslintrc.json`, etc.), NOT on every `.json`/`.yaml` file. MCP configs and A2A agent cards still get the amplifier text because they ARE agent-definition surfaces.
- New tests b15 / b16 (FP suppressed on `.claude/settings.json` with `$VAR` and `${VAR}` placeholders) and b15-positive (real Slack token in `agent-config.yaml` with credentials keyword still fires AST-CRED-001 with line:4 + masked-prefix evidence).
- Updated existing test in `source-code-preprocessor.test.ts` to pass `artifactContent` so AST-CRED-001 continues to fire on Go source with a real `sk-ant-api03-…` value.

Phase 4.5 adversarial review (subagent) caught:
- R-Bonus (HIGH): raw credential value was leaking into the `evidence` field; fixed by masking in `findFirstCredentialFormat`.
- R4 (MEDIUM): governance-amplifier extension-match was too broad and was over-suppressing on `.cursor/mcp.json` / `agent.json` / `agent-config.yaml`; replaced with curated host-tool allowlist.
- R2 (HIGH): malicious `.clinerules` in adversarial corpus no longer fires AST-CRED-001 (it used to fire on credential-keyword narrative alone). Accepted with rationale: the same fixture still fires AST-CRED-002 CRITICAL ("credential forwarding to external endpoint") and SEM-INST-001 / SEM-INST-002 HIGH on lines 1-3, so system-level detection of the malicious artifact is preserved. Score bands on the malicious corpus are unchanged (release-smoke 12/12).

Verification:
- `npm test` 2025/2051 pass (was 2022/2048; +3 new b15/b16/b15-positive).
- `npm run release-smoke:corpus` 12/12.
- HMA self-scan 84 → 89: classification (a) preserved-detection FP-suppress at the `.claude/settings.json` site; classification (b) narrowed-detection at the `.clinerules` site, but the file is still flagged CRITICAL via AST-CRED-002 + HIGH via SEM-INST-*.
- `.cursorrules:7` AST-CRED-001 still fires on adversarial corpus with masked `sk-ant-api03-****************` evidence (regex unchanged in detection power; only the rendered evidence is masked).

Out of scope (filed as follow-ups):
- R1 / R7 regex gaps (Stripe `pk_live_`, Twilio `AC…`, Mailgun `key-`, Heroku UUID, short `sk-` < 20 chars, URL-safe base64 with `-`/`/`). These are pre-existing gaps in `evidenceShowsCredentialFormat` inherited by the new gate. Tracked separately.
- Migrate the credential-format regex into `@opena2a/credential-patterns` for cross-tool sync. Tracked separately.

Closes #164.
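The content-format gate described under "Changes" can be sketched roughly as follows. This is a minimal illustration, not the analyzer's actual code: the function name `containsCredentialFormat`, the per-vendor length thresholds, and the JWT shape are assumptions; only the vendor prefixes and the 40+ character fallback class come from the description above.

```typescript
// Hypothetical sketch of a credential-format gate. Prefixes follow the PR
// description; the minimum lengths after each prefix are assumptions.
const VENDOR_PREFIX_PATTERNS: RegExp[] = [
  /\bsk-[A-Za-z0-9_-]{20,}/,          // "sk-" style API keys
  /\bsk_live_[A-Za-z0-9]{10,}/,       // "sk_live_" secret keys
  /\bgh[po]_[A-Za-z0-9]{20,}/,        // ghp_ / gho_ tokens
  /\bgithub_pat_[A-Za-z0-9_]{20,}/,   // fine-grained GitHub PATs
  /\bxox[abprs]-[A-Za-z0-9-]{10,}/,   // Slack tokens
  /\beyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}/, // JWT header.payload
  /\bAKIA[0-9A-Z]{16}/,               // AWS access key IDs
  /\bAIza[0-9A-Za-z_-]{35}/,          // Google API keys
];

// High-entropy fallback, using the character class quoted in the review.
const HIGH_ENTROPY_FALLBACK = /\b[A-Za-z0-9+=_]{40,}\b/;

// Returns true only when the content holds something shaped like a real
// credential value, not merely the word "credential" or a $VAR placeholder.
function containsCredentialFormat(content: string): boolean {
  return [...VENDOR_PREFIX_PATTERNS, HIGH_ENTROPY_FALLBACK].some((re) =>
    re.test(content),
  );
}
```

Under these assumptions, a settings file that only mentions `Read(.aws/credentials)` and `$VAR` placeholders does not pass the gate, while a file containing a Slack-style or `sk-ant-api03-…`-style value does. The patterns are non-global, so `test` carries no `lastIndex` state between calls.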
Claude Code Review
VERDICT: APPROVE
SUMMARY: This PR addresses a real false-positive issue (#164) where .claude/settings.json files with defensive deny-rules and env-var placeholders were incorrectly triggering AST-CRED-001. The implementation adds three layers of defense: (1) a content-format gate requiring actual credential-format substrings (vendor prefixes or high-entropy tokens) before emitting AST-CRED-001, (2) credential masking at the analyzer layer to prevent raw tokens from leaking into findings/telemetry, and (3) governance-amplifier suppression on a curated allowlist of host-tool configs. The regex patterns are shared between AST-CRED-001 and AST-CRED-003 for consistency. All changes are well-tested with comprehensive regression coverage including positive controls, and the credential masking properly prevents information disclosure.
Verification checklist completed:
- Regex DoS (`buildCredentialFormatRegex`): Examined lines 569-585. All patterns are linear-time: anchored literals (`sk-`, `AKIA`, etc.) followed by fixed-width or unbounded quantifiers without nesting. The high-entropy fallback `\b[A-Za-z0-9+=_]{40,}\b` is word-boundary anchored and uses a simple character class, not nested quantifiers. No catastrophic backtracking possible. ✓
- Credential leakage (`maskCredentialValue`): Verified lines 620-650. Function masks tokens at the analyzer layer before they reach `finding.evidence` (line 147). Known prefixes preserve only the recognizable format (e.g., `sk-ant-api03-*****`); unknown shapes show the first 8 chars + asterisks. The raw matched value from `re.exec(content)` at line 598 is immediately masked before being returned. Test b15-positive (line 693) validates that the masked output never contains raw token bytes. ✓
- Array access safety (`findFirstCredentialFormat`): Lines 596-606. Uses `re.exec()`, which returns null on no match (checked at line 599). Line iteration from 0 to `m.index` is safe because `m.index` is always a valid non-negative integer when a match exists. No unchecked array indexing. ✓
- Allowlist completeness (`isStructuredConfigPath`): Lines 650-682. Uses explicit path suffix matching (`.claude/settings.json`, `.vscode/settings.json`) and exact basename Set membership. Does NOT match agent-definition surfaces like `.cursor/mcp.json`, `agent.json`, or `agent-config.yaml`, per the design intent (line 647). The allowlist is deliberately conservative. ✓
- Test coverage: `benign-fp-regression.test.ts` adds 3 test cases covering the fix (b15, b16, b15-positive) plus comprehensive assertions on line numbers, evidence masking, and positive controls. Existing test at `source-code-preprocessor.test.ts:275` updated to pass artifact content. ✓
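To make the masking and line-number points in the checklist concrete, here is a minimal sketch of how a match offset could be turned into a 1-based line number with masked evidence. The function names mirror those cited in the review, but the bodies, the prefix list, and the fixed run of 16 asterisks are assumptions for illustration, not the code under review.

```typescript
// Hypothetical sketch: locate the first credential-format match and return
// its line number plus masked evidence. The raw token is never returned.
interface CredentialMatch {
  line: number;     // 1-based line of the match start
  evidence: string; // masked token
}

// Longest prefixes first so "sk-ant-api03-" wins over the generic "sk-".
const KNOWN_PREFIXES: string[] = [
  "sk-ant-api03-", "github_pat_", "sk_live_", "xoxb-",
  "ghp_", "gho_", "AKIA", "AIza", "sk-",
];

// Keep a recognizable prefix (or the first 8 chars for unknown shapes) and
// replace the rest with a fixed run of asterisks (length is an assumption).
function maskCredentialValue(raw: string): string {
  const prefix = KNOWN_PREFIXES.find((p) => raw.startsWith(p));
  const kept = prefix ?? raw.slice(0, 8);
  return kept + "*".repeat(16);
}

// `re` is expected to be a non-global regex so exec() starts at offset 0.
function findFirstCredentialFormat(
  content: string,
  re: RegExp,
): CredentialMatch | null {
  const m = re.exec(content);
  if (m === null) return null; // no match: nothing to report

  // Count newlines before the match offset to get a 1-based line number.
  let line = 1;
  for (let i = 0; i < m.index; i++) {
    if (content[i] === "\n") line++;
  }
  return { line, evidence: maskCredentialValue(m[0]) };
}
```

Masking inside the lookup function, rather than at render time, means every downstream consumer of the finding only ever sees the masked string.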
No unmitigated security or correctness issues found. The changes are defensive, well-scoped, and include appropriate test coverage.
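For reference, the allowlist behaviour checked above (path-suffix matching plus exact basename membership) could look something like the sketch below. The function name matches the one cited in the review, but the body and the exact allowlist entries are assumptions limited to the files named in this PR.

```typescript
// Hypothetical sketch of a conservative host-tool config allowlist.
const HOST_TOOL_PATH_SUFFIXES: string[] = [
  ".claude/settings.json",
  ".vscode/settings.json",
];

const HOST_TOOL_BASENAMES = new Set<string>([
  "package.json",
  "tsconfig.json",
  ".eslintrc.json",
]);

function isStructuredConfigPath(filePath: string): boolean {
  // Normalize Windows separators so suffix checks work on both platforms.
  const normalized = filePath.replace(/\\/g, "/");

  // Suffix match must start at a path-segment boundary.
  const suffixHit = HOST_TOOL_PATH_SUFFIXES.some(
    (s) => normalized === s || normalized.endsWith("/" + s),
  );
  if (suffixHit) return true;

  // Otherwise require an exact basename match, never an extension match.
  const basename = normalized.slice(normalized.lastIndexOf("/") + 1);
  return HOST_TOOL_BASENAMES.has(basename);
}
```

Because nothing here matches on extension alone, agent-definition surfaces such as `.cursor/mcp.json`, `agent.json`, and `agent-config.yaml` fall through and keep the governance-amplifier text.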
Reviewed 4 files changed (19510 bytes)
Closes #164.
Summary
`.claude/settings.json` was firing AST-CRED-001 MEDIUM ("Credentials in Non-Environment Context") on every project that uses Claude Code's hook protection. The upstream compiler tags any artifact whose lowercased content includes the word "credential" or "session" with a `dataType: 'credentials'` access pattern, so a defensive `.claude/settings.json` containing `"Read(.aws/credentials)"` deny-rules and `$VAR` placeholder env values was being flagged despite holding no credential value.

Implementation
- Content-format gate at `checkCredentialsInNonEnvContext`: only fire when the artifact content contains a credential-format substring per a curated regex (vendor prefixes `sk-`, `sk_live_`, `ghp_`, `gho_`, `github_pat_`, `xox[abprs]-`, `eyJ` JWT, `AKIA…`, `AIza…`, plus a high-entropy 40+ word-character fallback). The regex is shared with AST-CRED-003's evidence-text gate.
- `file:line` + `evidence` population from the credential-format match position, with the `evidence` field masked at the analyzer layer so the raw token never reaches finding output, JSON, telemetry, or Registry sync. Mask preserves recognizable prefixes (`sk-ant-api03-` → `sk-ant-api03-****************`).
- Governance-amplifier prose suppressed on a curated host-tool config allowlist (`.claude/settings.json`, `.vscode/settings.json`, `package.json`, `tsconfig.json`, `.eslintrc.json`, etc.), NOT every `.json`/`.yaml` file. MCP configs and A2A agent cards still get the amplifier text because they ARE agent-definition surfaces.

Phase 4.5 adversarial review
A general-purpose subagent investigated 8 risk categories pre-merge. Three findings actioned in this PR:
- R-Bonus (HIGH): raw credential value was leaking into `evidence`; fixed by masking in `findFirstCredentialFormat`.
- R4 (MEDIUM): governance-amplifier extension-match was too broad and was over-suppressing on `.cursor/mcp.json` / `agent.json` / `agent-config.yaml`; replaced with curated host-tool basename allowlist.
- R2 (HIGH): malicious `.clinerules` in adversarial corpus no longer fires AST-CRED-001 (used to fire on credential-keyword narrative alone). Accepted with rationale: the same fixture still fires `AST-CRED-002 CRITICAL` ("credential forwarding to external endpoint") and `SEM-INST-001` / `SEM-INST-002 HIGH` on lines 1–3. System-level detection of the malicious artifact is preserved (release-smoke corpus 12/12 unchanged).

Out of scope (filed as follow-ups):
- R1 / R7 regex gaps (Stripe `pk_live_`, Twilio `AC…`, Mailgun `key-`, Heroku UUID, short `sk-` < 20 chars, URL-safe base64 with `-`/`/`). Pre-existing in `evidenceShowsCredentialFormat` shared regex.
- Migrate the credential-format regex into `@opena2a/credential-patterns` for cross-tool sync.

Verification
- `npm test` 2025/2051 pass (was 2022/2048; +3 new b15 / b16 / b15-positive tests).
- `npm run release-smoke:corpus` 12/12.
- HMA self-scan 84 → 89 (`.claude/settings.json` suppressed).
- `secure ./hackmyagent/` no longer fires MEDIUM on `.claude/settings.json`.
- AST-CRED-001 still fires on `.cursorrules:7` in the adversarial kitchen-sink corpus, with masked `sk-ant-api03-****************` evidence.
- `analyzeCredentials` Go-source regression test (`source-code-preprocessor.test.ts:272`) updated to pass `artifactContent` so AST-CRED-001 still fires on a real `sk-ant-api03-…` value.

Phase 4.5 score-jump classification
84 → 89 (+5):
- `.claude/settings.json` suppression site: (a) preserved-detection FP-suppress (real credentials still fire; only keyword-only mentions in host-tool configs are suppressed).
- `.clinerules` unit-level site: (b) narrowed-detection at the AST-CRED-001 layer, but `.clinerules` malicious fixtures are still flagged CRITICAL via AST-CRED-002 + HIGH via SEM-INST-*. System-level detection is preserved; documented above as "accepted with rationale."

Test plan
- `npm test` green (2025/2051)
- `npm run release-smoke:corpus` 12/12
- `secure ./hackmyagent/` does not fire MEDIUM on `.claude/settings.json`
- `secure ~/.opena2a/corpus/repo/malicious/kitchen-sink` still fires AST-CRED-001 on `.cursorrules:7` (masked evidence) AND AST-CRED-002 / SEM-INST-* on `.clinerules` lines 1–3
- Real Slack token in `agent-config.yaml` fires with `line: 4` + masked `xoxb-****…` evidence
- Go-source regression test passes `artifactContent`

Related
Bundled with #162 / #163 fixes for the 0.22.1 release.