
feat(0.2): AI risk subdivision + ctx audit on AI detectors#146

Open
pmclSF wants to merge 1 commit into main from
feat/0.2-track5-ai-subdivision
Conversation


@pmclSF pmclSF commented May 2, 2026

Summary

Two Track 5 deliverables for the parity-gated 0.2.0 plan. They raise
the trust posture of the AI Risk Review surface in how findings are
classified (Track 5.1) and how the detector path behaves under
cancellation (Track 5.3).

  • Track 5.1 — AI risk subdivision. Three trust tiers
    (Inventory / Hygiene / Regression) classified for every
    CategoryAI signal, with a drift gate that fails CI on a missing
    classification.
  • Track 5.3 — ctx audit on AI detectors.
    aidetect.DetectContext(ctx, root) honours ctx in the
    source-walk inner loop; pipeline calls it; cancellation tests
    prove the contract.

What changed

New code:

  • internal/signals/ai_subdomain.go — AISubdomain enum,
    aiSubdomainBySignal map covering 26 AI signals,
    AISubdomainOf / AISubdomainLabel / AISubdomainTrustBadge
    helpers
  • internal/aidetect/detect.go — new DetectContext(ctx, root),
    ctx-aware inner walk via detectFromSourceCtx. Existing
    Detect(root) preserved as backwards-compat wrapper.

Wiring:

  • internal/engine/pipeline.go:413 — switched to DetectContext
    so cancellation propagates from RunPipelineContext end-to-end

New tests:

  • internal/signals/ai_subdomain_test.go — drift gate +
    per-tier sample classification + label/badge contract (5 tests)
  • internal/aidetect/cancellation_test.go — already-cancelled
    short-circuit + mid-walk cancel + backwards-compat invariant
    (3 tests with realistic fixture sizes)

New doc:

  • docs/product/ai-risk-tiers.md — the three-tier framing,
    per-tier signal lists, public-claim posture, gating contract
    (Tier 1 may be critical; Tier 2 caps at high), add-a-signal
    recipe pointing at the drift gate

Why these together

Both deliverables touch the AI risk surface and both raise the
honest trust posture of terrain ai run / terrain analyze in
mixed-AI repos. Bundling minimizes review thrash on tightly
related work.

Test plan

  • go build ./... clean
  • go test ./internal/... — 48 packages pass (incl. 8 new tests)
  • go test ./cmd/... clean
  • make docs-verify passes
  • Manual smoke: terrain analyze on a repo with mixed
    AI surface types → JSON output carries aiSubdomain per
    signal once renderer wiring lands (separate PR — this PR
    establishes the classification + helpers; renderer changes
    come in a follow-on)

Plan tracker

Closes Track 5.1 + 5.3 from the parity-gated 0.2.0 plan. The only
remaining Track 5 item is 5.6 (per-component timing in --verbose),
which is folded into Track 8.3.

🤖 Generated with Claude Code

Two Track 5 deliverables for the parity-gated 0.2.0 plan. Together
they raise the trust posture of the AI Risk Review surface — both
in how it's classified (Track 5.1) and how it behaves under
cancellation (Track 5.3).

Track 5.1 — AI risk subdivision (Inventory / Hygiene / Regression)
  Adds `internal/signals/ai_subdomain.go` with three trust tiers
  and a classification for every CategoryAI signal in the manifest:

    - Inventory  (Tier 1, publicly claimable): direct facts
      derived from declared AI surfaces — uncoveredAISurface,
      aiPromptVersioning, aiSafetyEvalMissing, capabilityValidationGap,
      phantomEvalScenario, aiPolicyViolation, untestedPromptFlow.

    - Hygiene    (Tier 2, visible but not gating-critical):
      heuristic structural patterns — aiPromptInjectionRisk,
      aiHardcodedAPIKey, aiToolWithoutSandbox, aiModelDeprecationRisk,
      aiFewShotContamination, contextOverflowRisk.

    - Regression (Tier 2, eval-data-dependent): fires only when
      eval-framework artifacts present — every cost / latency /
      hallucination / retrieval / tool-routing / RAG-grounding
      signal across the airun catalog.
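
  The tier enum and lookup described above can be sketched as follows.
  This is an illustrative shape only: the constant names and the handful
  of map entries are samples, not the full 26-signal classification in
  `ai_subdomain.go`.

  ```go
  package main

  import "fmt"

  // AISubdomain is a sketch of the trust-tier enum.
  type AISubdomain int

  const (
  	SubdomainInventory  AISubdomain = iota // Tier 1: publicly claimable facts
  	SubdomainHygiene                       // Tier 2: heuristic structural patterns
  	SubdomainRegression                    // Tier 2: eval-data-dependent
  )

  // aiSubdomainBySignal maps each CategoryAI signal key to its tier.
  // Only a few sample entries shown; the real map covers all 26 signals.
  var aiSubdomainBySignal = map[string]AISubdomain{
  	"uncoveredAISurface":    SubdomainInventory,
  	"aiPromptInjectionRisk": SubdomainHygiene,
  	"contextOverflowRisk":   SubdomainHygiene,
  }

  // AISubdomainLabel gives renderers one source of truth for tier
  // vocabulary, so PR comment, terminal report, and JSON agree.
  func AISubdomainLabel(s AISubdomain) string {
  	switch s {
  	case SubdomainInventory:
  		return "Inventory"
  	case SubdomainHygiene:
  		return "Hygiene"
  	case SubdomainRegression:
  		return "Regression"
  	}
  	return "Unknown"
  }

  func main() {
  	fmt.Println(AISubdomainLabel(aiSubdomainBySignal["uncoveredAISurface"]))
  }
  ```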

  Public helpers `AISubdomainOf`, `AISubdomainLabel`, and
  `AISubdomainTrustBadge` give renderers a single source of truth
  for tier vocabulary so PR comment, terminal report, and JSON all
  speak the same language. Drift gate test
  `TestAISubdomain_AllAISignalsClassified` fails CI if a new AI
  signal is added without a tier — closes the "silent dump into
  legacy umbrella stanza" failure mode.
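
  The drift gate itself is a small set-difference check. A hedged
  sketch, where `allAISignals` is a hypothetical stand-in for however
  the real test enumerates CategoryAI signals from the manifest:

  ```go
  package main

  import "fmt"

  // allAISignals stands in for the manifest's CategoryAI signal list.
  var allAISignals = []string{"uncoveredAISurface", "aiPromptInjectionRisk"}

  // aiSubdomainBySignal stands in for the real classification map.
  var aiSubdomainBySignal = map[string]string{
  	"uncoveredAISurface":    "Inventory",
  	"aiPromptInjectionRisk": "Hygiene",
  }

  // driftGate returns AI signals lacking a tier classification.
  // The real TestAISubdomain_AllAISignalsClassified fails CI when
  // this list is non-empty.
  func driftGate() []string {
  	var missing []string
  	for _, sig := range allAISignals {
  		if _, ok := aiSubdomainBySignal[sig]; !ok {
  			missing = append(missing, sig)
  		}
  	}
  	return missing
  }

  func main() {
  	fmt.Println(len(driftGate())) // 0 unclassified signals => gate passes
  }
  ```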

  Companion doc `docs/product/ai-risk-tiers.md` documents the three
  tiers, the public-claim posture per tier, the gating contract
  (Tier 1 may be critical; Tier 2 caps at high), and the
  add-a-signal recipe.

Track 5.3 — ctx audit on AI detector file walk
  Adds `aidetect.DetectContext(ctx, root)` that respects ctx in
  the source-walk inner loop — checks `ctx.Err()` every 64 entries,
  aborts cleanly when cancelled. The pre-Track-5.3 shape silently
  ignored ctx, so a `terrain analyze --timeout 5s` run against a
  large repo with AI patterns would still wait for the AI walk to
  finish after ctx had been cancelled.

  Pipeline call site (`internal/engine/pipeline.go:413`) now uses
  DetectContext so cancellation propagates end-to-end. Existing
  `aidetect.Detect(root)` is preserved as a backwards-compatible
  wrapper that delegates to DetectContext(context.Background()).
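
  The ctx-aware walk plus backwards-compatible wrapper can be sketched
  like this. Lowercase names are illustrative stand-ins for the real
  `aidetect` API; the per-file detection body is elided:

  ```go
  package main

  import (
  	"context"
  	"fmt"
  	"io/fs"
  	"path/filepath"
  )

  // detectContext checks ctx.Err() up front (already-cancelled
  // short-circuit) and then every 64 entries inside the walk, so a
  // cancelled run aborts promptly instead of finishing the whole walk.
  func detectContext(ctx context.Context, root string) ([]string, error) {
  	if err := ctx.Err(); err != nil {
  		return nil, err
  	}
  	var hits []string
  	n := 0
  	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
  		if err != nil {
  			return err
  		}
  		n++
  		if n%64 == 0 { // periodic cancellation check in the inner loop
  			if err := ctx.Err(); err != nil {
  				return err
  			}
  		}
  		// ... per-file AI pattern detection would go here ...
  		return nil
  	})
  	return hits, err
  }

  // detect preserves the old signature as a thin wrapper.
  func detect(root string) ([]string, error) {
  	return detectContext(context.Background(), root)
  }

  func main() {
  	ctx, cancel := context.WithCancel(context.Background())
  	cancel()
  	_, err := detectContext(ctx, ".")
  	fmt.Println(err)
  }
  ```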

  New `cancellation_test.go` proves the contract:
    - already-cancelled ctx returns within 250ms on a 200-file
      fixture (vs ~50ms+ without short-circuit)
    - mid-walk cancel after 20ms aborts within 1s on a 1000-file
      fixture (vs ~200ms+ without honoring)
    - Detect / DetectContext produce identical results on the
      same fixture (backwards-compat invariant)
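
  The mid-walk cancellation test can be sketched end to end: build a
  many-file fixture in a temp dir, cancel shortly after the walk
  starts, and assert the walk returns context.Canceled well inside the
  deadline. The sleep simulating per-file work and the 1000-file count
  are illustrative, not the real fixture:

  ```go
  package main

  import (
  	"context"
  	"errors"
  	"fmt"
  	"io/fs"
  	"os"
  	"path/filepath"
  	"time"
  )

  // walk is a minimal stand-in for the detector's ctx-aware source
  // walk: it checks ctx.Err() every 64 entries.
  func walk(ctx context.Context, root string) error {
  	n := 0
  	return filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
  		if err != nil {
  			return err
  		}
  		n++
  		if n%64 == 0 {
  			if err := ctx.Err(); err != nil {
  				return err
  			}
  		}
  		time.Sleep(100 * time.Microsecond) // simulate per-file work
  		return nil
  	})
  }

  func main() {
  	dir, _ := os.MkdirTemp("", "fixture")
  	defer os.RemoveAll(dir)
  	for i := 0; i < 1000; i++ {
  		os.WriteFile(filepath.Join(dir, fmt.Sprintf("f%d.go", i)), []byte("package x"), 0o644)
  	}
  	ctx, cancel := context.WithCancel(context.Background())
  	go func() { time.Sleep(20 * time.Millisecond); cancel() }()
  	start := time.Now()
  	err := walk(ctx, dir)
  	fmt.Println(errors.Is(err, context.Canceled), time.Since(start) < time.Second)
  }
  ```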

Verification: 48 internal packages pass; cmd tests pass;
make docs-verify clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
github-actions Bot commented May 2, 2026

Terrain AI Risk Review

| Metric | Value |
| --- | --- |
| AI surfaces | 13 |
| Eval scenarios | 17 |
| Impacted scenarios | 0 |
| Uncovered surfaces | 13 |

Decision: PASS — AI surfaces are covered.

github-actions Bot commented May 2, 2026

[RISK] Terrain — Merge with caution

High-severity gaps found in changed code.

| Metric | Value |
| --- | --- |
| Changed files | 6 (3 source · 2 test) |
| Impacted units | 14 |
| Protection gaps | 2 |
| Tests selected | 28 of 774 (3% of suite) |

Coverage gaps in changed code

  • internal/engine/pipeline.go [MED] — Exported function ProgressFunc has no observed test coverage.
    → Add unit tests for exported function ProgressFunc — this is public API surface.
  • internal/engine/pipeline.go [MED] — Exported function RunPipelineContext has no observed test coverage.
    → Add unit tests for exported function RunPipelineContext — this is public API surface.
4 pre-existing issues on changed files
  • internal/aidetect/cancellation_test.go [MED] — [aiModelDeprecationRisk] model tag gpt-4 resolves to whatever the provider currently maps it to; pin a dated variant (e.g. gpt-4-0613)
  • internal/aidetect/detect.go [HIGH] — [blastRadiusHotspot] Changes to this file propagate to 367 tests (126 direct, 241 indirect). High blast radius increases regression risk.
  • internal/engine/pipeline.go [HIGH] — [blastRadiusHotspot] Changes to this file propagate to 1168 tests (144 direct, 1024 indirect). High blast radius increases regression risk.
  • internal/signals/ai_subdomain.go [HIGH] — [blastRadiusHotspot] Changes to this file propagate to 1854 tests (252 direct, 1602 indirect). High blast radius increases regression risk.

Recommended tests

28 test(s) with exact coverage of 12 impacted unit(s). 2 impacted unit(s) have no covering tests in the selected set.

| Package | Tests | Sample |
| --- | --- | --- |
| internal/aidetect | 15 | internal/aidetect/cancellation_test.go ... |
| internal/engine | 6 | internal/engine/artifacts_test.go ... |
| cmd/terrain | 4 | cmd/terrain/ai_workflow_test.go ... |
| internal/signals | 2 | internal/signals/ai_subdomain_test.go ... |
| internal/server | 1 | internal/server/server_test.go |

AI Risk Review

Scenarios: 0 of 17 selected

1 advisory finding
  • internal/aidetect/cancellation_test.go:117 — Model tag is sunset or floats — the next API call could break or silently re-resolve.
    → Pin to a dated model variant (e.g. gpt-4-0613) or upgrade to a current tier.

Owners: PMCLSF

Limitations
  • No coverage artifacts provided; protection gaps reflect missing data, not measured absence. Provide --coverage to improve accuracy.
  • Mixed test cultures reduce cross-framework optimization confidence. Consider standardizing on fewer frameworks.

Generated by Terrain · terrain pr --json for machine-readable output

Targeted Test Results

Terrain selected 28 test(s) instead of the full suite.

  • Go tests: passed
