Skip to content

feat(0.2): per-detector wall-clock timeout budgets (Track 9.4)#154

Open
pmclSF wants to merge 1 commit intomainfrom
feat/0.2-track9-detector-timeout-budgets
Open

feat(0.2): per-detector wall-clock timeout budgets (Track 9.4)#154
pmclSF wants to merge 1 commit intomainfrom
feat/0.2-track9-detector-timeout-budgets

Conversation

@pmclSF
Copy link
Copy Markdown
Owner

@pmclSF pmclSF commented May 3, 2026

Summary

Adds the per-detector budget mechanism that protects analyze runs
from a single hung detector blocking the whole pipeline. When a
detector exceeds its budget, the pipeline returns a marker signal
and moves on — the rest of the run completes normally.

What changed

  • New DetectorMeta.Budget time.Duration — zero means
    DefaultDetectorBudget (30s). Detectors with legitimate
    long-running work set this explicitly.
  • New safeDetectWithBudget(reg, fn) wrapper composes with
    existing safeDetect panic-recovery. All 9 call sites in
    detector_registry.go routed through it.
  • New SignalDetectorBudgetExceeded type registered in the
    manifest + signal catalog so ValidateSnapshot accepts the
    marker (same posture as detectorPanic).
  • Doc: docs/rules/engine/detector-budget.md regenerated.

Behavior contract

When budget elapses:

  • Pipeline returns the budget-exceeded marker
  • Detector goroutine completes in background (Go has no kill
    primitive); its post-budget signals are discarded
  • Wait paths in registry.Run / RunWithGraph complete promptly
    rather than blocking on the slow detector

This is the right trade-off for the failure modes targeted:
runaway regex, accidentally-O(n²) graph walks, blocking I/O.

Test plan

  • 5 new tests in detector_budget_test.go:
    budget-exceeded, fast-passes, zero-uses-default, panic+budget
    compose, registry-level integration
  • go test ./... — full suite green
  • make docs-verify — generated docs current

Plan tracker

Closes Track 9.4. Track 9 remaining: 9.1 (capability metadata),
9.2 (panic recovery completion), 9.3 (missing-input diagnostics),
9.5 (pipeline architectural separation), 9.6 (registry refactor),
9.7 (truth-verify). All explicitly post-0.2.0-blocking.

🤖 Generated with Claude Code

Adds the per-detector budget mechanism that protects analyze runs
from any single hung detector blocking the whole pipeline.

The mechanism

- New `DetectorMeta.Budget time.Duration` field. Zero means "use
  DefaultDetectorBudget" (30 seconds). Detectors with legitimate
  long-running work set this explicitly.
- New `safeDetectWithBudget(reg, fn)` wrapper composes with the
  existing safeDetect panic-recovery: a panicking detector still
  produces the detectorPanic marker; a slow detector produces the
  new detectorBudgetExceeded marker.
- All call sites in detector_registry.go (Run + RunWithGraph
  Phase 1/2/3 paths) routed through the budget wrapper. Pre-Track-
  9.4 a hung detector would block the goroutine waiting on
  wg.Wait(); now the budget elapses first and the wait completes.
- New SignalDetectorBudgetExceeded type registered in the manifest
  + signal catalog so ValidateSnapshot accepts the marker (same
  posture as detectorPanic — without the catalog entry, a single
  budget overrun would invalidate the whole snapshot).

Behavior contract

When a detector exceeds its budget, the pipeline returns the
budget-exceeded marker instead of waiting for the eventual
completion. The detector goroutine completes in the background
(Go has no goroutine kill primitive); its post-budget signals are
discarded. This is the right trade-off for the failure modes the
budget targets: runaway regex, accidentally-O(n²) graph walks,
blocking I/O on a slow filesystem.

Coverage

Five new tests in detector_budget_test.go:
  - budget exceeded → marker returned within budget window
  - fast detector → original signals returned
  - zero budget → DefaultDetectorBudget applied
  - panic + budget compose correctly (detectorPanic wins)
  - registry-level integration: r.Run() returns within budget
    even when a registered detector deliberately sleeps past it

Plus the regenerated docs/rules/engine/detector-budget.md doc.

Verification: all 5 budget tests pass; full Go test suite green;
make docs-verify clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 3, 2026

Terrain AI Risk Review

Metric Value
AI surfaces 13
Eval scenarios 16
Impacted scenarios 0
Uncovered surfaces 13

Decision: PASS — AI surfaces are covered.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 3, 2026

[RISK] Terrain — Merge with caution

High-severity gaps found in changed code.

Metric Value
Changed files 7 (4 source · 1 test)
Impacted units 33
Protection gaps 11
Tests selected 8 of 773 (1% of suite)

Coverage gaps in changed code

  • internal/models/signal_catalog.go [MED] — Exported function KnownSignalTypes has no observed test coverage.
    → Add unit tests for exported function KnownSignalTypes — this is public API surface.
  • internal/models/signal_catalog.go [MED] — Exported function SignalCatalogEntry has no observed test coverage.
    → Add unit tests for exported function SignalCatalogEntry — this is public API surface.
  • internal/models/signal_catalog.go [MED] — Exported function SignalSource has no observed test coverage.
    → Add unit tests for exported function SignalSource — this is public API surface.
  • internal/signals/detector_registry.go [MED] — Exported function All has no observed test coverage.
    → Add unit tests for exported function All — this is public API surface.
  • internal/signals/detector_registry.go [MED] — Exported function ByDomain has no observed test coverage.
    → Add unit tests for exported function ByDomain — this is public API surface.
  • internal/signals/detector_registry.go [MED] — Exported function Detectors has no observed test coverage.
    → Add unit tests for exported function Detectors — this is public API surface.
  • internal/signals/detector_registry.go [MED] — Exported function Len has no observed test coverage.
    → Add unit tests for exported function Len — this is public API surface.
  • internal/signals/detector_registry.go [MED] — Exported function Register has no observed test coverage.
    → Add unit tests for exported function Register — this is public API surface.
  • internal/signals/detector_registry.go [MED] — Exported function Run has no observed test coverage.
    → Add unit tests for exported function Run — this is public API surface.
  • internal/signals/detector_registry.go [MED] — Exported function RunDomain has no observed test coverage.
    → Add unit tests for exported function RunDomain — this is public API surface.
  • ...and 1 more (1 medium)
8 pre-existing issues on changed files
  • docs/signals/manifest.json [MED] — [aiModelDeprecationRisk] model tag gpt-3.5-turbo is a moving alias; pin a dated variant
  • docs/signals/manifest.json [MED] — [aiModelDeprecationRisk] model tag gpt-4 resolves to whatever the provider currently maps it to; pin a dated variant (e.g. gpt-4-0613)
  • internal/signals/manifest.go [MED] — [aiModelDeprecationRisk] model tag gpt-3.5-turbo is a moving alias; pin a dated variant
  • internal/signals/manifest.go [MED] — [aiModelDeprecationRisk] model tag gpt-4 resolves to whatever the provider currently maps it to; pin a dated variant (e.g. gpt-4-0613)
  • internal/models/signal_catalog.go [HIGH] — [blastRadiusHotspot] Changes to this file propagate to 2140 tests (1508 direct, 632 indirect). High blast radius increases regression risk.
  • ...and 3 more

Recommended tests

8 test(s) with exact coverage of 22 impacted unit(s). 11 impacted unit(s) have no covering tests in the selected set.

Test Confidence Why
internal/engine/registry_error_test.go exact exact coverage of DetectorMeta, DetectorRegistration, Domain + 1 more
internal/engine/registry_test.go exact exact coverage of Domain, EvidenceType
internal/measurement/measurement_test.go exact exact coverage of NewRegistry
internal/signals/detector_budget_test.go exact exact coverage of DetectorMeta, DetectorRegistration, Domain + 1 more
internal/signals/detector_registry_test.go exact exact coverage of IsKnownSignalType
internal/signals/manifest_rule_docs_test.go exact exact coverage of DefaultDetectorBudget, AllSignalTypes, Manifest + 9 more
internal/signals/manifest_test.go exact exact coverage of SignalCatalog
internal/structural/structural_test.go exact exact coverage of EvidenceType

AI Risk Review

Scenarios: 0 of 16 selected

2 advisory findings
  • docs/signals/manifest.json:1087 — Model tag is sunset or floats — the next API call could break or silently re-resolve.
    → Pin to a dated model variant (e.g. gpt-4-0613) or upgrade to a current tier.
  • internal/signals/manifest.go:841 — Model tag is sunset or floats — the next API call could break or silently re-resolve.
    → Pin to a dated model variant (e.g. gpt-4-0613) or upgrade to a current tier.

Owners: PMCLSF

Limitations
  • No coverage artifacts provided; protection gaps reflect missing data, not measured absence. Provide --coverage to improve accuracy.
  • Mixed test cultures reduce cross-framework optimization confidence. Consider standardizing on fewer frameworks.

Generated by Terrain · terrain pr --json for machine-readable output

Targeted Test Results

Terrain selected 8 test(s) instead of the full suite.

  • Go tests: passed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant