
feat(0.2): hallucination-rate framing + policy templates (Tracks 5.2/7.6/7.7)#141

Open
pmclSF wants to merge 1 commit into main from feat/0.2-mixed-pillar-lifts

Conversation


@pmclSF pmclSF commented May 2, 2026

Summary

Three small parity-axis lifts bundled into one PR. Each addresses a named cell from the audit doc.

Track 5.2 — `aiHallucinationRate` framing

The detector's name implies Terrain judges hallucinations directly. It does not — it reads hallucination-shaped failure metadata that the eval framework (Promptfoo / DeepEval / Ragas) has already computed, and reports that rate. The launch-readiness review flagged this as overpromising.

For 0.2.0 we keep the type name `aiHallucinationRate` for back-compat (the rename touches every adapter, every test, every fixture) but tighten the manifest description + remediation so the trust framing is correct. The full type rename to `aiEvalFlaggedHallucinationShare` is 0.3 work.

Lifts: Area 5 (AI risk + inventory) E4 (Stability).

Track 7.6 — `terrain init` policy template

Tightened the commented policy template that `terrain init` emits to:

  • reference the new `docs/policy/examples/` files so users have a clear "copy this file" path
  • add an inline comment explaining each rule
  • cleanly separate core test-system rules from AI governance rules

Lifts: Area 10 (Policy / governance) P4 (Onboarding) from 2 → 3.
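For a rough idea of the intended shape (rule names and fields below are hypothetical, not Terrain's actual policy schema), the tightened template reads roughly like:

```yaml
# terrain policy — commented starter template (illustrative sketch).
# Prefer copying a ready-made policy from docs/policy/examples/
# (minimal.yaml, balanced.yaml, strict.yaml) instead of
# uncommenting rules one at a time.

# --- Core test-system rules ---
# skipped-tests: flag tests skipped without documented justification.
# skipped-tests:
#   severity: warn

# --- AI governance rules ---
# ai-accuracy-regression: flag eval scenarios whose accuracy drops
# against the recorded baseline.
# ai-accuracy-regression:
#   severity: warn
```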

Track 7.7 — Three example policies

New files under `docs/policy/examples/`:

| File | Use when | Blocks on |
| --- | --- | --- |
| `minimal.yaml` | First-time adoption / inherited debt | Nothing — every rule warn-only |
| `balanced.yaml` | Most teams, after a couple of weeks of polish | Critical AI regressions + safety gaps + skipped tests |
| `strict.yaml` | Mature repos on enforced-quality branches | Every high-or-above finding + zero accuracy regression |
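To make the ramp concrete (again with hypothetical rule names, not Terrain's real schema), the difference between the two ends of the ramp is mostly a severity flip:

```yaml
# minimal.yaml (sketch): every rule warn-only, nothing blocks the build.
rules:
  skipped-tests:
    severity: warn
  ai-accuracy-regression:
    severity: warn

# strict.yaml (sketch) flips the same rules to blocking:
# rules:
#   skipped-tests:
#     severity: error
#   ai-accuracy-regression:
#     severity: error
#     tolerance: 0
```

Keeping the rule set identical across the three files is what makes minimal → balanced → strict read as one adoption story rather than three unrelated policies.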

Plus a `README.md` that documents the adoption ramp (minimal → balanced → strict), pairs each policy with the right CLI invocation, and cross-links to vision.md / CONTRIBUTING.md / feature-status.md.

Lifts: Area 10 (Policy / governance) P6 (Examples) from 2 → 3.

Test plan

  • `go test ./...` — full suite green
  • `make docs-verify` — manifest + rule docs regenerated and in sync
  • Manual read-through: policies make sense as a coherent adoption ramp

Plan link

`/Users/pzachary/.claude/plans/kind-mapping-turing.md` (Tracks 5.2 / 7.6 / 7.7).

🤖 Generated with Claude Code

…7.6/7.7)

Three small parity-axis lifts bundled into one PR. Each addresses a
named cell from the audit doc.

Track 5.2 — aiHallucinationRate framing
  The detector's name implies Terrain judges hallucinations directly.
  It does not — it reads hallucination-shaped failure metadata that
  the eval framework (Promptfoo / DeepEval / Ragas) has already
  computed and returns the rate. The launch-readiness review flagged
  this as overpromising.

  For 0.2.0: keep the type name `aiHallucinationRate` for back-compat
  (renaming the type touches every adapter, every test, every
  fixture) but tighten the manifest description + remediation so the
  trust framing is correct:

    Title:       "Eval-Flagged Hallucination Share"
    Description: explicitly notes that Terrain reads framework metadata,
                 does not judge hallucinations directly
    Remediation: tells the user to fix the eval scenario or raise the
                 threshold with documented justification when they
                 disagree with the framework's classification

  The actual type rename to `aiEvalFlaggedHallucinationShare` is 0.3
  work (deprecation alias + back-compat consumer migration).

  Lifts area 5 (AI risk + inventory) E4 (Stability) — the misleading
  name was the load-bearing concern there.

Track 7.6 — terrain init policy template
  Existing `generatePolicyYAML` already emits a commented template.
  Tightened to:
    - Reference the new docs/policy/examples/ files (Track 7.7) so
      users have a clear "copy this file" path instead of uncommenting
      the boilerplate one rule at a time.
    - Add inline comments per rule explaining what each does (was
      bare key/value pairs before).
    - Cleaner section split between "Core test-system rules" and
      "AI governance rules".

  Lifts area 10 (Policy / governance) P4 (Onboarding) from 2 to 3.

Track 7.7 — three example policies
  New files under docs/policy/examples/:

    minimal.yaml   — safe defaults for first-time adoption; every
                     rule warn-only, nothing blocks the build
    balanced.yaml  — recommended starting point for most teams;
                     blocks on critical AI regressions + safety gaps
                     + skipped tests; warns elsewhere
    strict.yaml    — mature-repo enforced-quality branch; blocks on
                     every high+ finding, zero accuracy-regression
                     tolerance

  Plus README.md that documents the adoption ramp (minimal → balanced
  → strict), pairs each policy with the right CLI invocation, and
  cross-links to vision.md / CONTRIBUTING.md / feature-status.md.

  Lifts area 10 (Policy / governance) P6 (Examples) from 2 to 3.

Pillar parity impact: this PR is a slice of the Understand-pillar
(via 5.2) + Gate-pillar (via 7.6 / 7.7) lift work toward the 0.2.0
release gate.

Verification:
  go test ./... — full suite green
  make docs-verify — manifest + rule docs regenerated and in sync
  Manual: read each new policy file end-to-end; the adoption ramp
  reads as one coherent story

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions Bot commented May 2, 2026

Terrain AI Risk Review

| Metric | Value |
| --- | --- |
| AI surfaces | 13 |
| Eval scenarios | 16 |
| Impacted scenarios | 0 |
| Uncovered surfaces | 13 |

Decision: PASS — AI surfaces are covered.


github-actions Bot commented May 2, 2026

[RISK] Terrain — Merge with caution

High-severity gaps found in changed code.

| Metric | Value |
| --- | --- |
| Changed files | 8 (2 source · 0 test) |
| Impacted units | 9 |
| Protection gaps | 2 |
| Tests selected | 3 of 772 (0% of suite) |

Coverage gaps in changed code

  • internal/engine/initconfig.go [MED] — Exported function DetectedFramework has no observed test coverage.
    → Add unit tests for exported function DetectedFramework — this is public API surface.
  • internal/engine/initconfig.go [MED] — Exported function InitResult has no observed test coverage.
    → Add unit tests for exported function InitResult — this is public API surface.
6 pre-existing issues on changed files
  • docs/signals/manifest.json [MED] — [aiModelDeprecationRisk] model tag gpt-3.5-turbo is a moving alias; pin a dated variant
  • docs/signals/manifest.json [MED] — [aiModelDeprecationRisk] model tag gpt-4 resolves to whatever the provider currently maps it to; pin a dated variant (e.g. gpt-4-0613)
  • internal/signals/manifest.go [MED] — [aiModelDeprecationRisk] model tag gpt-3.5-turbo is a moving alias; pin a dated variant
  • internal/signals/manifest.go [MED] — [aiModelDeprecationRisk] model tag gpt-4 resolves to whatever the provider currently maps it to; pin a dated variant (e.g. gpt-4-0613)
  • internal/engine/initconfig.go [HIGH] — [blastRadiusHotspot] Changes to this file propagate to 1164 tests (144 direct, 1020 indirect). High blast radius increases regression risk.
  • ...and 1 more

Recommended tests

3 test(s) with exact coverage of 7 impacted unit(s). 2 impacted unit(s) have no covering tests in the selected set.

| Test | Confidence | Why |
| --- | --- | --- |
| internal/engine/initconfig_test.go | exact | exact coverage of RunInit |
| internal/signals/manifest_rule_docs_test.go | exact | exact coverage of AllSignalTypes, Manifest, SignalStatus + 1 more |
| internal/signals/manifest_test.go | exact | exact coverage of ManifestByType, ManifestEntry |

AI Risk Review

Scenarios: 0 of 16 selected

2 advisory findings
  • docs/signals/manifest.json:1087 — Model tag is sunset or floats — the next API call could break or silently re-resolve.
    → Pin to a dated model variant (e.g. gpt-4-0613) or upgrade to a current tier.
  • internal/signals/manifest.go:841 — Model tag is sunset or floats — the next API call could break or silently re-resolve.
    → Pin to a dated model variant (e.g. gpt-4-0613) or upgrade to a current tier.

Owners: PMCLSF

Limitations
  • No coverage artifacts provided; protection gaps reflect missing data, not measured absence. Provide --coverage to improve accuracy.
  • Mixed test cultures reduce cross-framework optimization confidence. Consider standardizing on fewer frameworks.

Generated by Terrain · terrain pr --json for machine-readable output

Targeted Test Results

Terrain selected 3 test(s) instead of the full suite.

  • Go tests: passed
