
feat(0.2): hallucination-rate framing + policy templates (Tracks 5.2/7.6/7.7)#141

Open
pmclSF wants to merge 1 commit into main from feat/0.2-mixed-pillar-lifts

Conversation


@pmclSF pmclSF commented May 2, 2026

Summary

Three small parity-axis lifts bundled into one PR. Each addresses a named cell from the audit doc.

Track 5.2 — `aiHallucinationRate` framing

The detector's name implies Terrain judges hallucinations directly. It does not — it reads hallucination-shaped failure metadata that the eval framework (Promptfoo / DeepEval / Ragas) has already computed, and reports that rate. The launch-readiness review flagged this as overpromising.

For 0.2.0 we keep the type name `aiHallucinationRate` for back-compat (the rename touches every adapter, every test, every fixture) but tighten the manifest description + remediation so the trust framing is correct. The full type rename to `aiEvalFlaggedHallucinationShare` is 0.3 work.

Lifts: Area 5 (AI risk + inventory) E4 (Stability).

Track 7.6 — `terrain init` policy template

Tightened the commented policy template that `terrain init` emits to:

  • reference the new `docs/policy/examples/` files so users have a clear "copy this file" path
  • add an inline comment explaining each rule
  • cleanly separate core test-system rules from AI governance rules

Lifts: Area 10 (Policy / governance) P4 (Onboarding) from 2 → 3.
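For a rough idea of the intended shape (rule names and fields below are hypothetical, not Terrain's actual policy schema), the tightened template reads roughly like:

```yaml
# terrain policy — commented starter template (illustrative sketch).
# Prefer copying a ready-made policy from docs/policy/examples/
# (minimal.yaml, balanced.yaml, strict.yaml) instead of
# uncommenting rules one at a time.

# --- Core test-system rules ---
# skipped-tests: flag tests skipped without documented justification.
# skipped-tests:
#   severity: warn

# --- AI governance rules ---
# ai-accuracy-regression: flag eval scenarios whose accuracy drops
# against the recorded baseline.
# ai-accuracy-regression:
#   severity: warn
```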

Track 7.7 — Three example policies

New files under `docs/policy/examples/`:

| File | Use when | Blocks on |
| --- | --- | --- |
| `minimal.yaml` | First-time adoption / inherited debt | Nothing — every rule warn-only |
| `balanced.yaml` | Most teams, after a couple of weeks of polish | Critical AI regressions + safety gaps + skipped tests |
| `strict.yaml` | Mature repos on enforced-quality branches | Every high-or-above finding + zero accuracy regression |
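To make the ramp concrete (again with hypothetical rule names, not Terrain's real schema), the difference between the two ends of the ramp is mostly a severity flip:

```yaml
# minimal.yaml (sketch): every rule warn-only, nothing blocks the build.
rules:
  skipped-tests:
    severity: warn
  ai-accuracy-regression:
    severity: warn

# strict.yaml (sketch) flips the same rules to blocking:
# rules:
#   skipped-tests:
#     severity: error
#   ai-accuracy-regression:
#     severity: error
#     tolerance: 0
```

Keeping the rule set identical across the three files is what makes minimal → balanced → strict read as one adoption story rather than three unrelated policies.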

Plus a `README.md` that documents the adoption ramp (minimal → balanced → strict), pairs each policy with the right CLI invocation, and cross-links to vision.md / CONTRIBUTING.md / feature-status.md.

Lifts: Area 10 (Policy / governance) P6 (Examples) from 2 → 3.

Test plan

  • `go test ./...` — full suite green
  • `make docs-verify` — manifest + rule docs regenerated and in sync
  • Manual read-through: policies make sense as a coherent adoption ramp

Plan link

`/Users/pzachary/.claude/plans/kind-mapping-turing.md` (Tracks 5.2 / 7.6 / 7.7).

🤖 Generated with Claude Code

…7.6/7.7)

Three small parity-axis lifts bundled into one PR. Each addresses a
named cell from the audit doc.

Track 5.2 — aiHallucinationRate framing
  The detector's name implies Terrain judges hallucinations directly.
  It does not — it reads hallucination-shaped failure metadata that
  the eval framework (Promptfoo / DeepEval / Ragas) has already
  computed and returns the rate. The launch-readiness review flagged
  this as overpromising.

  For 0.2.0: keep the type name `aiHallucinationRate` for back-compat
  (renaming the type touches every adapter, every test, every
  fixture) but tighten the manifest description + remediation so the
  trust framing is correct:

    Title:       "Eval-Flagged Hallucination Share"
    Description: explicitly notes that Terrain reads framework metadata,
                 does not judge hallucinations directly
    Remediation: tells the user to fix the eval scenario or raise the
                 threshold with documented justification when they
                 disagree with the framework's classification

  The actual type rename to `aiEvalFlaggedHallucinationShare` is 0.3
  work (deprecation alias + back-compat consumer migration).

  Lifts area 5 (AI risk + inventory) E4 (Stability) — the misleading
  name was the load-bearing concern there.

Track 7.6 — terrain init policy template
  Existing `generatePolicyYAML` already emits a commented template.
  Tightened to:
    - Reference the new docs/policy/examples/ files (Track 7.7) so
      users have a clear "copy this file" path instead of uncommenting
      the boilerplate one rule at a time.
    - Add inline comments per rule explaining what each does (was
      bare key/value pairs before).
    - Cleaner section split between "Core test-system rules" and
      "AI governance rules".

  Lifts area 10 (Policy / governance) P4 (Onboarding) from 2 to 3.

Track 7.7 — three example policies
  New files under docs/policy/examples/:

    minimal.yaml   — safe defaults for first-time adoption; every
                     rule warn-only, nothing blocks the build
    balanced.yaml  — recommended starting point for most teams;
                     blocks on critical AI regressions + safety gaps
                     + skipped tests; warns elsewhere
    strict.yaml    — mature-repo enforced-quality branch; blocks on
                     every high+ finding, zero accuracy-regression
                     tolerance

  Plus README.md that documents the adoption ramp (minimal → balanced
  → strict), pairs each policy with the right CLI invocation, and
  cross-links to vision.md / CONTRIBUTING.md / feature-status.md.

  Lifts area 10 (Policy / governance) P6 (Examples) from 2 to 3.

Pillar parity impact: this PR is a slice of the Understand-pillar
(via 5.2) + Gate-pillar (via 7.6 / 7.7) lift work toward the 0.2.0
release gate.

Verification:
  go test ./... — full suite green
  make docs-verify — manifest + rule docs regenerated and in sync
  Manual: read each new policy file end-to-end; the adoption ramp
  reads as one coherent story

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions Bot commented May 2, 2026

Terrain AI Risk Review

| Metric | Value |
| --- | --- |
| AI surfaces | 13 |
| Eval scenarios | 16 |
| Impacted scenarios | 0 |
| Uncovered surfaces | 13 |

Decision: PASS — AI surfaces are covered.


github-actions Bot commented May 2, 2026

[RISK] Terrain — Merge with caution

High-severity gaps found in changed code.

| Metric | Value |
| --- | --- |
| Changed files | 8 (2 source · 0 test) |
| Impacted units | 9 |
| Protection gaps | 2 |
| Tests selected | 3 of 772 (0% of suite) |

Coverage gaps in changed code

  • internal/engine/initconfig.go [MED] — Exported function DetectedFramework has no observed test coverage.
    → Add unit tests for exported function DetectedFramework — this is public API surface.
  • internal/engine/initconfig.go [MED] — Exported function InitResult has no observed test coverage.
    → Add unit tests for exported function InitResult — this is public API surface.
6 pre-existing issues on changed files
  • docs/signals/manifest.json [MED] — [aiModelDeprecationRisk] model tag gpt-3.5-turbo is a moving alias; pin a dated variant
  • docs/signals/manifest.json [MED] — [aiModelDeprecationRisk] model tag gpt-4 resolves to whatever the provider currently maps it to; pin a dated variant (e.g. gpt-4-0613)
  • internal/signals/manifest.go [MED] — [aiModelDeprecationRisk] model tag gpt-3.5-turbo is a moving alias; pin a dated variant
  • internal/signals/manifest.go [MED] — [aiModelDeprecationRisk] model tag gpt-4 resolves to whatever the provider currently maps it to; pin a dated variant (e.g. gpt-4-0613)
  • internal/engine/initconfig.go [HIGH] — [blastRadiusHotspot] Changes to this file propagate to 1164 tests (144 direct, 1020 indirect). High blast radius increases regression risk.
  • ...and 1 more

Recommended tests

3 test(s) with exact coverage of 7 impacted unit(s). 2 impacted unit(s) have no covering tests in the selected set.

| Test | Confidence | Why |
| --- | --- | --- |
| internal/engine/initconfig_test.go | exact | exact coverage of RunInit |
| internal/signals/manifest_rule_docs_test.go | exact | exact coverage of AllSignalTypes, Manifest, SignalStatus + 1 more |
| internal/signals/manifest_test.go | exact | exact coverage of ManifestByType, ManifestEntry |

AI Risk Review

Scenarios: 0 of 16 selected

2 advisory findings
  • docs/signals/manifest.json:1087 — Model tag is sunset or floats — the next API call could break or silently re-resolve.
    → Pin to a dated model variant (e.g. gpt-4-0613) or upgrade to a current tier.
  • internal/signals/manifest.go:841 — Model tag is sunset or floats — the next API call could break or silently re-resolve.
    → Pin to a dated model variant (e.g. gpt-4-0613) or upgrade to a current tier.

Owners: PMCLSF

Limitations
  • No coverage artifacts provided; protection gaps reflect missing data, not measured absence. Provide --coverage to improve accuracy.
  • Mixed test cultures reduce cross-framework optimization confidence. Consider standardizing on fewer frameworks.

Generated by Terrain · terrain pr --json for machine-readable output

Targeted Test Results

Terrain selected 3 test(s) instead of the full suite.

  • Go tests: passed
