feat(0.2): hallucination-rate framing + policy templates (Tracks 5.2/7.6/7.7) #141
Open
Three small parity-axis lifts bundled into one PR. Each addresses a
named cell from the audit doc.
Track 5.2 — aiHallucinationRate framing
The detector's name implies Terrain judges hallucinations directly.
It does not — it reads hallucination-shaped failure metadata that
the eval framework (Promptfoo / DeepEval / Ragas) has already
computed, and reports that rate. The launch-readiness review flagged
this as overpromising.
For 0.2.0: keep the type name `aiHallucinationRate` for back-compat
(renaming the type touches every adapter, every test, every
fixture) but tighten the manifest description + remediation so the
trust framing is correct:
- Title: "Eval-Flagged Hallucination Share"
- Description: explicitly notes that Terrain reads framework
  metadata and does not judge hallucinations directly
- Remediation: tells the user to fix the eval scenario, or raise the
  threshold with documented justification when they disagree with
  the framework's classification
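As a rough sketch of the intended framing (field and detector names here are illustrative, not the actual Terrain manifest schema), the tightened entry might read:

```yaml
# Hypothetical manifest-entry shape. Field names are illustrative only;
# the real Terrain manifest schema may differ.
detectors:
  aiHallucinationRate:
    title: "Eval-Flagged Hallucination Share"
    description: >
      Share of eval cases that the upstream framework (Promptfoo /
      DeepEval / Ragas) flagged as hallucination-shaped failures.
      Terrain reads the framework's metadata; it does not judge
      hallucinations itself.
    remediation: >
      Fix the failing eval scenario, or raise the threshold with a
      documented justification if you disagree with the framework's
      classification.
```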
The actual type rename to `aiEvalFlaggedHallucinationShare` is 0.3
work (deprecation alias + back-compat consumer migration).
Lifts area 5 (AI risk + inventory) E4 (Stability) — the misleading
name was the load-bearing concern there.
Track 7.6 — terrain init policy template
Existing `generatePolicyYAML` already emits a commented template.
Tightened to:
- Reference the new docs/policy/examples/ files (Track 7.7) so
users have a clear "copy this file" path instead of uncommenting
the boilerplate one rule at a time.
- Add inline comments per rule explaining what each does (was
bare key/value pairs before).
- Cleaner section split between "Core test-system rules" and
"AI governance rules".
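The shape the tightened template aims for might look like the following (rule names and values are hypothetical, shown only to illustrate the section split and per-rule comments):

```yaml
# --- Core test-system rules ---
# maxSkippedTests: fail the gate when more than N tests are skipped.
# maxSkippedTests: 5

# --- AI governance rules ---
# maxAiHallucinationRate: cap on the eval-flagged hallucination share
# reported by the upstream eval framework.
# maxAiHallucinationRate: 0.02
```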
Lifts area 10 (Policy / governance) P4 (Onboarding) from 2 to 3.
Track 7.7 — three example policies
New files under docs/policy/examples/:
- minimal.yaml — safe defaults for first-time adoption; every rule
  warn-only, nothing blocks the build
- balanced.yaml — recommended starting point for most teams; blocks
  on critical AI regressions + safety gaps + skipped tests; warns
  elsewhere
- strict.yaml — mature-repo enforced-quality branch; blocks on every
  high+ finding, zero accuracy-regression tolerance
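To make the ramp concrete, here is a sketch of how one rule might differ between the minimal and strict policies (rule names and the `action`/`tolerance` keys are hypothetical, not the actual policy schema):

```yaml
# minimal.yaml (sketch): warn-only, never blocks the build
rules:
  accuracy-regression:
    action: warn
---
# strict.yaml (sketch): blocks on any regression, zero tolerance
rules:
  accuracy-regression:
    action: block
    tolerance: 0
```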
Plus README.md that documents the adoption ramp (minimal → balanced
→ strict), pairs each policy with the right CLI invocation, and
cross-links to vision.md / CONTRIBUTING.md / feature-status.md.
Lifts area 10 (Policy / governance) P6 (Examples) from 2 to 3.
Pillar parity impact: this PR is a slice of the Understand-pillar
(via 5.2) + Gate-pillar (via 7.6 / 7.7) lift work toward the 0.2.0
release gate.
Verification:
- `go test ./...` — full suite green
- `make docs-verify` — manifest + rule docs regenerated and in sync
- Manual: read each new policy file end-to-end; the adoption ramp
  reads as one coherent story
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Terrain AI Risk Review
Decision: PASS — AI surfaces are covered.
[RISK] Terrain — Merge with caution
- Coverage gaps in changed code
- 6 pre-existing issues on changed files
- Recommended tests: 3 test(s) with exact coverage of 7 impacted
  unit(s); 2 impacted unit(s) have no covering tests in the selected
  set.
- AI Risk Review: 2 advisory findings
- Owners: PMCLSF
- Limitations: Terrain selected 3 test(s) instead of the full suite.
Generated by Terrain · Targeted Test Results
Test plan
Plan link
`/Users/pzachary/.claude/plans/kind-mapping-turing.md` (Tracks 5.2 / 7.6 / 7.7).
🤖 Generated with Claude Code