feat(0.2): adapter conformance + Promptfoo CI walkthrough (Track 7) #147
Open
Three Track 7 deliverables that lift the Gate-pillar AI side from
"works on the happy path" to "works across the framework × version
matrix we claim to support, with honest drift signals."
Track 7.1 — Adapter conformance fixtures
New `internal/airun/conformance/` package with shape-fixture
tests covering each (framework × version) combination Terrain
supports today:
- Promptfoo v3 nested + v4 flat + missing-evalId variants
- DeepEval 1.x camelCase + 1.x snake_case + bare-array variants
- Ragas modern (samples + scores envelope) + legacy (bare array)
Adding a new shape fixture is the documented extension recipe;
the README at the package root spells it out.
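
A minimal sketch of how a fixture-driven conformance check could be structured, assuming each fixture pairs a payload with the version label it should be classified as. The `fixture` type, the `classifier` signature, `checkFixtures`, and `firstByteClassifier` are all hypothetical names for illustration, not the package's actual API, and the real fixtures live as files under `testdata/` rather than inline strings.

```go
package main

import "fmt"

// fixture is a hypothetical record: one payload plus the expectations for it.
type fixture struct {
	Name        string
	Payload     string
	WantVersion string
	WantWarning bool
}

// classifier is an assumed signature standing in for a real shape detector.
type classifier func(payload []byte) (version string, warned bool)

// checkFixtures runs every fixture through classify and returns one message
// per mismatch, so a new shape only requires a new fixture entry.
func checkFixtures(classify classifier, fixtures []fixture) []string {
	var failures []string
	for _, fx := range fixtures {
		version, warned := classify([]byte(fx.Payload))
		if version != fx.WantVersion {
			failures = append(failures,
				fmt.Sprintf("%s: version = %q, want %q", fx.Name, version, fx.WantVersion))
		}
		if warned != fx.WantWarning {
			failures = append(failures,
				fmt.Sprintf("%s: warned = %v, want %v", fx.Name, warned, fx.WantWarning))
		}
	}
	return failures
}

// firstByteClassifier is a toy stand-in: arrays are "legacy", objects are
// "modern", anything else is unknown and produces a warning.
func firstByteClassifier(payload []byte) (string, bool) {
	if len(payload) > 0 && payload[0] == '[' {
		return "legacy", false
	}
	if len(payload) > 0 && payload[0] == '{' {
		return "modern", false
	}
	return "unknown", true
}

func main() {
	fixtures := []fixture{
		{Name: "ragas-modern", Payload: `{"samples":[],"scores":[]}`, WantVersion: "modern"},
		{Name: "ragas-legacy", Payload: `[]`, WantVersion: "legacy"},
		{Name: "empty", Payload: ``, WantVersion: "unknown", WantWarning: true},
	}
	for _, msg := range checkFixtures(firstByteClassifier, fixtures) {
		fmt.Println("FAIL", msg)
	}
}
```

In a real `_test.go` file the same loop would typically call `t.Run(fx.Name, ...)` per fixture so each shape reports as its own subtest.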
Track 7.2 — Shape detection + warn-on-drift
New `internal/airun/shape.go` introduces `ShapeInfo` (framework,
detected version, version source, drift warnings) plus three
detector functions — DetectPromptfooShape, DetectDeepEvalShape,
DetectRagasShape. Each does a top-level envelope probe (no full
payload parse), classifies the shape, and emits warnings on
unfamiliar variants.
Public helpers `ShapeInfo.HasWarnings()` and `FormatWarnings()`
let downstream callers log a single per-run notice when an
adapter is parsing an unfamiliar shape, before the detector
chain consumes the result.
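
To make the envelope-probe idea concrete, here is a rough sketch for the Ragas case only, based on the two shapes named above (modern `samples` + `scores` envelope, legacy bare array). The field names on `ShapeInfo`, the version labels, the warning strings, and the function name `detectRagasShapeSketch` are assumptions for illustration. The real `DetectRagasShape` may differ in every one of those details, and `FormatWarnings` is assumed here to be a method.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// ShapeInfo carries the four pieces of information the description lists.
// Field names are guesses, not the struct from shape.go.
type ShapeInfo struct {
	Framework       string
	DetectedVersion string
	VersionSource   string
	Warnings        []string
}

// HasWarnings reports whether any drift warning was recorded.
func (s ShapeInfo) HasWarnings() bool { return len(s.Warnings) > 0 }

// FormatWarnings joins the warnings into one line suitable for a single log notice.
func (s ShapeInfo) FormatWarnings() string { return strings.Join(s.Warnings, "; ") }

// detectRagasShapeSketch looks only at the top level of the payload.
// Nested values stay as json.RawMessage and are never decoded, although
// json.Unmarshal still scans the whole document for syntax.
func detectRagasShapeSketch(payload []byte) ShapeInfo {
	info := ShapeInfo{Framework: "ragas", VersionSource: "envelope", DetectedVersion: "unknown"}
	trimmed := bytes.TrimSpace(payload)
	if len(trimmed) == 0 {
		info.Warnings = append(info.Warnings, "empty payload")
		return info
	}
	if trimmed[0] == '[' {
		// Bare array: treated as the legacy shape without further checks.
		info.DetectedVersion = "legacy"
		return info
	}
	var top map[string]json.RawMessage
	if err := json.Unmarshal(trimmed, &top); err != nil {
		info.Warnings = append(info.Warnings, "payload is not a JSON object or array")
		return info
	}
	_, hasSamples := top["samples"]
	_, hasScores := top["scores"]
	if hasSamples && hasScores {
		info.DetectedVersion = "modern"
		return info
	}
	info.Warnings = append(info.Warnings, "object envelope without both samples and scores keys")
	return info
}

func main() {
	payloads := []string{`{"samples":[],"scores":[]}`, `[{"score":0.9}]`, `{"rows":[]}`}
	for _, p := range payloads {
		info := detectRagasShapeSketch([]byte(p))
		fmt.Printf("version=%s warned=%v %s\n", info.DetectedVersion, info.HasWarnings(), info.FormatWarnings())
	}
}
```

A caller would check `HasWarnings()` once per run and log `FormatWarnings()` before handing the payload to the adapter, which is the single-notice behaviour described above.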
Track 7.4 — End-to-end Promptfoo+Terrain CI walkthrough
New `docs/examples/gate/ai-eval-ci/` directory with:
- README.md spelling out the full installation → baseline →
first-PR flow, plus what the example does NOT do
(anti-goals up front)
- github-action.yml — drop-in workflow that runs Promptfoo,
feeds results into Terrain, and posts the unified comment
- promptfoo.config.yaml + prompts/ + evals/ — minimal
working scenario package an adopter can copy and edit
The walkthrough's posture matches the parity plan's
recommended-CI-config rule: --new-findings-only --baseline by
default, --fail-on critical for blocking gates, hygiene findings
visible but non-gating.
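
The gating posture can be read as a small decision rule. The sketch below is one possible interpretation, not Terrain's implementation: the `finding` type, its `Hygiene` flag, and the `gate` function are hypothetical, and the exact semantics of `--new-findings-only`, `--baseline`, and `--fail-on critical` are assumed from their names.

```go
package main

import "fmt"

// finding is a hypothetical, simplified record of one reported issue.
type finding struct {
	ID       string
	Severity string // e.g. "critical", "high", "low"
	Hygiene  bool   // assumed marker for hygiene-class findings
}

// gate splits findings into those that stay visible and those that block.
// Findings already present in the baseline are skipped (new-findings-only);
// of the rest, only non-hygiene critical findings block (fail-on critical).
func gate(findings []finding, baseline map[string]bool) (blocking, visible []finding) {
	for _, f := range findings {
		if baseline[f.ID] {
			continue // known from the baseline run, not new in this PR
		}
		visible = append(visible, f)
		if f.Severity == "critical" && !f.Hygiene {
			blocking = append(blocking, f)
		}
	}
	return blocking, visible
}

func main() {
	baseline := map[string]bool{"old-critical": true}
	findings := []finding{
		{ID: "new-critical", Severity: "critical"},
		{ID: "old-critical", Severity: "critical"},
		{ID: "new-hygiene", Severity: "critical", Hygiene: true},
	}
	blocking, visible := gate(findings, baseline)
	fmt.Printf("blocking=%d visible=%d\n", len(blocking), len(visible))
}
```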
Verification: 8 new conformance tests pass; full airun package
green; make docs-verify clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[RISK] Terrain — Merge with caution
Coverage gaps in changed code
5 pre-existing issues on changed files
Recommended tests: 1 test(s) with exact coverage of 4 impacted unit(s). 2 impacted unit(s) have no covering tests in the selected set.
Owners: PMCLSF Limitations
Generated by Terrain · Targeted Test Results: Terrain selected 1 test(s) instead of the full suite.
Terrain AI Risk Review
Decision: PASS. AI surfaces are covered.
Summary
Three Track 7 (Gate pillar) deliverables that lift the AI side of
the gating story from "works on the happy path" to "works across
the framework × version matrix we claim to support, with honest
drift signals."
- `internal/airun/conformance/` package with shape-fixture tests for every (framework × version) Terrain supports.
- `ShapeInfo` type + three detector functions (Promptfoo / DeepEval / Ragas) that classify the envelope and emit warnings on unfamiliar variants.
- Drop-in `docs/examples/gate/ai-eval-ci/` with README, workflow, Promptfoo config, prompt, and evals. An adopter copies the dir and is gated within 30 minutes.
What changed
New code:
- `internal/airun/shape.go`: `ShapeInfo` struct + three detector functions (~210 lines, zero external deps)
New tests:
- `internal/airun/conformance/conformance_test.go`: 11 tests covering 8 fixtures (3 Promptfoo + 3 DeepEval + 2 Ragas), plus the empty-payload + format-warnings invariants
- `internal/airun/conformance/testdata/`: 8 hand-curated fixtures representing the canonical and drift shapes
New docs / example:
- `docs/examples/gate/ai-eval-ci/README.md`: full walkthrough
- `docs/examples/gate/ai-eval-ci/github-action.yml`: drop-in workflow
- `docs/examples/gate/ai-eval-ci/promptfoo.config.yaml` + prompts + evals: working scenario pack
Test plan
- `go build ./...` clean
- `go test ./internal/airun/...`: 11 new tests + existing adapter tests all green
- `make docs-verify` passes
- confirm the unified comment renders end-to-end (separate verification; needs a real Promptfoo install in CI)
Plan tracker
Closes Track 7.1, 7.2, 7.4 from the parity-gated 0.2.0 plan.
Track 7 remaining: 7.5 (AI gate decision precision corpus —
needs labeled real-world PRs, longer-cycle work).
🤖 Generated with Claude Code