feat(0.2): adapter conformance + Promptfoo CI walkthrough (Track 7) by pmclSF · Pull Request #147 · pmclSF/terrain

pmclSF · 2026-05-02T13:52:40Z

Summary

Three Track 7 (Gate pillar) deliverables that lift the AI side of
the gating story from "works on the happy path" to "works across
the framework × version matrix we claim to support, with honest
drift signals."

- Track 7.1: Adapter conformance fixtures. A new
  internal/airun/conformance/ package with shape-fixture tests
  for every (framework × version) combination Terrain supports.
- Track 7.2: Shape detection + warn-on-drift. A ShapeInfo
  type plus three detector functions (Promptfoo / DeepEval / Ragas)
  that classify the envelope and emit warnings on unfamiliar
  variants.
- Track 7.4: End-to-end Promptfoo+Terrain CI walkthrough. A
  drop-in docs/examples/gate/ai-eval-ci/ directory with a README,
  workflow, Promptfoo config, prompt, and evals. An adopter copies
  the directory and is gated within 30 minutes.
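To make "classify the envelope" concrete, the sketch below looks only at the outermost JSON value of a Ragas payload and maps it to the two shapes this PR names: modern (an object carrying samples and scores) and legacy (a bare array). The function name `detectRagasShape`, the literal `"samples"`/`"scores"` keys, the variant labels, and returning an error for an empty payload are all assumptions for illustration, not Terrain's actual `DetectRagasShape`.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
)

// detectRagasShape is a hypothetical top-level envelope probe. It inspects the
// first non-space byte and, for objects, decodes only the top-level keys; nested
// values stay as raw bytes and are never decoded into per-sample structs.
// The "samples"/"scores" key names are assumed from the PR description.
func detectRagasShape(raw []byte) (variant string, warnings []string, err error) {
	trimmed := bytes.TrimSpace(raw)
	if len(trimmed) == 0 {
		return "", nil, errors.New("ragas: empty payload")
	}
	switch trimmed[0] {
	case '[':
		// Legacy shape: a bare array of per-sample results.
		return "legacy", nil, nil
	case '{':
		var top map[string]json.RawMessage
		if uerr := json.Unmarshal(trimmed, &top); uerr != nil {
			return "", nil, uerr
		}
		_, hasSamples := top["samples"]
		_, hasScores := top["scores"]
		if hasSamples && hasScores {
			// Modern shape: samples + scores envelope.
			return "modern", nil, nil
		}
		// Unfamiliar object: report it and warn instead of failing outright.
		return "unknown", []string{"ragas: object envelope without samples+scores keys"}, nil
	default:
		return "", nil, errors.New("ragas: payload is neither a JSON object nor an array")
	}
}
```

Returning a variant plus a warning for an unrecognised object, rather than an error, is what lets a caller keep parsing while still surfacing drift.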

What changed

New code:

- internal/airun/shape.go: ShapeInfo struct + three detector
  functions (~210 lines, zero external deps)
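For orientation, this is a minimal sketch of what a `ShapeInfo` with its two public helpers could look like. The PR only says the struct carries the framework, detected version, version source, and drift warnings; the field names, the message format, and treating `FormatWarnings` as a method (the text does not say whether it is a method or a free function) are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// ShapeInfo is a hypothetical reconstruction of the struct described in the PR.
// Field names are assumptions.
type ShapeInfo struct {
	Framework     string   // e.g. "promptfoo", "deepeval", "ragas"
	Version       string   // detected shape version
	VersionSource string   // how Version was inferred, e.g. an explicit field or a heuristic
	Warnings      []string // drift warnings for unfamiliar variants
}

// HasWarnings reports whether the detector saw anything unfamiliar.
func (s ShapeInfo) HasWarnings() bool {
	return len(s.Warnings) > 0
}

// FormatWarnings joins all warnings into one line so a caller can log a single
// notice per run; it returns "" when there is nothing to report.
func (s ShapeInfo) FormatWarnings() string {
	if !s.HasWarnings() {
		return ""
	}
	return fmt.Sprintf("%s adapter (shape %s via %s): %s",
		s.Framework, s.Version, s.VersionSource, strings.Join(s.Warnings, "; "))
}
```

A caller would check `info.HasWarnings()` once per run and log `info.FormatWarnings()` only when it is true, which yields the single per-run notice the PR describes.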

New tests:

- internal/airun/conformance/conformance_test.go: 11 tests
  covering 8 fixtures (3 Promptfoo + 3 DeepEval + 2 Ragas), plus
  the empty-payload + format-warnings invariants
- internal/airun/conformance/testdata/: 8 hand-curated
  fixtures representing the canonical and drift shapes
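The fixture-per-shape approach can be sketched as a table of (fixture, expected variant) pairs run through a detector. This is a hypothetical illustration: the fixtures are inline strings standing in for files under `testdata/`, and the `testCases`/`test_cases` keys used to tell DeepEval's camelCase and snake_case variants apart are assumptions, as are the names `detectDeepEvalVariant`, `conformanceCase`, and `runConformance`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// detectDeepEvalVariant tries a bare array first, then an object keyed by one
// of two assumed casings. Values are kept as json.RawMessage, not decoded.
func detectDeepEvalVariant(raw []byte) string {
	var arr []json.RawMessage
	if json.Unmarshal(raw, &arr) == nil && arr != nil {
		return "bare-array"
	}
	var top map[string]json.RawMessage
	if json.Unmarshal(raw, &top) != nil {
		return "invalid"
	}
	if _, ok := top["testCases"]; ok {
		return "camelCase"
	}
	if _, ok := top["test_cases"]; ok {
		return "snake_case"
	}
	return "unknown"
}

// conformanceCase pairs a fixture payload with the variant it must map to.
type conformanceCase struct {
	name    string
	payload string
	want    string
}

// Inline stand-ins for hand-curated fixture files (hypothetical names).
var deepEvalCases = []conformanceCase{
	{"deepeval-1x-camel", `{"testCases":[{"name":"t1","success":true}]}`, "camelCase"},
	{"deepeval-1x-snake", `{"test_cases":[{"name":"t1","success":true}]}`, "snake_case"},
	{"deepeval-bare-array", `[{"name":"t1","success":true}]`, "bare-array"},
}

// runConformance returns one message per fixture whose detected variant
// differs from the expected one; an empty result means every fixture passed.
func runConformance(cases []conformanceCase) []string {
	var failures []string
	for _, c := range cases {
		if got := detectDeepEvalVariant([]byte(c.payload)); got != c.want {
			failures = append(failures, fmt.Sprintf("%s: got %q, want %q", c.name, got, c.want))
		}
	}
	return failures
}
```

Under this layout, supporting a new shape means adding one row (in the real package, one fixture file) with its expected classification.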

New docs / example:

- docs/examples/gate/ai-eval-ci/README.md: full walkthrough
- docs/examples/gate/ai-eval-ci/github-action.yml: drop-in
  workflow
- docs/examples/gate/ai-eval-ci/promptfoo.config.yaml + prompts +
  evals: working scenario pack

Test plan

- go build ./... clean
- go test ./internal/airun/...: 11 new tests + existing
  adapter tests all green
- make docs-verify passes
- Manual smoke: copy the walkthrough into a real repo and
  confirm the unified comment renders end-to-end (separate
  verification; needs a real Promptfoo install in CI)

Plan tracker

Closes Tracks 7.1, 7.2, and 7.4 from the parity-gated 0.2.0 plan.
Remaining in Track 7: 7.5 (AI gate decision-precision corpus; it
needs labeled real-world PRs and is longer-cycle work).

🤖 Generated with Claude Code

Three Track 7 deliverables that lift the Gate-pillar AI side from "works on the happy path" to "works across the framework × version matrix we claim to support, with honest drift signals."

Track 7.1: Adapter conformance fixtures

New `internal/airun/conformance/` package with shape-fixture tests covering each (framework × version) combination Terrain supports today:

- Promptfoo v3 nested + v4 flat + missing-evalId variants
- DeepEval 1.x camelCase + 1.x snake_case + bare-array variants
- Ragas modern (samples + scores envelope) + legacy (bare array)

Adding a new shape fixture is the documented extension recipe; the README at the package root spells it out.

Track 7.2: Shape detection + warn-on-drift

New `internal/airun/shape.go` introduces `ShapeInfo` (framework, detected version, version source, drift warnings) plus three detector functions: DetectPromptfooShape, DetectDeepEvalShape, and DetectRagasShape. Each does a top-level envelope probe (no full payload parse), classifies the shape, and emits warnings on unfamiliar variants. Public helpers `ShapeInfo.HasWarnings()` and `FormatWarnings()` let downstream callers log a single per-run notice when an adapter is parsing an unfamiliar shape, before the detector chain consumes the result.

Track 7.4: End-to-end Promptfoo+Terrain CI walkthrough

New `docs/examples/gate/ai-eval-ci/` directory with:

- README.md: the full installation → baseline → first-PR flow, plus what the example does NOT do (anti-goals up front)
- github-action.yml: drop-in workflow that runs Promptfoo, feeds results into Terrain, and posts the unified comment
- promptfoo.config.yaml + prompts/ + evals/: minimal working scenario package an adopter can copy and edit

The walkthrough's posture matches the parity plan's recommended-CI-config rule: `--new-findings-only --baseline` by default, `--fail-on critical` for blocking gates, and hygiene findings visible but non-gating.
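The recommended gating posture can be restated as a small predicate. This sketches the policy only, not Terrain's implementation: the `Finding` type, the four-level severity scale, the `Hygiene` flag, and matching against the baseline by finding ID are all assumptions.

```go
package main

// Severity levels in ascending order; names and ordering are assumptions.
type Severity int

const (
	Low Severity = iota
	Medium
	High
	Critical
)

// Finding is a hypothetical, minimal finding record.
type Finding struct {
	ID       string // stable identifier used to match against the baseline
	Severity Severity
	Hygiene  bool // hygiene findings are reported but never block
}

// shouldBlock skips anything already in the baseline (new findings only),
// never blocks on hygiene findings, and blocks as soon as a remaining
// finding is at or above the fail-on threshold.
func shouldBlock(findings []Finding, baseline map[string]bool, failOn Severity) bool {
	for _, f := range findings {
		if baseline[f.ID] || f.Hygiene {
			continue
		}
		if f.Severity >= failOn {
			return true
		}
	}
	return false
}
```

With `failOn` set to `Critical`, a pre-existing critical finding or a critical hygiene finding leaves the gate open; only a new, non-hygiene critical finding closes it.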
Verification: 8 new conformance tests pass; full airun package green; make docs-verify clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-02T13:53:20Z

[RISK] Terrain — Merge with caution

High-severity gaps found in changed code.

Metric	Value
Changed files	15 (1 source · 1 test)
Impacted units	6
Protection gaps	2
Tests selected	1 of 773 (0% of suite)

Coverage gaps in changed code

internal/airun/shape.go [MED] — Exported function FormatWarnings has no observed test coverage.
→ Add unit tests for exported function FormatWarnings — this is public API surface.
internal/airun/shape.go [MED] — Exported function HasWarnings has no observed test coverage.
→ Add unit tests for exported function HasWarnings — this is public API surface.

5 pre-existing issues on changed files

internal/airun/shape.go [HIGH] — [blastRadiusHotspot] Changes to this file propagate to 411 tests (67 direct, 344 indirect). High blast radius increases regression risk.
internal/airun/conformance/conformance_test.go [LOW] — [fixtureFragilityHotspot] Fixture 'load' is used by 10 tests across 1 file. A single change cascades widely.
internal/airun/conformance/conformance_test.go [LOW] — [fixtureFragilityHotspot] Fixture 'loadAndDetectDeepEval' is used by 10 tests across 1 file. A single change cascades widely.
internal/airun/conformance/conformance_test.go [LOW] — [fixtureFragilityHotspot] Fixture 'loadAndDetectPromptfoo' is used by 10 tests across 1 file. A single change cascades widely.
internal/airun/conformance/conformance_test.go [LOW] — [fixtureFragilityHotspot] Fixture 'loadAndDetectRagas' is used by 10 tests across 1 file. A single change cascades widely.

Recommended tests

1 test(s) with exact coverage of 4 impacted unit(s). 2 impacted unit(s) have no covering tests in the selected set.

Test	Confidence	Why
`internal/airun/conformance/conformance_test.go`	exact	exact coverage of `DetectDeepEvalShape`, `DetectPromptfooShape`, `DetectRagasShape` + 1 more

Owners: PMCLSF

Limitations

No coverage artifacts provided; protection gaps reflect missing data, not measured absence. Provide --coverage to improve accuracy.
Mixed test cultures reduce cross-framework optimization confidence. Consider standardizing on fewer frameworks.

Generated by Terrain · `terrain pr --json` for machine-readable output

Targeted Test Results

Terrain selected 1 test(s) instead of the full suite.

Go tests: passed

github-actions · 2026-05-02T13:53:31Z

Terrain AI Risk Review

Metric	Value
AI surfaces	13
Eval scenarios	16
Impacted scenarios	0
Uncovered surfaces	13

Decision: PASS — AI surfaces are covered.

pmclSF wants to merge 1 commit into main from feat/0.2-track7-gate-lift
