feat(0.2): adapter conformance + Promptfoo CI walkthrough (Track 7) #147

Open
pmclSF wants to merge 1 commit into main from feat/0.2-track7-gate-lift

pmclSF (Owner) commented May 2, 2026

Summary

Three Track 7 (Gate pillar) deliverables that lift the AI side of
the gating story from "works on the happy path" to "works across
the framework × version matrix we claim to support, with honest
drift signals."

  • Track 7.1 — Adapter conformance fixtures. A new
    internal/airun/conformance/ package with shape-fixture tests
    for every (framework × version) Terrain supports.
  • Track 7.2 — Shape detection + warn-on-drift. ShapeInfo
    type + three detector functions (Promptfoo / DeepEval / Ragas)
    that classify the envelope and emit warnings on unfamiliar
    variants.
  • Track 7.4 — End-to-end Promptfoo+Terrain CI walkthrough.
    Drop-in docs/examples/gate/ai-eval-ci/ with README, workflow,
    Promptfoo config, prompt, and evals. An adopter copies the
    directory and has a working gate within 30 minutes.

What changed

New code:

  • internal/airun/shape.go: ShapeInfo struct + three detector
    functions (~210 lines, zero external deps)

New tests:

  • internal/airun/conformance/conformance_test.go — 11 tests
    covering 8 fixtures (3 Promptfoo + 3 DeepEval + 2 Ragas), plus
    the empty-payload + format-warnings invariants
  • internal/airun/conformance/testdata/ — 8 hand-curated
    fixtures representing the canonical and drift shapes

New docs / example:

  • docs/examples/gate/ai-eval-ci/README.md — full walkthrough
  • docs/examples/gate/ai-eval-ci/github-action.yml — drop-in
    workflow
  • docs/examples/gate/ai-eval-ci/promptfoo.config.yaml + prompts +
    evals — working scenario pack

Test plan

  • go build ./... clean
  • go test ./internal/airun/... — 11 new tests + existing
    adapter tests all green
  • make docs-verify passes
  • Manual smoke: copy the walkthrough into a real repo and
    confirm the unified comment renders end-to-end (separate
    verification — needs a real Promptfoo install in CI)

Plan tracker

Closes Track 7.1, 7.2, 7.4 from the parity-gated 0.2.0 plan.
Track 7 remaining: 7.5 (AI gate decision precision corpus —
needs labeled real-world PRs, longer-cycle work).

🤖 Generated with Claude Code

Three Track 7 deliverables that lift the Gate-pillar AI side from
"works on the happy path" to "works across the framework × version
matrix we claim to support, with honest drift signals."

Track 7.1 — Adapter conformance fixtures
  New `internal/airun/conformance/` package with shape-fixture
  tests covering each (framework × version) combination Terrain
  supports today:
    - Promptfoo v3 nested + v4 flat + missing-evalId variants
    - DeepEval 1.x camelCase + 1.x snake_case + bare-array variants
    - Ragas modern (samples + scores envelope) + legacy (bare array)
  Adding a new shape fixture is the documented extension recipe;
  the README at the package root spells it out.
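
A minimal sketch of what a table-driven shape-fixture check can look like, assuming each fixture pairs a payload with an expected classification. The `fixtureCase` type, the `topLevelKind` stand-in detector, and the inlined payloads are all hypothetical; the real fixtures live under `testdata/` and the real detectors are in `shape.go`.

```go
package main

import (
	"bytes"
	"fmt"
)

// fixtureCase pairs a payload with the shape we expect to be detected.
// Payloads are inlined (and made up) so the sketch stays self-contained.
type fixtureCase struct {
	name    string
	payload string
	want    string
}

// topLevelKind is a hypothetical stand-in for a detector: it reports whether
// the first non-whitespace byte opens a JSON object or a JSON array.
func topLevelKind(payload []byte) string {
	trimmed := bytes.TrimSpace(payload)
	switch {
	case len(trimmed) == 0:
		return "empty"
	case trimmed[0] == '{':
		return "envelope"
	case trimmed[0] == '[':
		return "bare-array"
	default:
		return "unknown"
	}
}

// cases is the fixture table; covering a new shape means adding one row.
var cases = []fixtureCase{
	{"ragas modern", `{"samples": [], "scores": []}`, "envelope"},
	{"ragas legacy", `[{"faithfulness": 0.9}]`, "bare-array"},
	{"empty payload", ``, "empty"},
}

// runConformance returns the names of cases whose detected kind
// differs from the expected one.
func runConformance(cs []fixtureCase) []string {
	var failed []string
	for _, c := range cs {
		if got := topLevelKind([]byte(c.payload)); got != c.want {
			failed = append(failed, c.name)
		}
	}
	return failed
}

func main() {
	failed := runConformance(cases)
	fmt.Printf("%d cases, %d failed\n", len(cases), len(failed))
}
```

In an actual `_test.go` file the loop body would more likely sit inside `t.Run(c.name, ...)` so each fixture reports as its own subtest.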

Track 7.2 — Shape detection + warn-on-drift
  New `internal/airun/shape.go` introduces `ShapeInfo` (framework,
  detected version, version source, drift warnings) plus three
  detector functions — DetectPromptfooShape, DetectDeepEvalShape,
  DetectRagasShape. Each does a top-level envelope probe (no full
  payload parse), classifies the shape, and emits warnings on
  unfamiliar variants.
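
The probe-then-classify idea can be sketched roughly as follows, using the two Ragas shapes named above (samples + scores envelope, bare array). This is an illustration under assumptions, not the code in `shape.go`: the field names on `ShapeInfo`, the lowercase `detectRagasShape` helper, the version labels, and the warning strings are all hypothetical. Note that `json.Unmarshal` still validates the whole document; the saving is that nested values stay as raw bytes and are never decoded.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ShapeInfo mirrors the four pieces of information the commit message
// lists; the exact field names and types are assumptions.
type ShapeInfo struct {
	Framework     string
	Version       string
	VersionSource string
	Warnings      []string
}

// detectRagasShape is a hypothetical sketch of a top-level envelope probe.
// Only the outermost JSON value is decoded; children stay json.RawMessage.
func detectRagasShape(payload []byte) ShapeInfo {
	info := ShapeInfo{Framework: "ragas", VersionSource: "envelope-shape"}

	// Case 1: top-level object. Modern shape carries samples and scores.
	var envelope map[string]json.RawMessage
	if err := json.Unmarshal(payload, &envelope); err == nil {
		_, hasSamples := envelope["samples"]
		_, hasScores := envelope["scores"]
		if hasSamples && hasScores {
			info.Version = "modern"
			return info
		}
		info.Version = "unknown"
		info.Warnings = append(info.Warnings, "object envelope without samples and scores")
		return info
	}

	// Case 2: top-level array, treated as the legacy shape.
	var rows []json.RawMessage
	if err := json.Unmarshal(payload, &rows); err == nil {
		info.Version = "legacy"
		return info
	}

	// Anything else is drift: classify as unknown and warn.
	info.Version = "unknown"
	info.Warnings = append(info.Warnings, "payload is neither a JSON object nor a JSON array")
	return info
}

func main() {
	payloads := []string{
		`{"samples": [], "scores": []}`,
		`[{"faithfulness": 0.9}]`,
		`{"rows": []}`,
	}
	for _, p := range payloads {
		info := detectRagasShape([]byte(p))
		fmt.Printf("%s %s warnings=%d\n", info.Framework, info.Version, len(info.Warnings))
	}
}
```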

  Public helpers `ShapeInfo.HasWarnings()` and `FormatWarnings()`
  let downstream callers log a single per-run notice when an
  adapter is parsing an unfamiliar shape, before the detector
  chain consumes the result.
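
For orientation, here is one way the two helpers could behave, written as methods on a cut-down `ShapeInfo`. The signatures, the message format, and the choice to return an empty string when there are no warnings are assumptions made for this sketch; the source only names the helpers and their purpose.

```go
package main

import (
	"fmt"
	"strings"
)

// ShapeInfo here is a reduced, hypothetical version of the real type.
type ShapeInfo struct {
	Framework string
	Version   string
	Warnings  []string
}

// HasWarnings reports whether the detector recorded any drift warnings.
func (s ShapeInfo) HasWarnings() bool {
	return len(s.Warnings) > 0
}

// FormatWarnings collapses all warnings into a single line suitable for
// one per-run log notice. It returns "" when there is nothing to report
// (an assumption of this sketch).
func (s ShapeInfo) FormatWarnings() string {
	if !s.HasWarnings() {
		return ""
	}
	return fmt.Sprintf("%s (detected %s): %s",
		s.Framework, s.Version, strings.Join(s.Warnings, "; "))
}

func main() {
	info := ShapeInfo{
		Framework: "promptfoo",
		Version:   "unknown",
		Warnings:  []string{"missing evalId", "unrecognized results layout"},
	}
	// Log once per run, before handing the payload to the adapter.
	if info.HasWarnings() {
		fmt.Println(info.FormatWarnings())
	}
}
```

A unit test along these lines would also address the coverage gap on exported helpers that a reviewer or tool might flag for new public API.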

Track 7.4 — End-to-end Promptfoo+Terrain CI walkthrough
  New `docs/examples/gate/ai-eval-ci/` directory with:
    - README.md spelling out the full installation → baseline →
      first-PR flow, plus what the example does NOT do
      (anti-goals up front)
    - github-action.yml — drop-in workflow that runs Promptfoo,
      feeds results into Terrain, and posts the unified comment
    - promptfoo.config.yaml + prompts/ + evals/ — minimal
      working scenario package an adopter can copy and edit

  The walkthrough's posture matches the parity plan's
  recommended-CI-config rule: --new-findings-only --baseline by
  default, --fail-on critical for blocking gates, hygiene findings
  visible but non-gating.

Verification: 8 new conformance tests pass; full airun package
green; make docs-verify clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions Bot commented May 2, 2026

[RISK] Terrain — Merge with caution

High-severity gaps found in changed code.

| Metric | Value |
| --- | --- |
| Changed files | 15 (1 source · 1 test) |
| Impacted units | 6 |
| Protection gaps | 2 |
| Tests selected | 1 of 773 (0% of suite) |

Coverage gaps in changed code

  • internal/airun/shape.go [MED] — Exported function FormatWarnings has no observed test coverage.
    → Add unit tests for exported function FormatWarnings — this is public API surface.
  • internal/airun/shape.go [MED] — Exported function HasWarnings has no observed test coverage.
    → Add unit tests for exported function HasWarnings — this is public API surface.
5 pre-existing issues on changed files
  • internal/airun/shape.go [HIGH] — [blastRadiusHotspot] Changes to this file propagate to 411 tests (67 direct, 344 indirect). High blast radius increases regression risk.
  • internal/airun/conformance/conformance_test.go [LOW] — [fixtureFragilityHotspot] Fixture 'load' is used by 10 tests across 1 files. A single change cascades widely.
  • internal/airun/conformance/conformance_test.go [LOW] — [fixtureFragilityHotspot] Fixture 'loadAndDetectDeepEval' is used by 10 tests across 1 files. A single change cascades widely.
  • internal/airun/conformance/conformance_test.go [LOW] — [fixtureFragilityHotspot] Fixture 'loadAndDetectPromptfoo' is used by 10 tests across 1 files. A single change cascades widely.
  • internal/airun/conformance/conformance_test.go [LOW] — [fixtureFragilityHotspot] Fixture 'loadAndDetectRagas' is used by 10 tests across 1 files. A single change cascades widely.

Recommended tests

1 test(s) with exact coverage of 4 impacted unit(s). 2 impacted unit(s) have no covering tests in the selected set.

| Test | Confidence | Why |
| --- | --- | --- |
| internal/airun/conformance/conformance_test.go | exact | exact coverage of DetectDeepEvalShape, DetectPromptfooShape, DetectRagasShape + 1 more |

Owners: PMCLSF

Limitations
  • No coverage artifacts provided; protection gaps reflect missing data, not measured absence. Provide --coverage to improve accuracy.
  • Mixed test cultures reduce cross-framework optimization confidence. Consider standardizing on fewer frameworks.

Generated by Terrain · terrain pr --json for machine-readable output

Targeted Test Results

Terrain selected 1 test(s) instead of the full suite.

  • Go tests: passed


github-actions Bot commented May 2, 2026

Terrain AI Risk Review

| Metric | Value |
| --- | --- |
| AI surfaces | 13 |
| Eval scenarios | 16 |
| Impacted scenarios | 0 |
| Uncovered surfaces | 13 |

Decision: PASS — AI surfaces are covered.
