feat(testing): Add Golden Response System for Snapshot Testing by Mustafa11300 · Pull Request #1607 · mofa-org/mofa

Mustafa11300 · 2026-04-09T18:22:04Z

🔍 Description

Adds golden response (snapshot) testing to the mofa-testing crate. Agent outputs are recorded as baselines, then future runs are compared against them to detect regressions automatically.

This is Issue 4 from the testing platform roadmap (Test Definition Layer, Phase 2).

Closes #1604
Depends on #1599

📌 Changes

New: `tests/src/golden.rs`

Core golden response module with the following public components:

| Component | Purpose |
| --- | --- |
| `GoldenSnapshot` | Serializable record of turn outputs (JSON/YAML) |
| `GoldenTurnSnapshot` | Per-turn golden record: user_input, response, tool_calls |
| `GoldenStore` | Filesystem-backed read/write/list for `.golden.json` files |
| `GoldenTestConfig` | Configuration: strict (validate) vs. update (record) modes |
| `GoldenCompareMode` | Enum: `Strict` or `Update` |
| `GoldenCompareResult` | Structured comparison outcome with pass/fail and diffs |
| `GoldenDiff` | Per-field diff variants (response, tool count, tool name, tool args, turn count) |
| `GoldenError` | Structured error type for all golden operations |
| `Normalizer` trait | Pluggable text normalization for non-deterministic content |
| `WhitespaceNormalizer` | Collapses whitespace before comparison |
| `RegexNormalizer` | Replaces regex matches with placeholders (UUIDs, timestamps) |
| `NormalizerChain` | Chains normalizers; `default_chain()` handles UUIDs + timestamps + whitespace |
| `compare_golden()` | Standalone comparison engine with optional normalizer |
| `run_golden_test()` | End-to-end golden test runner integrated with `TestReport` |
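
To illustrate how pluggable normalization can work, here is a minimal std-only sketch. It is not the code in `golden.rs`: the trait signature, the `CollapseWhitespace`, `UuidPlaceholder`, and `Chain` names, and the `<UUID>` placeholder are all assumptions made for illustration, and the UUID step uses a hand-written token check where the PR's `RegexNormalizer` uses regex matching.

```rust
/// Hypothetical trait shape; the real `Normalizer` signature may differ.
pub trait Normalizer {
    fn normalize(&self, input: &str) -> String;
}

/// Collapses every run of whitespace into one space and trims both ends.
pub struct CollapseWhitespace;

impl Normalizer for CollapseWhitespace {
    fn normalize(&self, input: &str) -> String {
        input.split_whitespace().collect::<Vec<_>>().join(" ")
    }
}

/// Returns true for tokens shaped like 8-4-4-4-12 hexadecimal groups.
fn looks_like_uuid(token: &str) -> bool {
    let groups: Vec<&str> = token.split('-').collect();
    let lens = [8usize, 4, 4, 4, 12];
    groups.len() == 5
        && groups
            .iter()
            .zip(lens.iter())
            .all(|(g, &n)| g.len() == n && g.chars().all(|c| c.is_ascii_hexdigit()))
}

/// Stand-in for a regex-based normalizer: replaces space-delimited
/// UUID-shaped tokens with a fixed placeholder (assumed here to be "<UUID>").
pub struct UuidPlaceholder;

impl Normalizer for UuidPlaceholder {
    fn normalize(&self, input: &str) -> String {
        input
            .split(' ')
            .map(|tok| if looks_like_uuid(tok) { "<UUID>" } else { tok })
            .collect::<Vec<_>>()
            .join(" ")
    }
}

/// Applies each step in order, feeding the output of one into the next.
pub struct Chain {
    steps: Vec<Box<dyn Normalizer>>,
}

impl Chain {
    pub fn new() -> Self {
        Chain { steps: Vec::new() }
    }

    pub fn with<N: Normalizer + 'static>(mut self, step: N) -> Self {
        self.steps.push(Box::new(step));
        self
    }
}

impl Normalizer for Chain {
    fn normalize(&self, input: &str) -> String {
        self.steps
            .iter()
            .fold(input.to_string(), |acc, step| step.normalize(&acc))
    }
}
```

Running both the golden and the actual response through the same chain before comparing is what lets values such as request IDs vary between runs without producing a diff.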

Comparison fields:

- Response text (with optional normalization)
- Tool call count per turn
- Tool call names (positional)
- Tool call arguments (deep JSON equality)
- Turn count
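
The field-by-field comparison listed above can be sketched roughly as follows. This is an illustrative std-only approximation, not the PR's `compare_golden()`: the `Turn`, `ToolCall`, and `Diff` types are hypothetical, and tool arguments are compared as plain strings here, whereas the PR describes deep JSON equality.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct ToolCall {
    pub name: String,
    /// Assumption: arguments kept as a raw string for this sketch.
    pub args: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Turn {
    pub response: String,
    pub tool_calls: Vec<ToolCall>,
}

#[derive(Debug, PartialEq)]
pub enum Diff {
    TurnCount { expected: usize, actual: usize },
    Response { turn: usize, expected: String, actual: String },
    ToolCount { turn: usize, expected: usize, actual: usize },
    ToolName { turn: usize, index: usize, expected: String, actual: String },
    ToolArgs { turn: usize, index: usize },
}

/// Collects every mismatch instead of stopping at the first one,
/// so a single run can report several diffs.
pub fn compare(expected: &[Turn], actual: &[Turn]) -> Vec<Diff> {
    let mut diffs = Vec::new();
    if expected.len() != actual.len() {
        diffs.push(Diff::TurnCount { expected: expected.len(), actual: actual.len() });
    }
    // Only turns present on both sides are compared field by field.
    for (turn, (e, a)) in expected.iter().zip(actual.iter()).enumerate() {
        if e.response != a.response {
            diffs.push(Diff::Response {
                turn,
                expected: e.response.clone(),
                actual: a.response.clone(),
            });
        }
        if e.tool_calls.len() != a.tool_calls.len() {
            diffs.push(Diff::ToolCount {
                turn,
                expected: e.tool_calls.len(),
                actual: a.tool_calls.len(),
            });
        }
        // Tool calls are matched by position, not by name.
        for (index, (et, at)) in e.tool_calls.iter().zip(a.tool_calls.iter()).enumerate() {
            if et.name != at.name {
                diffs.push(Diff::ToolName {
                    turn,
                    index,
                    expected: et.name.clone(),
                    actual: at.name.clone(),
                });
            }
            if et.args != at.args {
                diffs.push(Diff::ToolArgs { turn, index });
            }
        }
    }
    diffs
}
```

An empty result means the run matches the golden; anything else can be surfaced as a failed test with the collected diffs attached.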

Modes:

- Update: Executes the scenario and saves the outputs as the new golden baseline
- Strict: Executes the scenario, compares against the stored golden, and reports diffs
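
The difference between the two modes can be sketched with a small filesystem-backed helper. Everything below is an assumption for illustration rather than the PR's `GoldenStore` or `run_golden_test()`: the `check` function, the `Outcome` enum, and the `.golden.txt` suffix are hypothetical, and the baseline is stored as raw text instead of the `.golden.json` format the PR uses.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Mode {
    Strict,
    Update,
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// Update mode wrote a new baseline.
    Recorded,
    /// Strict mode found an identical baseline.
    Matched,
    /// Strict mode found a baseline that differs from the actual output.
    Mismatch { expected: String, actual: String },
    /// Strict mode found no baseline to compare against.
    MissingBaseline,
}

/// Hypothetical naming scheme: one file per snapshot name.
pub fn golden_path(dir: &Path, name: &str) -> PathBuf {
    dir.join(format!("{}.golden.txt", name))
}

pub fn check(dir: &Path, name: &str, actual: &str, mode: Mode) -> io::Result<Outcome> {
    let path = golden_path(dir, name);
    match mode {
        // Update: overwrite the baseline with whatever the agent produced.
        Mode::Update => {
            fs::create_dir_all(dir)?;
            fs::write(&path, actual)?;
            Ok(Outcome::Recorded)
        }
        // Strict: never write; only read and compare.
        Mode::Strict => {
            if !path.exists() {
                return Ok(Outcome::MissingBaseline);
            }
            let expected = fs::read_to_string(&path)?;
            if expected == actual {
                Ok(Outcome::Matched)
            } else {
                Ok(Outcome::Mismatch { expected, actual: actual.to_string() })
            }
        }
    }
}
```

Keeping strict mode read-only is the property that makes it safe for CI: a regression can only fail the run, never silently rewrite the baseline.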

Modified: `tests/src/lib.rs`

Registered the new module with `pub mod golden`
Added public re-exports for all golden types and functions

Modified: `tests/Cargo.toml`

Added `tempfile = { workspace = true }` as a dev-dependency for test isolation

New: `tests/tests/golden_tests.rs`

30+ comprehensive tests covering:

- Serialization: JSON/YAML roundtrip, multi-turn snapshots, metadata
- GoldenStore: save/load, exists check, list snapshots, special characters in names, nonexistent file error
- Comparison: identical outputs (no diffs), response mismatch, turn count mismatch, tool call count mismatch, tool name mismatch, tool args mismatch, multiple diffs in a single comparison
- Normalizers: whitespace collapse, UUID replacement, timestamp replacement, normalizer chain, compare-with-normalizer for whitespace and UUIDs
- Integration: update mode saves snapshot, strict mode passes/fails, missing snapshot error, full update→strict roundtrip, strict with normalizer, multi-turn golden, tool call golden
- Display: diff formatting, diff serialization

New: `examples/golden_response_test/`

| File | Description |
| --- | --- |
| `README.md` | Usage guide with update/strict/normalizer code samples and CI workflow |
| `goldens/weather_agent.golden.json` | Example golden: multi-turn weather agent with tool calls |
| `goldens/support_agent.golden.json` | Example golden: support agent with ticket lookup |

🧪 Testing

All new functionality is covered by `tests/tests/golden_tests.rs` with 30+ test cases.

Key test categories:

- Unit tests: snapshot serialization, normalizers, diff formatting
- Store tests: filesystem save/load/list/exists operations
- Comparison tests: field-level diff detection for all mismatch types
- Integration tests: full update→strict roundtrip, multi-turn, tool calls
- Normalizer tests: whitespace, UUID, timestamp, chained normalizers
- Error path tests: missing snapshot, nonexistent file, parse failures

💡 Usage Example

Record a golden baseline

use mofa_testing::{AgentTest, GoldenStore, GoldenTestConfig, run_golden_test};

let scenario = AgentTest::new("my_agent")
    .when_user_says("Hello")
    .then_agent_should()
    .respond_containing("Hi")
    .build()?;

// Update mode: save actual outputs as golden baseline
let config = GoldenTestConfig::update(GoldenStore::new("./goldens"));
let report = run_golden_test(&config, &scenario, &mut agent).await;

Validate against golden

// Strict mode: compare against stored golden
let config = GoldenTestConfig::strict(GoldenStore::new("./goldens"));
let report = run_golden_test(&config, &scenario, &mut agent).await;

assert_eq!(report.failed(), 0, "golden regression detected");

With normalizers (ignore UUIDs/timestamps)

use mofa_testing::NormalizerChain;

let config = GoldenTestConfig::strict(GoldenStore::new("./goldens"))
    .with_normalizer(NormalizerChain::default_chain()?);

CI Workflow

# CI: strict mode catches regressions
cargo test --test golden_tests

# Local: update baselines when behavior intentionally changes
GOLDEN_MODE=update cargo test --test golden_tests
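
The PR text does not show how the tests read `GOLDEN_MODE`. One plausible way to wire it up, sketched here with hypothetical `Mode`, `mode_from_value`, and `mode_from_env` names, is to default to strict and switch to update only when the variable is explicitly set to `update`:

```rust
use std::env;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Strict,
    Update,
}

/// Pure helper so the decision logic can be tested without touching the
/// process environment. Anything other than "update" falls back to strict.
pub fn mode_from_value(value: Option<&str>) -> Mode {
    match value.map(|v| v.trim().to_ascii_lowercase()) {
        Some(ref v) if v == "update" => Mode::Update,
        _ => Mode::Strict,
    }
}

/// Reads GOLDEN_MODE from the environment; unset or unreadable means strict.
pub fn mode_from_env() -> Mode {
    mode_from_value(env::var("GOLDEN_MODE").ok().as_deref())
}
```

Defaulting to strict means a plain `cargo test` in CI can never overwrite baselines by accident; updating requires an explicit opt-in.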

✅ Checklist

Implements parameterized scenario expansion for the mofa-testing crate. One scenario template can now expand into many concrete test cases by substituting `{{variable}}` placeholders with values from parameter sets.

New components:
- ParameterSet: named variable bindings for one test variant
- ParameterMatrix: Cartesian product expansion with safety limits
- ParameterizedScenario: template + parameter sets -> expanded scenarios
- ParameterizedScenarioFile: YAML/TOML/JSON file-backed loading

Includes:
- 30+ comprehensive tests covering expansion, substitution, file loading, execution, edge cases, and error handling
- Example scenarios in examples/parameterized_test/ with YAML, TOML, and matrix expansion demonstrations
- README with usage guide and code samples

Closes #<ISSUE_NUMBER>

Implements golden response (snapshot) testing for the mofa-testing crate. Agent outputs are recorded as baselines, then future runs are compared against them to detect regressions automatically.

New components:
- GoldenSnapshot: serializable record of turn outputs (JSON/YAML)
- GoldenStore: filesystem-backed snapshot persistence
- GoldenTestConfig: strict (validate) vs. update (record) modes
- GoldenDiff: structured per-field diff reporting
- Normalizer trait + WhitespaceNormalizer, RegexNormalizer, NormalizerChain
- run_golden_test: end-to-end golden test runner integrated with TestReport
- compare_golden: standalone comparison engine

Includes:
- 30+ comprehensive tests covering serialization, store operations, diff detection, normalizers, update/strict mode, multi-turn, and tool call verification
- Example golden snapshots and README in examples/golden_response_test/
- tempfile added as dev-dependency for test isolation

Closes #<ISSUE_NUMBER>

Mustafa11300 added 3 commits April 9, 2026 20:19

feat(testing): add agent test DSL and scenario loader

df3f575

Mustafa11300 wants to merge 3 commits into mofa-org:main from Mustafa11300:feat/issue4-golden-response-system
