Skip to content

add deterministic inter-agent message record/replay with JSONL spec and verifier CLI #135

@cchinchilla-dev

Description

@cchinchilla-dev

Description

RecordingProvider + MockProvider today capture only LLM provider calls. Inter-agent workflows execute as sequences of LLM calls, so single-process multi-agent replay is deterministic transitively. Cross-process or explicit inter-agent event recording is not supported.

Concrete gaps today:

  • No structured event log. The recording file shape is keyed by step_id/prompt_hash; cross-process causality is not encoded.
  • No reproducibility guarantee. Two replays of the same recording can diverge silently if any non-determinism leaks (event ordering, dict iteration, tool side effects).
  • No verifier. Nothing in the toolchain asserts "this recording is reproducible".

This blocks two practical use cases:

  • Regression tests for multi-agent workflows that need bit-for-bit reproducibility across CI runs.
  • Post-incident replay where the goal is to reproduce a faulty multi-agent interaction exactly, not just hit similar prompts.

Proposal

1. JSONL format with a published JSON Schema covering six event types: task_sent, task_received, response_sent, tool_call, tool_return, state_transition. Each event carries:

  • event_id (UUID v4)
  • agent_id (string)
  • timestamp_ns (monotonic per-process; informational only, not used for ordering)
  • parent_event_id (UUID v4 or null) — encodes dep-graph causality
  • payload_hash (SHA-256, 16-char prefix)
  • type-specific fields (e.g. task_sent carries recipient_agent_id, payload_ref)

2. Deterministic replay by dependency-graph ordering, not by timestamp (timestamps are not monotonic across processes). parent_event_id encodes causal ordering; replay re-executes in topological order. Nodes with no dependency are sorted by event_id for stable tie-breaking.

3. CLI agentloom verify-determinism <recording>: runs the recording through MockProvider replay and asserts byte-for-byte identical output across two independent runs. Exit code non-zero on any divergence; diff printed to stderr.

4. Extend RecordingProvider with observer hook for inter-agent events (opt-in, activated when the runtime emits multi-agent steps). Existing single-LLM recordings keep their format unchanged — multi-agent recordings nest under a _inter_agent_events key.

Scope

  • src/agentloom/record_replay/inter_agent.py — event types + JSONL writer.
  • src/agentloom/record_replay/schema.json — committed JSON Schema (queryable at agentloom.record_replay.SCHEMA_URI).
  • src/agentloom/record_replay/verify.py — replay verifier (topological sort + diff).
  • src/agentloom/cli/verify_determinism.py — CLI entry point.
  • docs/record-replay-spec.md — format specification + dependency-graph algorithm prose.
  • agentloom.contracts.experimental — re-export the event types under the experimental tier.

Regression tests

  • test_jsonl_schema_validates — known-good and known-bad event payloads.
  • test_dep_graph_ordering_deterministic — same DAG, different traversal order at write time → identical replay output.
  • test_verify_cli_exits_zero_on_identical_runs.
  • test_verify_cli_exits_nonzero_on_divergence — inject a non-determinism (e.g. random.random() in a tool) and assert exit code ≠ 0 + diff printed.
  • test_inter_agent_events_extend_recordings_without_breaking_v2_format — existing single-LLM RecordingProvider recordings load and replay unchanged.

Notes

Metadata

Metadata

Assignees

No one assigned

    Labels

    coreCore engine, DAG, stateenhancementNew feature or requestprovidersProvider gateway and adapterstests

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions