Description
RecordingProvider + MockProvider today capture only LLM provider calls. Inter-agent workflows execute as sequences of LLM calls, so single-process multi-agent replay is deterministic transitively. Cross-process or explicit inter-agent event recording is not supported.
Concrete gaps today:
No structured event log. The recording file shape is keyed by step_id/prompt_hash; cross-process causality is not encoded.
No reproducibility guarantee. Two replays of the same recording can diverge silently if any non-determinism leaks (event ordering, dict iteration, tool side effects).
No verifier. Nothing in the toolchain asserts "this recording is reproducible".
This blocks two practical use cases:
Regression tests for multi-agent workflows that need bit-for-bit reproducibility across CI runs.
Post-incident replay where the goal is to reproduce a faulty multi-agent interaction exactly, not just hit similar prompts.
Proposal
1. JSONL format with a published JSON Schema covering six event types: task_sent, task_received, response_sent, tool_call, tool_return, state_transition. Each event carries:
event_id (UUID v4)
agent_id (string)
timestamp_ns (monotonic per-process; informational only, not used for ordering)
parent_event_id (UUID v4 or null) — encodes dep-graph causality
payload_hash (SHA-256, 16-char prefix)
task_sent additionally carries recipient_agent_id and payload_ref
2. Deterministic replay by dependency-graph ordering, not by timestamp (timestamps are not monotonic across processes). parent_event_id encodes causal ordering; replay re-executes in topological order. Nodes with no dependency are sorted by event_id for stable tie-breaking.
3. CLI agentloom verify-determinism <recording>: runs the recording through MockProvider replay and asserts byte-for-byte identical output across two independent runs. Exit code non-zero on any divergence; diff printed to stderr.
4. Extend RecordingProvider with an observer hook for inter-agent events (opt-in, activated when the runtime emits multi-agent steps). Existing single-LLM recordings keep their format unchanged; multi-agent recordings nest under a _inter_agent_events key.
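The ordering rule in item 2 can be sketched in a few lines of Python. This is an illustrative sketch, not AgentLoom's implementation: the function name replay_order and the plain-dict event shape are hypothetical. It also assumes the event_id tie-break applies to every set of events that are ready at the same time (roots and siblings alike), which is one reading of the rule above.

```python
import heapq
from collections import defaultdict


def replay_order(events):
    """Return event_ids in a deterministic order that respects parent_event_id.

    `events` is an iterable of dicts with "event_id" and "parent_event_id"
    keys (hypothetical in-memory shape for one JSONL line each).
    """
    events = list(events)
    known = {e["event_id"] for e in events}
    children = defaultdict(list)  # parent event_id -> child event_ids
    ready = []                    # min-heap of event_ids that may run now

    for e in events:
        parent = e["parent_event_id"]
        if parent is None:
            heapq.heappush(ready, e["event_id"])      # root event
        elif parent in known:
            children[parent].append(e["event_id"])
        else:
            raise ValueError(f"unknown parent_event_id: {parent!r}")

    order = []
    while ready:
        # Always take the smallest ready event_id: this is the tie-break
        # that makes the result independent of the order lines were written.
        event_id = heapq.heappop(ready)
        order.append(event_id)
        for child in children[event_id]:
            heapq.heappush(ready, child)

    if len(order) != len(events):
        # Events never reached from a root: a parent cycle or duplicate ids.
        raise ValueError("recording is not a valid dependency graph")
    return order
```

Because the ready set is always drained smallest event_id first, shuffling the lines of a recording before calling replay_order does not change the returned order.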
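Scope
src/agentloom/record_replay/inter_agent.py: event types + JSONL writer.
src/agentloom/record_replay/schema.json: committed JSON Schema (queryable at agentloom.record_replay.SCHEMA_URI).
src/agentloom/record_replay/verify.py: replay verifier (topological sort + diff).
src/agentloom/cli/verify_determinism.py: CLI entry point.
docs/record-replay-spec.md: format specification + dependency-graph algorithm prose.
agentloom.contracts.experimental: re-export the event types under the experimental tier.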
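To make the event shape from item 1 of the Proposal concrete, here is a minimal sketch of one event record and its JSONL line. The class name InterAgentEvent, the helper new_event, and the event_type field name are hypothetical. The sketch also assumes payload_hash is the first 16 hexadecimal characters of the SHA-256 digest; the published schema may define these differently.

```python
import hashlib
import json
import time
import uuid
from dataclasses import asdict, dataclass
from typing import Optional

# The six event types named in the proposal.
EVENT_TYPES = frozenset({
    "task_sent", "task_received", "response_sent",
    "tool_call", "tool_return", "state_transition",
})


@dataclass(frozen=True)
class InterAgentEvent:
    event_type: str                  # hypothetical field name for the type tag
    event_id: str                    # UUID v4
    agent_id: str
    timestamp_ns: int                # informational only, never used for ordering
    parent_event_id: Optional[str]   # UUID v4 or None; encodes causality
    payload_hash: str                # assumed: 16 hex chars of SHA-256

    def __post_init__(self) -> None:
        if self.event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.event_type!r}")

    def to_jsonl(self) -> str:
        # Sorted keys and fixed separators keep the serialized line stable.
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))


def new_event(event_type: str, agent_id: str, payload: bytes,
              parent_event_id: Optional[str] = None) -> InterAgentEvent:
    """Build an event with a fresh UUID v4 and a per-process monotonic timestamp."""
    return InterAgentEvent(
        event_type=event_type,
        event_id=str(uuid.uuid4()),
        agent_id=agent_id,
        timestamp_ns=time.monotonic_ns(),
        parent_event_id=parent_event_id,
        payload_hash=hashlib.sha256(payload).hexdigest()[:16],
    )
```

A writer would append new_event(...).to_jsonl() plus a newline to the recording file, one event per line.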
Regression tests
test_jsonl_schema_validates — known-good and known-bad event payloads.
test_dep_graph_ordering_deterministic — same DAG, different traversal order at write time → identical replay output.
test_verify_cli_exits_zero_on_identical_runs.
test_verify_cli_exits_nonzero_on_divergence — inject a non-determinism (e.g. random.random() in a tool) and assert exit code ≠ 0 + diff printed.
test_inter_agent_events_extend_recordings_without_breaking_v2_format — existing single-LLM RecordingProvider recordings load and replay unchanged.
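The behaviour that the two CLI tests above exercise can be sketched as a small function. This is a sketch under assumptions, not the proposed verify.py: verify_determinism here takes a hypothetical replay callable that returns the replay output as bytes, and it runs that callable twice in the same process, whereas the real command would load a recording and replay it through MockProvider.

```python
import difflib
import sys
from typing import Callable


def verify_determinism(replay: Callable[[], bytes]) -> int:
    """Run `replay` twice and compare the outputs byte for byte.

    Returns 0 when both runs match, 1 otherwise. On divergence a unified
    diff of the two outputs is written to stderr.
    """
    first = replay()
    second = replay()
    if first == second:
        return 0

    # Decode leniently so that a diff can be shown even for non-UTF-8 bytes.
    diff = difflib.unified_diff(
        first.decode("utf-8", errors="replace").splitlines(keepends=True),
        second.decode("utf-8", errors="replace").splitlines(keepends=True),
        fromfile="run-1",
        tofile="run-2",
    )
    sys.stderr.writelines(diff)
    return 1
```

A CLI wrapper could pass the return value straight to sys.exit. For the divergence test, a tool that calls random.random() is the obvious injection, although a counter that changes on each call gives the same effect without any chance of two runs matching.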
Notes
Delta vs existing observability tools (LangSmith, Phoenix, Langfuse): they trace multi-agent runs, but none specify a deterministic-replay contract with an executable verifier.
Algorithm choice (dep-graph ordering vs vector clocks vs Lamport timestamps): dep-graph is the simplest deterministic ordering compatible with AgentLoom's existing DAG layer model. Vector clocks would scale to N processes but require coordinated state per agent; Lamport timestamps are not unique across processes, so they would still need a separate tie-breaker. Dep-graph also matches how the engine already enumerates parallel layers, so the writer side adds no new ordering machinery.
agentloom.contracts.experimental is the right tier for the event schemas — they will iterate during initial adoption and graduate to .stable once external consumers settle on a stable shape.