Description
A RunRecord (#5) is the canonical source of a run's result; a Report is its renderable form. Users consume results in three contexts:
- Programmatic (CI, tooling) — JSON.
- Human review — single-file HTML with embedded SVG.
- PR / issue paste — Markdown.
All three formats derive from the same RunRecord; they are byte-for-byte deterministic on identical input (with sorted JSON keys). SARIF output lands in 0.4.0 #068.
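The byte-for-byte guarantee for JSON depends on a stable serialization. A minimal sketch of that idea, assuming the record can be reduced to plain dicts and lists; `to_canonical_json` and the sample payload keys are hypothetical names, not part of the proposal:

```python
import json


def to_canonical_json(payload: dict) -> str:
    """Serialize so that equal content always yields the same string."""
    # sort_keys removes any dependence on dict insertion order;
    # fixed indent and separators pin the whitespace.
    return json.dumps(
        payload,
        sort_keys=True,
        indent=2,
        separators=(",", ": "),
        ensure_ascii=False,
    )


# Same content, different insertion order.
a = {"violations": [], "contract": {"name": "demo", "version": "1.0"}}
b = {"contract": {"version": "1.0", "name": "demo"}, "violations": []}

assert to_canonical_json(a) == to_canonical_json(b)
```

Pinning separators and indentation as well as key order keeps the output stable even if a default changes between callers.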
Proposal
1. Module layout:
    src/agentanvil/reporter/
    ├── __init__.py              # exports render(record, fmt) -> str
    ├── json_reporter.py
    ├── html_reporter.py
    ├── markdown_reporter.py
    └── templates/
        ├── report.html.j2       # Jinja2 single-file
        └── report.md.j2
2. Unified API:
    # src/agentanvil/reporter/__init__.py
    from agentanvil.core.report import ReportFormat
    from agentanvil.core.run_record import RunRecord
    from agentanvil.reporter import html_reporter, json_reporter, markdown_reporter


    def render(record: RunRecord, fmt: ReportFormat) -> str:
        """Render a RunRecord in the requested format."""
        match fmt:
            case ReportFormat.JSON:
                return json_reporter.render(record)
            case ReportFormat.HTML:
                return html_reporter.render(record)
            case ReportFormat.MARKDOWN:
                return markdown_reporter.render(record)
            case ReportFormat.SARIF:
                # Not part of this change; tracked separately.
                raise NotImplementedError("SARIF lands in 0.4.0 #068")
3. Determinism invariant:
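    from agentanvil.core.report import ReportFormat
    from agentanvil.reporter import render

    # fixture_record: a canonical RunRecord, presumably loaded from
    # tests/reporter/fixtures/records/.

    def test_json_reporter_deterministic():
        r1 = render(fixture_record, ReportFormat.JSON)
        r2 = render(fixture_record, ReportFormat.JSON)
        assert r1 == r2

    def test_html_reporter_deterministic_modulo_timestamp():
        # Timestamps are rendered from record.metadata.timestamp_iso, not time.time(),
        # so two renders of the same record are expected to match exactly.
        r1 = render(fixture_record, ReportFormat.HTML)
        r2 = render(fixture_record, ReportFormat.HTML)
        assert r1 == r2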
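The HTML test above only holds if the renderer takes every time-dependent value from the record. A small sketch of that principle, with `Metadata`, `render_header`, and `digest` as hypothetical stand-ins rather than the real reporter code:

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Metadata:
    # Stand-in for record.metadata; only the field named in the test comment.
    timestamp_iso: str


def render_header(meta: Metadata) -> str:
    # Reads the timestamp from the record; never calls time.time() or datetime.now().
    return f"<p>Run at {meta.timestamp_iso}</p>"


def digest(text: str) -> str:
    # Comparing SHA-256 digests of the UTF-8 bytes is a compact byte-for-byte check.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


meta = Metadata(timestamp_iso="2025-01-01T00:00:00Z")
assert digest(render_header(meta)) == digest(render_header(meta))
```

Storing a digest of a known-good render alongside the fixture would also catch unintended template changes, at the cost of updating it on every intentional one.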
4. HTML report structure (Jinja2 template; single file, inline CSS, embedded SVG):
- Header: contract name, version, hash.
- Metadata: timestamp, backend, runner, container digest, Python version.
- Score breakdown: objective metrics, judge verdicts (per policy / per task), violations.
- Trace table: per-step type, duration, cost, tokens.
- Placeholder for ConversationGraph SVG (rendered in 0.3.0 #044).
- Replay instructions: command line to reproduce.
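To make the trace table section concrete, here is a sketch of how its rows might be produced. It uses only the standard library in place of the Jinja2 template so that it runs without dependencies; the `Step` fields and their units (milliseconds, USD) are assumptions, not taken from the RunRecord schema:

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class Step:
    # Hypothetical trace step; fields mirror the columns listed above.
    type: str
    duration_ms: float
    cost_usd: float
    tokens: int


def render_trace_table(steps: list[Step]) -> str:
    """Render the per-step trace table as an HTML fragment."""
    header = "<tr><th>Type</th><th>Duration (ms)</th><th>Cost (USD)</th><th>Tokens</th></tr>"
    rows = [
        "<tr>"
        f"<td>{escape(s.type)}</td>"  # escape free-form text from the record
        f"<td>{s.duration_ms:.1f}</td>"  # fixed precision keeps output stable
        f"<td>{s.cost_usd:.4f}</td>"
        f"<td>{s.tokens}</td>"
        "</tr>"
        for s in steps
    ]
    return "<table>\n" + "\n".join([header, *rows]) + "\n</table>"


table = render_trace_table(
    [Step("llm_call", 812.5, 0.0031, 1450), Step("tool<exec>", 40.0, 0.0, 0)]
)
```

Fixed-precision number formatting and escaping of record text both matter here: the first keeps renders identical across runs, the second keeps the single-file HTML well-formed whatever the trace contains.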
5. The reporter never collapses ScoreBreakdown into a single float; an invariant test enforces this. If a caller wants one headline number, the CLI can expose a --summary flag that picks one metric. The report object itself never does.
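One way the invariant and the --summary behaviour could be separated is sketched below. The `score_breakdown` and `objective_metrics` key names are hypothetical, and the check covers only one reading of the invariant, namely that the serialized breakdown stays a mapping rather than a scalar:

```python
import json
from numbers import Number


def assert_breakdown_not_collapsed(report_json: str) -> None:
    """Fail if the serialized score breakdown is a bare number."""
    breakdown = json.loads(report_json)["score_breakdown"]  # hypothetical key
    assert not isinstance(breakdown, Number), "score_breakdown collapsed to a scalar"
    assert isinstance(breakdown, dict), "score_breakdown should stay a mapping"


def pick_summary(breakdown: dict, metric: str) -> float:
    # What a CLI --summary flag might do: select one named metric,
    # outside the report object, leaving the breakdown untouched.
    return float(breakdown["objective_metrics"][metric])


good = json.dumps(
    {"score_breakdown": {"objective_metrics": {"pass_rate": 0.8}, "violations": []}}
)
assert_breakdown_not_collapsed(good)
```

Keeping `pick_summary` in the CLI layer means the choice of headline metric is always explicit and made by the caller.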
Scope
src/agentanvil/reporter/__init__.py — new.
src/agentanvil/reporter/json_reporter.py — new, sorted-keys JSON.
src/agentanvil/reporter/html_reporter.py — new.
src/agentanvil/reporter/markdown_reporter.py — new.
src/agentanvil/reporter/templates/report.html.j2 — new.
src/agentanvil/reporter/templates/report.md.j2 — new.
tests/reporter/test_json_reporter.py
tests/reporter/test_html_reporter.py
tests/reporter/test_markdown_reporter.py
tests/reporter/test_determinism.py
tests/reporter/fixtures/records/ — canonical test records.
Regression tests
test_json_reporter_deterministic_on_same_input
test_json_reporter_sorted_keys
test_json_reporter_preserves_score_breakdown_structure (no float collapse)
test_html_reporter_single_file_inlines_css
test_html_reporter_includes_contract_hash_and_container_digest
test_markdown_reporter_headers_render_correctly
test_markdown_reporter_fits_in_github_pr_comment_limit (65k chars)
test_reporter_invariant_no_top_level_float_in_score_breakdown
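The PR comment limit test could be as simple as a length check on the rendered Markdown. A sketch, assuming the limit is 65,536 characters (the list above only says "65k chars") and using a hypothetical helper name:

```python
# Assumed value; the regression test list only states "65k chars".
GITHUB_COMMENT_LIMIT = 65_536


def fits_in_pr_comment(markdown: str, limit: int = GITHUB_COMMENT_LIMIT) -> bool:
    """Return True if the rendered Markdown fits in one PR comment."""
    return len(markdown) <= limit


small_report = "# Report\n\n| step | tokens |\n|---|---|\n| llm_call | 1450 |\n"
oversized_report = "x" * (GITHUB_COMMENT_LIMIT + 1)

assert fits_in_pr_comment(small_report)
assert not fits_in_pr_comment(oversized_report)
```

For the test to be meaningful, the fixture it renders should be one of the larger canonical records, since a small record passes trivially.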
Notes