Skip to content

add reporter with JSON, HTML, Markdown outputs #12

@cchinchilla-dev

Description

@cchinchilla-dev

Description

A RunRecord (#5) is the canonical source of a run's result; a Report is its renderable form. Users consume results in three contexts:

  1. Programmatic (CI, tooling) — JSON.
  2. Human review — single-file HTML with embedded SVG.
  3. PR / issue paste — Markdown.

All three formats derive from the same RunRecord; they are byte-for-byte deterministic on identical input (with sorted JSON keys). SARIF output lands in 0.4.0 #068.

Proposal

1. Module layout:

src/agentanvil/reporter/
├── __init__.py        # exports render(record, format) -> str
├── json_reporter.py
├── html_reporter.py
├── markdown_reporter.py
└── templates/
    ├── report.html.j2      # Jinja2 single-file
    └── report.md.j2

2. Unified API:

# src/agentanvil/reporter/__init__.py
from agentanvil.core.report import ReportFormat
from agentanvil.core.run_record import RunRecord


def render(record: RunRecord, fmt: ReportFormat) -> str:
    match fmt:
        case ReportFormat.JSON:
            return json_reporter.render(record)
        case ReportFormat.HTML:
            return html_reporter.render(record)
        case ReportFormat.MARKDOWN:
            return markdown_reporter.render(record)
        case ReportFormat.SARIF:
            raise NotImplementedError("SARIF lands in 0.4.0 #068")

3. Determinism invariant:

def test_json_reporter_deterministic():
    r1 = render(fixture_record, ReportFormat.JSON)
    r2 = render(fixture_record, ReportFormat.JSON)
    assert r1 == r2

def test_html_reporter_deterministic_modulo_timestamp():
    # timestamps are rendered from record.metadata.timestamp_iso, not time.time()
    r1 = render(fixture_record, ReportFormat.HTML)
    r2 = render(fixture_record, ReportFormat.HTML)
    assert r1 == r2

4. HTML report structure (Jinja2 template; single file, inline CSS, embedded SVG):

  • Header: contract name, version, hash.
  • Metadata: timestamp, backend, runner, container digest, Python version.
  • Score breakdown: objective metrics, judge verdicts (per policy / per task), violations.
  • Trace table: per-step type, duration, cost, tokens.
  • Placeholder for ConversationGraph SVG (rendered in 0.3.0 #044).
  • Replay instructions: command line to reproduce.

5. Reporter never collapses ScoreBreakdown into a float — enforced by invariant test. If caller wants a single headline number, the CLI can expose a --summary flag that picks one metric; the report object never does.

Scope

  • src/agentanvil/reporter/__init__.py — new.
  • src/agentanvil/reporter/json_reporter.py — new, sorted-keys JSON.
  • src/agentanvil/reporter/html_reporter.py — new.
  • src/agentanvil/reporter/markdown_reporter.py — new.
  • src/agentanvil/reporter/templates/report.html.j2 — new.
  • src/agentanvil/reporter/templates/report.md.j2 — new.
  • tests/reporter/test_json_reporter.py
  • tests/reporter/test_html_reporter.py
  • tests/reporter/test_markdown_reporter.py
  • tests/reporter/test_determinism.py
  • tests/reporter/fixtures/records/ — canonical test records.

Regression tests

  • test_json_reporter_deterministic_on_same_input
  • test_json_reporter_sorted_keys
  • test_json_reporter_preserves_score_breakdown_structure (no float collapse)
  • test_html_reporter_single_file_inlines_css
  • test_html_reporter_includes_contract_hash_and_container_digest
  • test_markdown_reporter_headers_render_correctly
  • test_markdown_reporter_fits_in_github_pr_comment_limit (65k chars)
  • test_reporter_invariant_no_top_level_float_in_score_breakdown

Notes

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requestreporterReport rendering (JSON, HTML, Markdown, SARIF)

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions