add CLI #13

@cchinchilla-dev

Description

The CLI today has validate (partial), run (stub), and version (stub). 0.2.0 should ship a functional CLI surface that wires all 0.2.0 modules together.

Four commands:

  1. agentanvil validate <contract.yaml> — runs static analyzer (add evaluator Layer 2 (single LLM-as-judge) with structured output #11), exits non-zero on fatal.
  2. agentanvil run --contract X.yaml --agent-path ./agent/ [--backend {direct,agentloom,mock}] [--runner {subprocess,docker}] [--record PATH] [--seed N] [--out-dir PATH] — end-to-end: analyze agent, generate scenarios, run, evaluate, report.
  3. agentanvil replay --recording PATH [--contract X.yaml] [--out-dir PATH] — reproduce a recorded run with MockBackend.
  4. agentanvil version — prints AgentAnvil version, git commit hash, installed optional extras.

Proposal

1. run command skeleton:

# src/agentanvil/cli/main.py
import asyncio
from pathlib import Path

import typer

# Project-internal imports (AgentContract, analyze, generate, ...) are omitted in this skeleton.

@app.command()
def run(
    # typer.Option(...) makes these required --contract / --agent-path options,
    # matching the usage line above; bare `Path` parameters would be positional.
    contract: Path = typer.Option(...),
    agent_path: Path = typer.Option(...),
    backend: str = "direct",
    runner: str = "subprocess",
    record: Path | None = None,
    seed: int = 42,
    out_dir: Path = Path("./agentanvil-results"),
    max_scenarios: int = 10,
):
    # 1. Load + validate contract.
    c = AgentContract.from_yaml(contract.read_text())
    diagnostics = analyze(c)
    if diagnostics.has_fatal:
        _print_diagnostics(diagnostics)
        raise typer.Exit(1)

    # 2. Analyze agent (the profile is not consumed further in this skeleton).
    profile = analyze_agent(agent_path)

    # 3. Generate scenarios.
    scenarios = generate(c, config=GeneratorConfig(seed=seed, max_per_category=max_scenarios // 3))

    # 4. Resolve backend + runner.
    be = _resolve_backend(backend, record=record)
    rn = _resolve_runner(runner)

    # 5. Execute each scenario. rn.run is awaitable, so it is driven with
    #    asyncio.run from this synchronous command.
    records = []
    for scenario in scenarios:
        result = asyncio.run(
            rn.run(
                agent_path=agent_path,
                scenario_json=scenario.model_dump_json(),
                timeout_ms=c.constraints.max_latency_ms or 60000,
            )
        )
        trace = _parse_trace(result)
        score = _evaluate(c, trace, be)
        records.append(RunRecord(...))

    # 6. Write reports.
    out_dir.mkdir(parents=True, exist_ok=True)
    for rec in records:
        (out_dir / f"{rec.run_id}.json").write_text(render(rec, ReportFormat.JSON))
        (out_dir / f"{rec.run_id}.html").write_text(render(rec, ReportFormat.HTML))
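
The `_resolve_backend` and `_resolve_runner` helpers called in the skeleton are not defined in this issue. One possible shape, assuming a plain name-to-factory registry, is sketched below. The `resolve_choice` function, the `BACKENDS` and `RUNNERS` tables, and the string placeholders are all hypothetical and stand in for whatever objects AgentAnvil actually constructs.

```python
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


def resolve_choice(kind: str, name: str, registry: dict[str, Callable[[], T]]) -> T:
    """Look up `name` in `registry` and build the object, or fail clearly."""
    try:
        factory = registry[name]
    except KeyError:
        valid = ", ".join(sorted(registry))
        raise ValueError(f"unknown {kind} {name!r}; expected one of: {valid}") from None
    return factory()


# Placeholder factories (assumption): real ones would build backend/runner objects.
BACKENDS = {
    "direct": lambda: "direct-backend",
    "agentloom": lambda: "agentloom-backend",
    "mock": lambda: "mock-backend",
}
RUNNERS = {
    "subprocess": lambda: "subprocess-runner",
    "docker": lambda: "docker-runner",
}
```

Inside a Typer command, the `ValueError` could be caught and re-raised as `typer.BadParameter` so that a mistyped `--backend` or `--runner` value produces a usage error rather than a traceback.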

2. replay command:

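@app.command()
def replay(
    # typer.Option(...) makes --recording a required option, matching the usage line.
    recording: Path = typer.Option(...),
    contract: Path | None = None,
    out_dir: Path = Path("./agentanvil-replay"),
):
    # 1. Load recording.
    rec_env = Recording.from_file(recording)
    # 2. Build MockBackend from the loaded recording (assumption: it accepts
    #    the Recording object rather than the file path).
    mock = MockBackend(rec_env)
    # 3. Replay every entry through the same pipeline.
    # 4. Assert byte-for-byte equality with expected (if present).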
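
Step 4 is left as a comment in the skeleton. A minimal sketch of a byte-for-byte check is shown below, assuming the expected and replayed outputs are both available as files. The names `first_mismatch` and `assert_identical` are hypothetical, not part of AgentAnvil.

```python
from pathlib import Path
from typing import Optional


def first_mismatch(expected: bytes, actual: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    if expected == actual:
        return None
    for offset, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return offset
    # No differing byte in the shared prefix: one input is a prefix of the other.
    return min(len(expected), len(actual))


def assert_identical(expected_path: Path, actual_path: Path) -> None:
    """Raise AssertionError naming the first differing offset, if any."""
    offset = first_mismatch(expected_path.read_bytes(), actual_path.read_bytes())
    if offset is not None:
        raise AssertionError(
            f"{actual_path} differs from {expected_path} at byte {offset}"
        )
```

Reporting the first differing offset, rather than only pass or fail, makes a replay failure easier to trace to a specific part of the output, such as a timestamp or run ID that was not captured in the recording.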

3. version command:

@app.command()
def version():
    import importlib.metadata as md
    import sys

    print(f"agentanvil {md.version('agentanvil')}")
    print(f"python {sys.version.split()[0]}")
    # The git commit hash listed in the command summary is not printed yet.
    for extra in ("agentloom", "docker", "viz", "stats", "replication", "security", "cicd"):
        try:
            # _extra_pkg (not shown) maps an extra name to a distribution name.
            md.distribution(_extra_pkg(extra))
            print(f"  [{extra}] installed")
        except md.PackageNotFoundError:
            pass
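
The command summary also lists a git commit hash for `version`. One way to obtain it, assuming the command runs from a source checkout with `git` on the PATH, is sketched below. `git_commit_hash` is a hypothetical helper name, and it falls back to `"unknown"` instead of raising.

```python
import subprocess


def git_commit_hash(repo_dir: str = ".") -> str:
    """Return the short commit hash of `repo_dir`, or "unknown" if unavailable."""
    try:
        completed = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
            timeout=5,
        )
    except (OSError, subprocess.SubprocessError):
        # git is missing, the directory is not a repository, or the call timed out.
        return "unknown"
    return completed.stdout.strip() or "unknown"
```

An installed wheel has no `.git` directory, so this approach would print `unknown` there; embedding the hash at build time is an alternative that this sketch does not cover.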

Scope

  • src/agentanvil/cli/main.py — fill out run and replay, improve version.
  • src/agentanvil/cli/_helpers.py — new, backend/runner resolution.
  • tests/cli/test_run.py — end-to-end with MockBackend.
  • tests/cli/test_replay.py
  • tests/cli/test_validate.py
  • tests/cli/test_version.py

Regression tests

  • test_cli_validate_on_fixture_contract_exits_zero
  • test_cli_validate_on_contradictory_contract_exits_nonzero
  • test_cli_run_produces_json_and_html_reports
  • test_cli_run_respects_seed_for_scenario_generation
  • test_cli_run_honours_timeout_from_contract
  • test_cli_replay_byte_for_byte_identical_to_record
  • test_cli_replay_fails_clearly_on_missing_recording_entry
  • test_cli_version_prints_installed_extras
