Description
The CLI today has `validate` (partial), `run` (stub), and `version` (stub). 0.2.0 ships a functional surface that wires all 0.2.0 modules together.

Four commands:

- `agentanvil validate <contract.yaml>` — runs the static analyzer (#11); exits non-zero on fatal diagnostics.
- `agentanvil run --contract X.yaml --agent-path ./agent/ [--backend {direct,agentloom,mock}] [--runner {subprocess,docker}] [--record PATH] [--seed N] [--out-dir PATH]` — end-to-end: analyze the agent, generate scenarios, run, evaluate, report.
- `agentanvil replay --recording PATH [--contract X.yaml] [--out-dir PATH]` — reproduce a recorded run with `MockBackend`.
- `agentanvil version` — prints the AgentAnvil version, git commit hash, and installed optional extras.
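For orientation, a typical session might look like this (paths and flag values are illustrative):

```console
$ agentanvil validate examples/contract.yaml
$ agentanvil run --contract examples/contract.yaml --agent-path ./agent/ \
    --backend direct --runner subprocess --seed 42 --out-dir ./agentanvil-results
$ agentanvil replay --recording ./runs/recording.json --out-dir ./agentanvil-replay
$ agentanvil version
```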
Proposal
1. `run` command skeleton:

```python
# src/agentanvil/cli/main.py
@app.command()
def run(
    contract: Path,
    agent_path: Path,
    backend: str = "direct",
    runner: str = "subprocess",
    record: Path | None = None,
    seed: int = 42,
    out_dir: Path = Path("./agentanvil-results"),
    max_scenarios: int = 10,
):
    # 1. Load + validate contract.
    c = AgentContract.from_yaml(contract.read_text())
    diagnostics = analyze(c)
    if diagnostics.has_fatal:
        _print_diagnostics(diagnostics)
        raise typer.Exit(1)

    # 2. Analyze agent.
    profile = analyze_agent(agent_path)

    # 3. Generate scenarios (seeded for reproducibility, split across categories).
    scenarios = generate(
        c, config=GeneratorConfig(seed=seed, max_per_category=max_scenarios // 3)
    )

    # 4. Resolve backend + runner.
    be = _resolve_backend(backend, record=record)
    rn = _resolve_runner(runner)

    # 5. Execute each scenario. Runner.run is async; this sync Typer command
    #    bridges into it with asyncio.run.
    records = []
    for scenario in scenarios:
        result = asyncio.run(
            rn.run(
                agent_path=agent_path,
                scenario_json=scenario.model_dump_json(),
                timeout_ms=c.constraints.max_latency_ms or 60000,
            )
        )
        trace = _parse_trace(result)
        score = _evaluate(c, trace, be)
        records.append(RunRecord(...))

    # 6. Write reports.
    out_dir.mkdir(parents=True, exist_ok=True)
    for rec in records:
        (out_dir / f"{rec.run_id}.json").write_text(render(rec, ReportFormat.JSON))
        (out_dir / f"{rec.run_id}.html").write_text(render(rec, ReportFormat.HTML))
```
2. `replay` command:

```python
@app.command()
def replay(
    recording: Path,
    contract: Path | None = None,
    out_dir: Path = Path("./agentanvil-replay"),
):
    # 1. Load recording.
    rec_env = Recording.from_file(recording)
    # 2. Build MockBackend over the loaded recording.
    mock = MockBackend(rec_env)
    # 3. Replay every entry through the same pipeline.
    # 4. Assert byte-for-byte equality with expected output (if present).
```
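Steps 3-4 could flesh out roughly as below, continuing the function body above (the `Recording` entry shape, i.e. `entries`, `result`, `expected`, and `run_id`, is an assumption):

```python
    # Sketch of steps 3-4; Recording's entry attributes are assumed, not final.
    c = AgentContract.from_yaml(contract.read_text()) if contract else rec_env.contract
    out_dir.mkdir(parents=True, exist_ok=True)
    mismatches = 0
    for entry in rec_env.entries:
        trace = _parse_trace(entry.result)
        score = _evaluate(c, trace, mock)  # mock serves the recorded LLM responses
        replayed = render(RunRecord(...), ReportFormat.JSON)
        (out_dir / f"{entry.run_id}.json").write_text(replayed)
        if entry.expected is not None and replayed != entry.expected:
            typer.echo(f"replay mismatch for run {entry.run_id}", err=True)
            mismatches += 1
    if mismatches:
        raise typer.Exit(1)
```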
3. `version` command:

```python
@app.command()
def version():
    import sys
    import importlib.metadata as md

    print(f"agentanvil {md.version('agentanvil')}")
    print(f"python {sys.version.split()[0]}")
    for extra in ("agentloom", "docker", "viz", "stats", "replication", "security", "cicd"):
        try:
            md.distribution(_extra_pkg(extra))
            print(f"  [{extra}] installed")
        except md.PackageNotFoundError:
            pass
```
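`_extra_pkg` maps an extra's name to a distribution whose presence signals that the extra is installed. A plausible sketch; the real table has to mirror `[project.optional-dependencies]` in `pyproject.toml`, so the mapping below is an assumption:

```python
# Hypothetical extra -> marker-distribution mapping; keep in sync with
# [project.optional-dependencies] in pyproject.toml.
_EXTRA_DISTS: dict[str, str] = {
    "agentloom": "agentloom",
    "docker": "docker",
    # ... remaining extras map to their marker distributions ...
}


def _extra_pkg(extra: str) -> str:
    # Fall back to the extra's own name when no override is listed.
    return _EXTRA_DISTS.get(extra, extra)
```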
Scope
- `src/agentanvil/cli/main.py` — fill out `run` and `replay`, improve `version`.
- `src/agentanvil/cli/_helpers.py` — new; backend/runner resolution.
- `tests/cli/test_run.py` — end-to-end with `MockBackend`.
- `tests/cli/test_replay.py`
- `tests/cli/test_validate.py`
- `tests/cli/test_version.py`
Regression tests
- `test_cli_validate_on_fixture_contract_exits_zero`
- `test_cli_validate_on_contradictory_contract_exits_nonzero`
- `test_cli_run_produces_json_and_html_reports`
- `test_cli_run_respects_seed_for_scenario_generation`
- `test_cli_run_honours_timeout_from_contract`
- `test_cli_replay_byte_for_byte_identical_to_record`
- `test_cli_replay_fails_clearly_on_missing_recording_entry`
- `test_cli_version_prints_installed_extras`
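For shape, the version test could drive the app through Typer's `CliRunner` (a sketch; it asserts only the printed prefix):

```python
# tests/cli/test_version.py (sketch)
from typer.testing import CliRunner

from agentanvil.cli.main import app

runner = CliRunner()


def test_cli_version_prints_installed_extras():
    result = runner.invoke(app, ["version"])
    assert result.exit_code == 0
    assert result.output.startswith("agentanvil ")
```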
Notes
- CLI uses Typer + Rich (already core deps).
- `--backend mock` accepts `--recording PATH` implicitly via the replay path.
- Depends on: #11 (add evaluator Layer 2 (single LLM-as-judge) with structured output), #12 (add reporter with JSON, HTML, Markdown outputs), #13 (add CLI), #14 (add quickstart LangChain example with end-to-end CI and 10-minute budget), #15 (add portability invariant and determinism CI jobs), #16 (scaffold mkdocs-material docs site), #17 (add first Tier A single-agent case study under examples/case-studies/), #18 (ship 0.2.0: single-agent MVP with LangChain quickstart under 10 minutes), #019 — all 0.2.0 modules converge here.
- Blocks: #021 (quickstart), #024 (first case study uses `agentanvil run`).