This example demonstrates an external batch runner pattern for a synthetic AML (anti-money laundering) screening use case.

- Ground truth: `evals/dataset.eval.yaml` contains tests with `input` (structured object content) and `expected_output` (e.g., `content.decision`).
- CSV conversion: `batch-cli-runner.ts` imports functions from `build-csv-from-eval.ts` to convert `input` into CSV format. The CSV contains only inputs (customer data, transaction details), not the expected decisions.
- Batch processing: `batch-cli-runner.ts` reads the CSV, applies synthetic AML screening rules, and writes the actual responses as JSONL to a temporary file. Each JSONL record includes `output` with `tool_calls` for trace extraction.
- Evaluation: AgentV compares the actual JSONL output against the ground truth in `evals/dataset.eval.yaml` using evaluators such as `code_grader` and `tool_trajectory`.
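
The CSV conversion step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `build-csv-from-eval.ts`: it assumes each test has already been parsed from YAML into an object with `id`, `input`, and `expected_output`, writes an `id` column plus one column per `input` key, JSON-encodes nested values, and never reads `expected_output`, so the runner cannot see the ground truth. The names `EvalTest`, `cellValue`, `escapeCsvField`, and `buildCsv` are hypothetical.

```typescript
// Hypothetical shape of one parsed test from the eval YAML.
// Only `input` is written to CSV; `expected_output` is intentionally ignored.
interface EvalTest {
  id: string;
  input: Record<string, unknown>;
  expected_output?: unknown;
}

// Render one cell: empty for missing values, JSON for nested objects.
function cellValue(value: unknown): string {
  if (value === undefined || value === null) return "";
  return typeof value === "object" ? JSON.stringify(value) : String(value);
}

// Quote a field (RFC 4180 style) when it contains a comma, quote, or line break;
// embedded double quotes are doubled.
function escapeCsvField(field: string): string {
  return /[",\r\n]/.test(field) ? `"${field.replace(/"/g, '""')}"` : field;
}

// Build CSV text: an `id` column followed by the sorted union of all input keys.
function buildCsv(tests: EvalTest[]): string {
  const columns = Array.from(new Set(tests.flatMap((t) => Object.keys(t.input)))).sort();
  const lines = [
    ["id", ...columns],
    ...tests.map((t) => [t.id, ...columns.map((c) => cellValue(t.input[c]))]),
  ].map((fields) => fields.map(escapeCsvField).join(","));
  return lines.join("\n") + "\n";
}
```

Keeping an `id` column lets the runner echo each test's id back in its JSONL output, which is how results are matched to tests later.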
This example intentionally includes a test (`aml-004-not-exist`) that `scripts/build-csv-from-eval.ts` does not write into the CSV input.
As a result, the batch runner never emits a JSONL record for that `test_id`, and the CLI provider surfaces a provider-side error:
`error: "Batch output missing id 'aml-004-not-exist'"`
AgentV then reports that test as failed (with `error` populated) while still evaluating the other items in the batch.
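
The missing-id behavior amounts to a join between the requested test ids and the ids present in the JSONL output. The sketch below illustrates only that idea; it is not AgentV's provider implementation, and `BatchRecord`, `TestResult`, `indexJsonl`, and `matchBatchOutput` are hypothetical names. It reuses the error message quoted above so the two can be compared.

```typescript
// Hypothetical minimal shape of one JSONL record; only `id` matters here.
interface BatchRecord {
  id: string;
  text?: string;
  output?: unknown[];
}

type TestResult =
  | { id: string; ok: true; record: BatchRecord }
  | { id: string; ok: false; error: string };

// Parse JSONL (one JSON object per non-empty line) and index records by id.
function indexJsonl(jsonl: string): Map<string, BatchRecord> {
  const byId = new Map<string, BatchRecord>();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    const record = JSON.parse(line) as BatchRecord;
    byId.set(record.id, record);
  }
  return byId;
}

// Pair every requested test id with its record. A missing id becomes a
// per-test error rather than aborting the whole batch.
function matchBatchOutput(testIds: string[], jsonl: string): TestResult[] {
  const byId = indexJsonl(jsonl);
  return testIds.map((id): TestResult => {
    const record = byId.get(id);
    return record !== undefined
      ? { id, ok: true, record }
      : { id, ok: false, error: `Batch output missing id '${id}'` };
  });
}
```

Reporting the mismatch per test, rather than failing the batch, is what allows the remaining items to be evaluated normally.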
The batch runner outputs JSONL records whose `output` contains `tool_calls` (pretty-printed here; in the actual file each record occupies a single line):

```json
{
  "id": "aml-001",
  "text": "{...}",
  "output": [
    {
      "role": "assistant",
      "tool_calls": [
        {
          "tool": "aml_screening",
          "input": { "origin_country": "NZ", ... },
          "output": { "decision": "CLEAR", ... }
        }
      ]
    }
  ]
}
```

The `tool_trajectory` evaluator extracts tool calls directly from `output[].tool_calls[]`. This is the primary format, so no separate `trace` field is required.
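
To make the record format concrete, here is a hedged sketch of both sides: a runner step that applies a screening rule and emits one JSONL line in the shape shown above, and a helper that flattens `output[].tool_calls[]` into an ordered list, roughly what a trajectory check consumes. The rule itself (the `XX`/`YY` country codes, the 10000 amount threshold, and the `REVIEW` decision) is invented for illustration and is not the example's actual synthetic ruleset; all type and function names are hypothetical.

```typescript
// Hypothetical input shape; the real runner reads more fields from the CSV.
interface ScreeningInput {
  origin_country: string;
  amount: number;
}

interface ToolCall {
  tool: string;
  input: Record<string, unknown>;
  output: Record<string, unknown>;
}

interface OutputMessage {
  role: string;
  tool_calls?: ToolCall[];
}

interface JsonlRecord {
  id: string;
  text: string;
  output: OutputMessage[];
}

// Illustrative values only, not a real AML policy.
const HIGH_RISK_COUNTRIES = new Set(["XX", "YY"]);
const AMOUNT_THRESHOLD = 10000;

// Hypothetical rule: any triggered reason yields REVIEW, otherwise CLEAR.
function screen(input: ScreeningInput): { decision: string; reasons: string[] } {
  const reasons: string[] = [];
  if (HIGH_RISK_COUNTRIES.has(input.origin_country)) reasons.push("high_risk_country");
  if (input.amount >= AMOUNT_THRESHOLD) reasons.push("amount_over_threshold");
  return { decision: reasons.length > 0 ? "REVIEW" : "CLEAR", reasons };
}

// Build one JSONL line: the final answer in `text`, the tool call in `output`.
function toJsonlLine(id: string, input: ScreeningInput): string {
  const result = screen(input);
  const record: JsonlRecord = {
    id,
    text: JSON.stringify(result),
    output: [
      {
        role: "assistant",
        tool_calls: [
          {
            tool: "aml_screening",
            input: { origin_country: input.origin_country, amount: input.amount },
            output: result,
          },
        ],
      },
    ],
  };
  return JSON.stringify(record); // JSON.stringify emits no raw newlines
}

// Collect tool calls from output[].tool_calls[], preserving their order.
function extractToolCalls(record: JsonlRecord): ToolCall[] {
  return record.output.flatMap((message) => message.tool_calls ?? []);
}
```

Because the trace lives inside `output`, the same record carries both the final decision (for a grader) and the call sequence (for a trajectory check).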
Files in this example:

- `batch-cli-demo.yaml`: ground truth (tests with inputs and expected outputs)
- `scripts/build-csv-from-eval.ts`: utilities that convert YAML tests to CSV format (imported by `batch-cli-runner.ts`)
- `scripts/batch-cli-runner.ts`: main batch runner; converts inputs to CSV, processes them, and writes the actual responses as JSONL
- `.agentv/targets.yaml`: defines the `batch_cli` CLI target with provider batching enabled
From the repo root:

```bash
cd examples/features/batch-cli
# Run AgentV against the batch CLI target.
# NOTE: This requires the CLI provider to support batching and JSONL batch output.
bun agentv eval ./evals/dataset.eval.yaml --target batch_cli
```