add first Tier A single-agent case study under examples/case-studies/ #17

@cchinchilla-dev

Description

By the close of 0.2.0, AgentAnvil must have at least one end-to-end case study committed under examples/case-studies/: an actual OSS agent from the Tier A pool, wired through AgentAnvil with a contract, scenarios, a recording, and a report. This serves both as a dogfooding signal and as a template for the ≥ 15 case studies that 0.4.0 executes.

Tier A criteria:

  • OSS with public traction (GitHub stars ≥ 100).
  • ≥ 12 months of history.
  • Active issues.
  • Recent commit activity.
  • Single-agent for 0.2.0 (multi-agent lands in 0.3.0).
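
The criteria above could be screened mechanically before the research decision. The following is a minimal sketch, not part of AgentAnvil: `RepoStats` and `meets_tier_a` are hypothetical names, and the thresholds for "active issues" (at least one open issue) and "recent" (a commit within 90 days) are assumptions, since the issue does not define them.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class RepoStats:
    """Hypothetical snapshot of a candidate repository."""
    stars: int
    first_commit: date
    last_commit: date
    open_issues: int
    single_agent: bool


def meets_tier_a(repo: RepoStats, today: date, recent_days: int = 90) -> bool:
    """Return True if the repo satisfies all five Tier A criteria."""
    has_traction = repo.stars >= 100
    has_history = (today - repo.first_commit) >= timedelta(days=365)
    # Assumption: "active issues" means at least one open issue.
    has_active_issues = repo.open_issues > 0
    # Assumption: "recent" means a commit within the last `recent_days` days.
    is_recent = (today - repo.last_commit) <= timedelta(days=recent_days)
    return all(
        [has_traction, has_history, has_active_issues, is_recent, repo.single_agent]
    )
```

A screen like this only narrows the pool; picking among the finalists remains a judgment call.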

Candidates (pick one):

  • LangChain SQL agent (common, small, well-documented).
  • LangChain ReAct + web search.
  • Open Interpreter (single-agent).
  • Raw-Python agent for diversity (if no OSS single-agent fits the 12-month criterion cleanly).

Proposal

1. Case study directory structure:

examples/case-studies/
└── tier-a-01-<agent-name>/
    ├── README.md                # rationale, setup, notable findings
    ├── contract.yaml
    ├── Dockerfile
    ├── requirements.txt
    ├── scenarios.yaml           # hand-crafted + generated mix
    ├── recordings/
    │   └── <agent>.json
    ├── expected/
    │   └── <agent>.report.json
    └── run.sh                   # reproducibility one-liner
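
A layout check along these lines could back the directory structure above. It is a sketch only: `missing_entries` is a hypothetical helper, and the recording and report file names are left unchecked because they depend on the chosen agent.

```python
from pathlib import Path

# Entries taken from the proposed directory structure.
REQUIRED_FILES = (
    "README.md",
    "contract.yaml",
    "Dockerfile",
    "requirements.txt",
    "scenarios.yaml",
    "run.sh",
)
REQUIRED_DIRS = ("recordings", "expected")


def missing_entries(case_dir: Path) -> list[str]:
    """List required files and directories absent from a case-study directory."""
    missing = [name for name in REQUIRED_FILES if not (case_dir / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (case_dir / name).is_dir()]
    return missing
```

An empty list means the directory matches the proposed layout.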

2. README.md template:

# Tier A Case Study 01: <agent-name>

## Target
- Repo: <URL>
- Commit pinned: `abc123...`
- Stars at time of study: <N>
- Framework: LangChain / raw / etc.
- Domain: SWE / QA / etc.

## Contract
<describes the 2-3 policies, 1-2 tasks, key constraints>

## Findings
<2-3 bullet points on what the run revealed. Objective-only is fine in 0.2.0.>

## Reproducibility
bash run.sh

3. run.sh:

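#!/usr/bin/env bash
set -euo pipefail
# Run from the case-study directory so relative paths resolve
# even when the script is invoked from the repo root (as CI does).
cd "$(dirname "${BASH_SOURCE[0]}")"
pip install -r requirements.txt
agentanvil replay \
    --recording recordings/<agent>.json \
    --contract contract.yaml \
    --out-dir ./output
# Compare each expected report with the output file of the same name.
for expected in expected/*.json; do
    diff -q "output/$(basename "$expected")" "$expected"
done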
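
The final comparison in run.sh is byte-wise, so a change in key order or whitespace would fail it. If that turns out to be too strict, a JSON-aware comparison is one alternative. The sketch below assumes the replay writes reports into the output directory under the same file names as those in expected/; `mismatched_reports` is a hypothetical helper, not an AgentAnvil API.

```python
import json
from pathlib import Path


def mismatched_reports(output_dir: Path, expected_dir: Path) -> list[str]:
    """Return names of expected reports that are missing from, or differ in, output_dir."""
    mismatched = []
    for expected in sorted(expected_dir.glob("*.json")):
        produced = output_dir / expected.name
        if not produced.is_file():
            mismatched.append(expected.name)
            continue
        # Compare parsed JSON so key order and whitespace do not matter.
        if json.loads(produced.read_text()) != json.loads(expected.read_text()):
            mismatched.append(expected.name)
    return mismatched
```

Whether semantic equality is the right bar depends on how deterministic the report output is; the byte-wise diff is the stricter gate.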

4. Smoke test in CI:

# .github/workflows/ci.yml
case-studies-smoke:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-python@v5
    - run: pip install -e .
    - run: |
        for cs in examples/case-studies/*/; do
          bash "$cs/run.sh"
        done

Any new case study added later must also pass its run.sh (gate for merge).

Scope

  • examples/case-studies/tier-a-01-<name>/ — full directory.
  • .github/workflows/ci.yml: case-studies-smoke job.
  • docs/case-studies/tier-a-01.md — brief summary in docs site (or link to README.md).

Regression tests

  • case-studies-smoke CI job green on every PR.
  • test_tier_a_01_contract_validates
  • test_tier_a_01_replay_matches_expected
  • test_tier_a_01_readme_includes_required_fields (rationale, commit hash, findings)
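
The README check could be sketched as below. This is an illustration under assumptions: `missing_readme_fields` is a hypothetical helper, the commit pattern expects a 7 to 40 character hex hash in backticks after "Commit pinned:" (following the template), and "rationale" is matched by keyword because the template does not yet define a dedicated section for it.

```python
import re


def missing_readme_fields(text: str) -> list[str]:
    """Return the required README fields that are not found in `text`."""
    checks = {
        # Assumption: the rationale is identified by the word itself.
        "rationale": re.compile(r"rationale", re.IGNORECASE),
        # Assumption: a real hash replaces the `abc123...` placeholder.
        "commit hash": re.compile(r"Commit pinned:\s*`[0-9a-f]{7,40}`"),
        "findings": re.compile(r"^## Findings\s*$", re.MULTILINE),
    }
    return [name for name, pattern in checks.items() if not pattern.search(text)]
```

A pytest case could read the case study's README.md and assert that the returned list is empty. Note that the unfilled template placeholder fails the commit-hash check by design.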

Notes

  • Tier A candidate selection is a research decision; list 3 finalists, pick one. The others stay on the 0.4.0 list.
  • COI: if the candidate is maintained by anyone the author knows personally, note it in a Caveats section of the README.
  • Depends on: all of 0.2.0.
  • Blocks: nothing in 0.2.0 — but sets the template for #071 (≥ 15 case studies in 0.4.0).

Metadata

    Labels

    case-study: Case studies under examples/case-studies/
    documentation: Improvements or additions to documentation
    enhancement: New feature or request