add first Tier A single-agent case study under examples/case-studies/ #17


### Description

By the close of 0.2.0, AgentAnvil must have at least one end-to-end case study committed under `examples/case-studies/`: a real OSS agent from the Tier A pool, wired through AgentAnvil with a contract, scenarios, a recording, and a report. It serves both as a dogfooding signal and as the template for the ≥ 15 case studies planned for 0.4.0.

Tier A criteria:

- OSS with public traction (GitHub stars ≥ 100).
- ≥ 12 months of history.
- Active issues.
- Recent commit activity.
- Single-agent for 0.2.0 (multi-agent lands in 0.3.0).
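
The criteria above could be screened mechanically before the research decision is made. The following is a minimal sketch, not part of AgentAnvil: the `Candidate` fields and the 90-day window for "active" and "recent" are assumptions, since the issue does not define those thresholds.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Candidate:
    """Hypothetical snapshot of a candidate repo; field names are illustrative."""
    name: str
    stars: int
    first_commit: date
    last_commit: date
    open_issues_recently_updated: int
    single_agent: bool


# Assumed thresholds: the issue does not say what "active" or "recent" means.
RECENT_WINDOW = timedelta(days=90)
MIN_HISTORY = timedelta(days=365)


def tier_a_failures(c: Candidate, today: date) -> list[str]:
    """Return the Tier A criteria the candidate fails (empty list means eligible)."""
    failures = []
    if c.stars < 100:
        failures.append("stars < 100")
    if today - c.first_commit < MIN_HISTORY:
        failures.append("less than 12 months of history")
    if c.open_issues_recently_updated == 0:
        failures.append("no active issues")
    if today - c.last_commit > RECENT_WINDOW:
        failures.append("no recent commits")
    if not c.single_agent:
        failures.append("not single-agent (deferred to 0.3.0)")
    return failures
```

Returning the list of failed criteria, rather than a boolean, makes it easy to record in the README why a finalist was rejected.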

Candidates (pick one):

- LangChain SQL agent (common, small, well-documented).
- LangChain ReAct + web search.
- Open Interpreter (single-agent).
- Raw-Python agent for diversity (fallback if no OSS single-agent project cleanly meets the 12-month criterion).

### Proposal

**1. Case study directory structure:**

```
examples/case-studies/
└── tier-a-01-<agent-name>/
    ├── README.md                # rationale, setup, notable findings
    ├── contract.yaml
    ├── Dockerfile
    ├── requirements.txt
    ├── scenarios.yaml           # hand-crafted + generated mix
    ├── recordings/
    │   └── <agent>.json
    ├── expected/
    │   └── <agent>.report.json
    └── run.sh                   # reproducibility one-liner
```
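
A layout check along these lines could back a regression test for the tree above. It is a sketch only: `REQUIRED_ENTRIES` and `missing_entries` are hypothetical names, and the entry list is simply copied from the proposed structure.

```python
from pathlib import Path

# Entries copied from the proposed tree; a trailing "/" marks a directory.
REQUIRED_ENTRIES = (
    "README.md",
    "contract.yaml",
    "Dockerfile",
    "requirements.txt",
    "scenarios.yaml",
    "recordings/",
    "expected/",
    "run.sh",
)


def missing_entries(case_dir: Path) -> list[str]:
    """Return the required files or directories that are absent from case_dir."""
    missing = []
    for entry in REQUIRED_ENTRIES:
        path = case_dir / entry
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    return missing
```

For example, `missing_entries(Path("examples/case-studies/tier-a-01-<agent-name>"))` would return an empty list once the directory is complete.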

**2. `README.md` template:**

```markdown
# Tier A Case Study 01: <agent-name>

## Target
- Repo: <URL>
- Commit pinned: `abc123...`
- Stars at time of study: <N>
- Framework: LangChain / raw / etc.
- Domain: SWE / QA / etc.

## Rationale
<why this agent was picked from the Tier A pool>

## Contract
<describes the 2-3 policies, 1-2 tasks, key constraints>

## Findings
<2-3 bullet points on what the run revealed. Objective-only is fine in 0.2.0.>

## Caveats
<COI disclosures, if any>

## Reproducibility
bash run.sh
```

**3. `run.sh`:**

```bash
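#!/usr/bin/env bash
set -euo pipefail
# Run from this script's directory so relative paths resolve when CI calls it from the repo root.
cd "$(dirname "${BASH_SOURCE[0]}")"
pip install -r requirements.txt
agentanvil replay \
    --recording recordings/<agent>.json \
    --contract contract.yaml \
    --out-dir ./output
# Compare each expected report with the same-named file in output/
# (assumes replay writes matching file names).
for want in expected/*.json; do
    diff -q "output/$(basename "$want")" "$want"
done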
```
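
`diff -q` compares bytes, so a change in key order or whitespace in the report serializer would fail the replay even when the content is identical. If that turns out to be a problem, a structural comparison is one option. The sketch below is an assumption-laden alternative, not an AgentAnvil feature: `mismatched_reports` is a hypothetical helper, and it assumes each file in `expected/` has a same-named counterpart in `output/`.

```python
import json
from pathlib import Path


def mismatched_reports(output_dir: Path, expected_dir: Path) -> list[str]:
    """Return names of expected reports that are missing from output_dir or differ.

    Files are compared as parsed JSON, so key order and whitespace are ignored.
    """
    mismatched = []
    for want in sorted(expected_dir.glob("*.json")):
        got = output_dir / want.name
        if not got.is_file():
            mismatched.append(want.name)
            continue
        got_data = json.loads(got.read_text(encoding="utf-8"))
        want_data = json.loads(want.read_text(encoding="utf-8"))
        if got_data != want_data:
            mismatched.append(want.name)
    return mismatched
```

Note that list order is still significant under this comparison, which is usually what a recorded trace needs.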

**4. Smoke test in CI:**

```yaml
# .github/workflows/ci.yml
case-studies-smoke:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-python@v5
    - run: pip install -e .
    - run: |
        for cs in examples/case-studies/*/; do
          bash "$cs/run.sh"
        done
```

Any new case study added later must also pass its `run.sh` (gate for merge).
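
Because GitHub Actions runs shell steps with `-e` by default, the loop above would typically stop at the first failing case study. Once there are many of them, a runner that executes every `run.sh` and reports all exit codes may be more useful. A minimal sketch, with hypothetical helper names, that also sets the working directory to each case-study folder:

```python
import subprocess
from pathlib import Path


def find_run_scripts(root: Path) -> list[Path]:
    """Return every <case-study>/run.sh directly under root, in sorted order."""
    return sorted(root.glob("*/run.sh"))


def run_all(root: Path) -> dict[str, int]:
    """Run each case study's run.sh from its own directory; map name to exit code."""
    results = {}
    for script in find_run_scripts(root):
        # cwd is the case-study directory, so relative paths in run.sh resolve.
        proc = subprocess.run(["bash", script.name], cwd=script.parent)
        results[script.parent.name] = proc.returncode
    return results
```

A CI step could then fail if any value in `run_all(Path("examples/case-studies"))` is non-zero, while still printing the full set of results.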

### Scope

- `examples/case-studies/tier-a-01-<name>/` — full directory.
- `.github/workflows/ci.yml` — `case-studies-smoke` job.
- `docs/case-studies/tier-a-01.md` — brief summary in docs site (or link to README.md).

### Regression tests

- `case-studies-smoke` CI job green on every PR.
- `test_tier_a_01_contract_validates`
- `test_tier_a_01_replay_matches_expected`
- `test_tier_a_01_readme_includes_required_fields` (rationale, commit hash, findings)
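
The README check could be built on a helper like the one below. This is a sketch under assumptions: `missing_readme_fields` is a hypothetical name, the commit hash is expected on a `Commit pinned:` line as in the template, and the rationale is assumed to sit under its own `## Rationale` heading, which is a guess about layout.

```python
import re


def missing_readme_fields(readme_text: str) -> list[str]:
    """Return which of the required fields (rationale, commit hash, findings) are absent."""
    missing = []
    # Assumed layout: the rationale has its own level-2 heading.
    if not re.search(r"^##\s+Rationale\b", readme_text, re.MULTILINE):
        missing.append("rationale")
    # "Commit pinned: `<hash>`" follows the template; accept 7 to 40 hex characters,
    # which rejects an unfilled placeholder such as `abc123...`.
    if not re.search(r"Commit pinned:\s*`[0-9a-f]{7,40}`", readme_text):
        missing.append("commit hash")
    if not re.search(r"^##\s+Findings\b", readme_text, re.MULTILINE):
        missing.append("findings")
    return missing
```

`test_tier_a_01_readme_includes_required_fields` would then read the case study's `README.md` and assert that the returned list is empty.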

### Notes

- Tier A candidate selection is a research decision: list three finalists and pick one. The others stay on the 0.4.0 list.
- COI (conflict of interest): if any contributor to the chosen candidate is personally known to the case-study author, note it in the README `Caveats` section.
- Depends on: all of 0.2.0.
- Blocks: nothing in 0.2.0, but sets the template for #071 (≥ 15 case studies in 0.4.0).

