
flesh out DirectBackend with streaming, bounded retries, rate-limit handling, pricing tables #8

@cchinchilla-dev

Description

DirectBackend skeleton from 0.1.x #3 covers the happy path for OpenAI / Anthropic / Google. Three gaps remain before 0.2.0 ships:

1. Production-grade streaming. The skeleton's streaming yields chunks but does not handle reconnects, partial usage reporting (end-of-stream usage deltas), or aborting on timeout. Required by the quickstart LangChain example (#021) and by any user-facing run --stream.

2. Bounded retries. Network errors and 500-class responses must be retried with exponential backoff and jitter. No retries on 4xx non-429 (client errors). Max retries bounded by config; failures surface as LLMResponse with finish_reason="error".

3. Rate-limit (429) handling. Sleep-then-retry honouring Retry-After header if present, else exponential backoff with jitter. No circuit breaker (that is AgentLoomBackend's job) — DirectBackend stays minimal on resilience.

Also: pricing tables updated to include all models each provider publishes as of 2026-04, including reasoning-token-capable models.

Proposal

1. Retry policy module:

# src/agentanvil/backends/retry.py
from dataclasses import dataclass

import anyio
import httpx


@dataclass
class RetryPolicy:
    max_attempts: int = 3
    base_delay_s: float = 1.0
    max_delay_s: float = 30.0
    retryable_status_codes: frozenset[int] = frozenset({408, 429, 500, 502, 503, 504})


async def with_retry(fn, *, policy: RetryPolicy, logger=None):
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return await fn()
        except httpx.HTTPStatusError as e:
            if e.response.status_code not in policy.retryable_status_codes:
                raise
            if attempt == policy.max_attempts:
                raise
            # _compute_delay: Retry-After if present, else exponential backoff + jitter
            delay = _compute_delay(e.response, attempt, policy)
            if logger is not None:
                logger.warning("retry %d/%d in %.2fs", attempt, policy.max_attempts, delay)
            await anyio.sleep(delay)
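For illustration, the delay schedule the policy implies can be sketched standalone (stdlib only; this is a hypothetical stand-in for the `_compute_delay` helper referenced above, with `Retry-After` parsing limited to the integer-seconds form):

```python
import random


def compute_delay(retry_after_header, attempt, base_delay_s=1.0, max_delay_s=30.0):
    """Retry-After wins if parseable; otherwise exponential backoff with full jitter."""
    if retry_after_header is not None:
        try:
            return min(float(retry_after_header), max_delay_s)
        except ValueError:
            pass  # unparseable header: fall through to backoff
    backoff = min(base_delay_s * 2 ** (attempt - 1), max_delay_s)
    return random.uniform(0.0, backoff)


# Retry-After takes precedence over backoff, capped at max_delay_s.
print(compute_delay("7", attempt=1))    # 7.0
# Without the header, delay is jittered within [0, base * 2^(attempt-1)].
d = compute_delay(None, attempt=3)
print(0.0 <= d <= 4.0)                  # True
```

Full jitter (uniform over [0, backoff]) rather than equal jitter is a design choice here; either satisfies the "exponential backoff with jitter" requirement.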

2. Streaming with usage delta:

Providers report usage differently:

  • OpenAI: includes usage in final chunk when stream_options.include_usage=true.
  • Anthropic: usage deltas arrive on message_delta events; the stream closes with message_stop.
  • Google: usageMetadata on final response payload.

All three are normalised into a single LLMResponse at stream close.
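The normalisation step might look like the sketch below (field names follow the public provider APIs; the surrounding payload shapes are abbreviated for illustration, and the mapping onto LLMResponse fields is an assumption):

```python
# Hypothetical sketch: map each provider's end-of-stream usage payload
# onto one (prompt_tokens, completion_tokens) pair.

def normalise_usage(provider: str, payload: dict) -> tuple[int, int]:
    if provider == "openai":        # final chunk, stream_options.include_usage=true
        u = payload["usage"]
        return u["prompt_tokens"], u["completion_tokens"]
    if provider == "anthropic":     # cumulative usage from message_delta events
        u = payload["usage"]
        return u["input_tokens"], u["output_tokens"]
    if provider == "google":        # usageMetadata on the final response
        u = payload["usageMetadata"]
        return u["promptTokenCount"], u["candidatesTokenCount"]
    raise ValueError(f"unknown provider: {provider}")


print(normalise_usage("openai", {"usage": {"prompt_tokens": 12, "completion_tokens": 34}}))
# (12, 34)
```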

3. Updated pricing tables (as of 2026-04):

  • OpenAI: gpt-4o (all snapshots), gpt-4o-mini, o1, o1-mini, o3 series when available.
  • Anthropic: claude-3-7-sonnet, claude-3-5-haiku, claude-3-opus, extended thinking pricing.
  • Google: gemini-2.0-pro, gemini-2.0-flash, gemini-1.5-pro, reasoning-token pricing if published.

Each table is versioned; a CI job fails if any pricing table is more than 90 days old.
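The freshness check in scripts/check_pricing_freshness.py could reduce to a date comparison like this (a sketch assuming each pricing JSON embeds an "as_of": "YYYY-MM-DD" field; that schema is hypothetical):

```python
from datetime import date, datetime

MAX_AGE_DAYS = 90


def is_stale(as_of: str, today: date) -> bool:
    """True when the table's as_of date is more than MAX_AGE_DAYS old."""
    table_date = datetime.strptime(as_of, "%Y-%m-%d").date()
    return (today - table_date).days > MAX_AGE_DAYS


print(is_stale("2026-04-01", date(2026, 5, 1)))   # False: 30 days old
print(is_stale("2026-04-01", date(2026, 8, 1)))   # True: 122 days old
```

The CI job would then iterate over src/agentanvil/backends/pricing/*.json and exit non-zero if any table is stale.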

Scope

  • src/agentanvil/backends/direct.py — add streaming refinements, retry wiring.
  • src/agentanvil/backends/retry.py — new.
  • src/agentanvil/backends/_providers/openai.py — streaming final-usage handling.
  • src/agentanvil/backends/_providers/anthropic.py — message_delta streaming.
  • src/agentanvil/backends/_providers/google.py — usageMetadata streaming.
  • src/agentanvil/backends/pricing/*.json — updated.
  • scripts/check_pricing_freshness.py — new.
  • tests/backends/test_direct_retry.py
  • tests/backends/test_direct_streaming.py

Regression tests

  • test_retry_bounded_by_max_attempts
  • test_retry_honours_retry_after_header
  • test_retry_stops_on_4xx_non_429
  • test_retry_exponential_backoff_with_jitter
  • test_streaming_openai_usage_in_final_chunk
  • test_streaming_anthropic_usage_in_message_stop
  • test_streaming_google_usage_in_final_response
  • test_streaming_timeout_cancels_gracefully
  • test_pricing_table_freshness_ci_job
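As an example of shape, the first test above might assert the bound like this (a standalone sketch: a stdlib-only stand-in for with_retry without backoff, and a fake call that always fails):

```python
import asyncio


class FakeHTTPError(Exception):
    """Stand-in for httpx.HTTPStatusError in this sketch."""


async def with_retry(fn, max_attempts=3):
    # Same bounded loop as the proposed policy, minus delays for brevity.
    for attempt in range(1, max_attempts + 1):
        try:
            return await fn()
        except FakeHTTPError:
            if attempt == max_attempts:
                raise


calls = 0


async def always_503():
    global calls
    calls += 1
    raise FakeHTTPError()


async def main():
    try:
        await with_retry(always_503, max_attempts=3)
    except FakeHTTPError:
        pass  # expected: all attempts exhausted


asyncio.run(main())
print(calls)  # 3 — the call count is bounded by max_attempts
```

The real test would drive the actual with_retry with an httpx mock transport instead of a fake exception.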

Notes

  • No circuit breaker in DirectBackend. Users who need a circuit breaker install agentanvil[agentloom] and use AgentLoomBackend (scaffold mkdocs-material docs site #16).
  • No rate limiter across workflows. DirectBackend is single-process; distributed rate limiting is AgentLoom #69 / #131.
  • Depends on: bump version to 0.1.1 #3 (DirectBackend skeleton).
  • Blocks: #021 (quickstart), #024 (first case study).

Metadata

Labels

  • backends: LLM backend implementations (Direct, AgentLoom, Mock)
  • enhancement: New feature or request
  • providers: Vendor providers (OpenAI, Anthropic, Google)