
flesh out DirectBackend with streaming, bounded retries, rate-limit handling, pricing tables #8

@cchinchilla-dev

Description

DirectBackend skeleton from 0.1.x #3 covers the happy path for OpenAI / Anthropic / Google. Three gaps remain before 0.2.0 ships:

1. Production-grade streaming. The skeleton's streaming yields chunks but does not handle reconnects, partial usage reporting (end-of-stream usage deltas), or aborting on timeout. Required by the quickstart LangChain example (#021) and by any user-facing run --stream.

2. Bounded retries. Network errors and 500-class responses must be retried with exponential backoff and jitter. No retries on 4xx non-429 (client errors). Max retries bounded by config; failures surface as LLMResponse with finish_reason="error".

3. Rate-limit (429) handling. Sleep-then-retry honouring Retry-After header if present, else exponential backoff with jitter. No circuit breaker (that is AgentLoomBackend's job) — DirectBackend stays minimal on resilience.

Also: pricing tables updated to include all models each provider publishes as of 2026-04, including reasoning-token-capable models.

Proposal

1. Retry policy module:

# src/agentanvil/backends/retry.py
from dataclasses import dataclass

import anyio
import httpx


@dataclass
class RetryPolicy:
    max_attempts: int = 3
    base_delay_s: float = 1.0
    max_delay_s: float = 30.0
    retryable_status_codes: frozenset[int] = frozenset({408, 429, 500, 502, 503, 504})


async def with_retry(fn, *, policy: RetryPolicy, logger=None):
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return await fn()
        except httpx.HTTPStatusError as e:
            if e.response.status_code not in policy.retryable_status_codes:
                raise
            if attempt == policy.max_attempts:
                raise
            # _compute_delay: Retry-After if present, else exponential backoff + jitter
            delay = _compute_delay(e.response, attempt, policy)
            if logger is not None:
                logger.warning("retry %d/%d in %.2fs", attempt, policy.max_attempts, delay)
            await anyio.sleep(delay)
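For illustration, the delay schedule the policy implies can be sketched standalone (stdlib only; this is a hypothetical stand-in for the `_compute_delay` helper referenced above, with `Retry-After` parsing limited to the integer-seconds form):

```python
import random


def compute_delay(retry_after_header, attempt, base_delay_s=1.0, max_delay_s=30.0):
    """Retry-After wins if parseable; otherwise exponential backoff with full jitter."""
    if retry_after_header is not None:
        try:
            return min(float(retry_after_header), max_delay_s)
        except ValueError:
            pass  # unparseable header: fall through to backoff
    backoff = min(base_delay_s * 2 ** (attempt - 1), max_delay_s)
    return random.uniform(0.0, backoff)


# Retry-After takes precedence over backoff, capped at max_delay_s.
print(compute_delay("7", attempt=1))    # 7.0
# Without the header, delay is jittered within [0, base * 2^(attempt-1)].
d = compute_delay(None, attempt=3)
print(0.0 <= d <= 4.0)                  # True
```

Full jitter (uniform over [0, backoff]) rather than equal jitter is a design choice here; either satisfies the "exponential backoff with jitter" requirement.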

2. Streaming with usage delta:

Providers report usage differently:

  • OpenAI: includes usage in final chunk when stream_options.include_usage=true.
  • Anthropic: usage deltas arrive on message_delta events; the stream closes with message_stop.
  • Google: usageMetadata on final response payload.

All three are normalised into a single LLMResponse at stream close.
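The normalisation step might look like the sketch below (field names follow the public provider APIs; the surrounding payload shapes are abbreviated for illustration, and the mapping onto LLMResponse fields is an assumption):

```python
# Hypothetical sketch: map each provider's end-of-stream usage payload
# onto one (prompt_tokens, completion_tokens) pair.

def normalise_usage(provider: str, payload: dict) -> tuple[int, int]:
    if provider == "openai":        # final chunk, stream_options.include_usage=true
        u = payload["usage"]
        return u["prompt_tokens"], u["completion_tokens"]
    if provider == "anthropic":     # cumulative usage from message_delta events
        u = payload["usage"]
        return u["input_tokens"], u["output_tokens"]
    if provider == "google":        # usageMetadata on the final response
        u = payload["usageMetadata"]
        return u["promptTokenCount"], u["candidatesTokenCount"]
    raise ValueError(f"unknown provider: {provider}")


print(normalise_usage("openai", {"usage": {"prompt_tokens": 12, "completion_tokens": 34}}))
# (12, 34)
```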

3. Updated pricing tables (as of 2026-04):

  • OpenAI: gpt-4o (all snapshots), gpt-4o-mini, o1, o1-mini, o3 series when available.
  • Anthropic: claude-3-7-sonnet, claude-3-5-haiku, claude-3-opus, extended thinking pricing.
  • Google: gemini-2.0-pro, gemini-2.0-flash, gemini-1.5-pro, reasoning-token pricing if published.

Each table is versioned; a CI job fails if any pricing table is more than 90 days old.
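The freshness check in scripts/check_pricing_freshness.py could reduce to a date comparison like this (a sketch assuming each pricing JSON embeds an "as_of": "YYYY-MM-DD" field; that schema is hypothetical):

```python
from datetime import date, datetime

MAX_AGE_DAYS = 90


def is_stale(as_of: str, today: date) -> bool:
    """True when the table's as_of date is more than MAX_AGE_DAYS old."""
    table_date = datetime.strptime(as_of, "%Y-%m-%d").date()
    return (today - table_date).days > MAX_AGE_DAYS


print(is_stale("2026-04-01", date(2026, 5, 1)))   # False: 30 days old
print(is_stale("2026-04-01", date(2026, 8, 1)))   # True: 122 days old
```

The CI job would then iterate over src/agentanvil/backends/pricing/*.json and exit non-zero if any table is stale.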

Scope

  • src/agentanvil/backends/direct.py — add streaming refinements, retry wiring.
  • src/agentanvil/backends/retry.py — new.
  • src/agentanvil/backends/_providers/openai.py — streaming final-usage handling.
  • src/agentanvil/backends/_providers/anthropic.py — message_delta streaming.
  • src/agentanvil/backends/_providers/google.py — usageMetadata streaming.
  • src/agentanvil/backends/pricing/*.json — updated.
  • scripts/check_pricing_freshness.py — new.
  • tests/backends/test_direct_retry.py
  • tests/backends/test_direct_streaming.py

Regression tests

  • test_retry_bounded_by_max_attempts
  • test_retry_honours_retry_after_header
  • test_retry_stops_on_4xx_non_429
  • test_retry_exponential_backoff_with_jitter
  • test_streaming_openai_usage_in_final_chunk
  • test_streaming_anthropic_usage_in_message_stop
  • test_streaming_google_usage_in_final_response
  • test_streaming_timeout_cancels_gracefully
  • test_pricing_table_freshness_ci_job
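As an example of shape, the first test above might assert the bound like this (a standalone sketch: a stdlib-only stand-in for with_retry without backoff, and a fake call that always fails):

```python
import asyncio


class FakeHTTPError(Exception):
    """Stand-in for httpx.HTTPStatusError in this sketch."""


async def with_retry(fn, max_attempts=3):
    # Same bounded loop as the proposed policy, minus delays for brevity.
    for attempt in range(1, max_attempts + 1):
        try:
            return await fn()
        except FakeHTTPError:
            if attempt == max_attempts:
                raise


calls = 0


async def always_503():
    global calls
    calls += 1
    raise FakeHTTPError()


async def main():
    try:
        await with_retry(always_503, max_attempts=3)
    except FakeHTTPError:
        pass  # expected: all attempts exhausted


asyncio.run(main())
print(calls)  # 3 — the call count is bounded by max_attempts
```

The real test would drive the actual with_retry with an httpx mock transport instead of a fake exception.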

Notes

  • No circuit breaker in DirectBackend. Users who need a circuit breaker install agentanvil[agentloom] and use AgentLoomBackend (scaffold mkdocs-material docs site #16).
  • No rate limiter across workflows. DirectBackend is single-process; distributed rate limiting is AgentLoom #69 / #131.
  • Depends on: bump version to 0.1.1 #3 (DirectBackend skeleton).
  • Blocks: #021 (quickstart), #024 (first case study).

Metadata

Labels

  • backends: LLM backend implementations (Direct, AgentLoom, Mock)
  • enhancement: New feature or request
  • providers: Vendor providers (OpenAI, Anthropic, Google)