DirectBackend skeleton from 0.1.x #3 covers the happy path for OpenAI / Anthropic / Google. Three gaps remain before 0.2.0 ships:
1. Production-grade streaming. Skeleton streaming yields chunks but does not handle reconnect, partial usage reporting (end-of-stream usage deltas), or abort on timeout. Required by the quickstart LangChain example (#021) and by any user-facing run --stream.
2. Bounded retries. Network errors and 500-class responses must be retried with exponential backoff and jitter. No retries on 4xx non-429 (client errors). Max retries bounded by config; failures surface as LLMResponse with finish_reason="error".
3. Rate-limit (429) handling. Sleep-then-retry honouring Retry-After header if present, else exponential backoff with jitter. No circuit breaker (that is AgentLoomBackend's job) — DirectBackend stays minimal on resilience.
Also: pricing tables updated to include all models each provider publishes as of 2026-04, including reasoning-token-capable models.
Proposal
1. Retry policy module:
2. Streaming with usage delta:
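A minimal sketch of end-of-stream usage normalisation across the three providers. The payload field names follow each provider's documented streaming responses; the helper name and return shape are assumptions for illustration:

```python
# Illustrative helper only; the real normalisation would feed LLMResponse.
def normalise_stream_usage(provider: str, payload: dict) -> dict:
    """Map a provider's end-of-stream usage payload to common token counts."""
    if provider == "openai":
        # Final chunk carries `usage` when stream_options.include_usage=true.
        u = payload["usage"]
        return {"input_tokens": u["prompt_tokens"],
                "output_tokens": u["completion_tokens"]}
    if provider == "anthropic":
        # Streaming events carry a `usage` object.
        u = payload["usage"]
        return {"input_tokens": u.get("input_tokens", 0),
                "output_tokens": u["output_tokens"]}
    if provider == "google":
        # Final response payload carries `usageMetadata`.
        u = payload["usageMetadata"]
        return {"input_tokens": u["promptTokenCount"],
                "output_tokens": u["candidatesTokenCount"]}
    raise ValueError(f"unknown provider: {provider}")
```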
Providers report usage differently:
- OpenAI: `usage` in the final chunk when `stream_options.include_usage=true`.
- Anthropic: `message_delta` with `usage` in the `message_stop` event.
- Google: `usageMetadata` on the final `response` payload.

All three are normalised into a single `LLMResponse` at stream close.

3. Updated pricing tables (as of 2026-04):
Each table versioned; CI job fails if pricing table age > 90 days.
Scope
- `src/agentanvil/backends/direct.py` — add streaming refinements, retry wiring.
- `src/agentanvil/backends/retry.py` — new.
- `src/agentanvil/backends/_providers/openai.py` — streaming final-usage handling.
- `src/agentanvil/backends/_providers/anthropic.py` — `message_delta` streaming.
- `src/agentanvil/backends/_providers/google.py` — `usageMetadata` streaming.
- `src/agentanvil/backends/pricing/*.json` — updated.
- `scripts/check_pricing_freshness.py` — new.
- `tests/backends/test_direct_retry.py`
- `tests/backends/test_direct_streaming.py`

Regression tests
- `test_retry_bounded_by_max_attempts`
- `test_retry_honours_retry_after_header`
- `test_retry_stops_on_4xx_non_429`
- `test_retry_exponential_backoff_with_jitter`
- `test_streaming_openai_usage_in_final_chunk`
- `test_streaming_anthropic_usage_in_message_stop`
- `test_streaming_google_usage_in_final_response`
- `test_streaming_timeout_cancels_gracefully`
- `test_pricing_table_freshness_ci_job`

Notes
No circuit breaker in `DirectBackend`. Users who need a circuit breaker install `agentanvil[agentloom]` and use `AgentLoomBackend` (scaffold mkdocs-material docs site #16). `DirectBackend` is single-process; distributed rate limiting is AgentLoom #69 / #131.