feat: add retry and circuit breaker utilities for LLM calls (mitigates #172) #216
Jah-yee wants to merge 2 commits into HKUDS:main
Made-with: Cursor
Thanks for the resilience utilities work. I did a Codex-assisted review and found two blocking behavior issues to fix before merge:
…rams Made-with: Cursor
Thanks a lot for the detailed Codex-assisted review on the resilience utilities. Both issues you highlighted were very helpful.
Half-open single-flight behaviour
Retry parameter validation
If you would like different defaults or error types for misconfiguration, I am happy to tweak them to match your preferred style.
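For readers following the thread, "half-open single-flight" can be pictured with the minimal sketch below. It is not the code in this PR: the constructor parameters (`failure_threshold`, `recovery_timeout`, `clock`), the `call` method, and the `CircuitOpenError` type are all hypothetical names chosen for illustration. The idea it shows is that once the recovery timeout has elapsed, only one trial call is allowed through, and every other caller is rejected until that trial finishes.

```python
import threading
import time


class CircuitOpenError(RuntimeError):
    """Hypothetical error raised when a call is rejected by the breaker."""


class CircuitBreaker:
    """Illustrative sketch only; names and defaults are assumptions."""

    def __init__(self, failure_threshold=3, recovery_timeout=30.0,
                 clock=time.monotonic):
        if failure_threshold < 1:
            raise ValueError("failure_threshold must be >= 1")
        if recovery_timeout < 0:
            raise ValueError("recovery_timeout must be >= 0")
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self._lock = threading.Lock()
        self._failures = 0            # consecutive failures seen so far
        self._opened_at = None        # None means the circuit is closed
        self._trial_in_flight = False

    def _state_locked(self):
        # Caller must hold self._lock.
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.recovery_timeout:
            return "half-open"
        return "open"

    @property
    def state(self):
        with self._lock:
            return self._state_locked()

    def call(self, func, *args, **kwargs):
        with self._lock:
            state = self._state_locked()
            # Single-flight: reject if open, or if a trial is already running.
            if state == "open" or (state == "half-open" and self._trial_in_flight):
                raise CircuitOpenError("circuit is open")
            is_trial = state == "half-open"
            if is_trial:
                self._trial_in_flight = True
        try:
            result = func(*args, **kwargs)
        except Exception:
            with self._lock:
                self._failures += 1
                if is_trial:
                    # A failed trial re-opens the circuit for another timeout.
                    self._trial_in_flight = False
                    self._opened_at = self._clock()
                elif (self._opened_at is None
                      and self._failures >= self.failure_threshold):
                    self._opened_at = self._clock()
            raise
        with self._lock:
            if is_trial:
                self._trial_in_flight = False
            if is_trial or self._opened_at is None:
                # A successful trial (or normal call) closes the circuit.
                self._failures = 0
                self._opened_at = None
        return result
```

The lock is held only while state is inspected or updated, never while `func` runs, so a slow upstream call does not block other callers from being rejected quickly.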
Thanks a lot for the detailed Codex-assisted review on the resilience utilities — both issues were very helpful.
If you prefer different defaults or error types for misconfiguration, I’m happy to tweak them.
Summary
This PR adds a small raganything.resilience module with reusable retry and circuit breaker helpers for LLM API calls, so long-running document processing can better tolerate transient network issues.
Motivation
As discussed in #172, process_document_complete and similar flows can get stuck when LLM calls intermittently fail: there is no retry/backoff strategy and no circuit breaker to prevent cascading failures. A focused resilience layer makes it easier to harden these call sites without pulling in extra dependencies.
Changes
raganything/resilience.py
- @retry decorator for synchronous functions: exponential backoff with optional jitter, detection of common transient exceptions (httpx, OpenAI clients, generic network errors), configurable attempts/delays, optional on_retry callback.
- @async_retry decorator with the same semantics for async functions.
- CircuitBreaker class: tracks consecutive failures, opens at a configurable threshold, and uses half-open trials to recover automatically when the upstream stabilizes. In-memory and dependency-free.
tests/test_resilience.py
- Covers on_retry callback invocation, circuit breaker state transitions (closed → open → half-open → closed), and error handling.
Testing
Ran pytest locally, including tests/test_resilience.py; all tests passed.
Thanks for your work on RAG-Anything. If you’d like different defaults or naming for these helpers, I’m happy to revise the PR to match your preferences.
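For orientation, the retry semantics described under Changes (exponential backoff, optional jitter, an on_retry callback, and up-front parameter validation) can be sketched roughly as follows. This is an illustrative sketch, not the implementation in raganything/resilience.py: the parameter names (`max_attempts`, `base_delay`, `max_delay`, `jitter`, `retry_on`, `sleep`), their defaults, and the `on_retry(attempt, exc, delay)` signature are assumptions, and the default `retry_on` tuple uses only built-in exceptions rather than the httpx or OpenAI error types the PR mentions.

```python
import random
import time
from functools import wraps


def retry(max_attempts=3, base_delay=0.5, max_delay=8.0, jitter=True,
          retry_on=(ConnectionError, TimeoutError), on_retry=None,
          sleep=time.sleep):
    """Hypothetical sketch; names and defaults are assumptions."""
    # Validate configuration when the decorator is created, so a
    # misconfigured call site fails immediately rather than mid-run.
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    if base_delay < 0 or max_delay < base_delay:
        raise ValueError("require 0 <= base_delay <= max_delay")

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except retry_on as exc:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    # Exponential backoff: base, 2*base, 4*base, ... capped.
                    delay = min(max_delay, base_delay * 2 ** (attempt - 1))
                    if jitter:
                        # "Full jitter": pick uniformly in [0, delay].
                        delay = random.uniform(0, delay)
                    if on_retry is not None:
                        on_retry(attempt, exc, delay)
                    sleep(delay)
        return wrapper
    return decorator
```

Exceptions outside `retry_on` propagate on the first attempt, which keeps programming errors from being retried. Passing `sleep` as a parameter is a testing convenience assumed here; it lets a test record the delays instead of actually waiting.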