Add fallback handling for openai LLM failures (nonresponsive scenarios) #4
Conversation
Pull Request Overview
This PR implements a global fallback system to maintain operational continuity when the OpenAI Assistant API experiences outages or timeouts in Listener Mode.
Key changes:
- Added dynamic model configuration fetched from Firestore with fallback to gpt-4o-mini
- Implemented a two-tier fallback mechanism: primary model attempt, then retry with a fallback model, and finally predefined conversational responses
- Enhanced logging for LLM configuration, run status, and error debugging
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| whatsapp_bot/app/handlers/ListenerMode.py | Implements dynamic model fetching from Firestore, two-tier fallback logic with model retry, and random fallback messages when both models fail |
| tools/2ndRoundDeliberation/initialize_listener_event.py | Adds default_model field to event initialization with default value gpt-4o-mini |
juggler434
left a comment
Some small nits, but also I'm going to start bouncing PRs that don't have unit test coverage. You're digging a hole for yourself by not writing any unit tests. If you want to pair on writing unit tests, let's set something up.
justinstimatze
left a comment
A few minor questions.
justinstimatze
left a comment
I think my nitpicks have been addressed but agree with Bill on tests so will defer to his review on that.
What is the goal of this PR?
Key Changes
Listener Mode:
• Added dynamic fallback logic that detects server_error or timeouts.
• Returns short predefined responses (e.g., “Agreed.”, “Please continue sharing.”, “That’s an interesting point, tell me more.”).
• Fetches default_model dynamically from Firestore (info document) with fallback to gpt-4o-mini.
• Automatically retries once with gpt-4.1-mini if the primary model fails.
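The two-tier flow above can be sketched as follows. This is a minimal illustration, not the code from ListenerMode.py: the helper names (`get_default_model`, `reply_with_fallback`, the injected `run_model` callable) and the exception handling are assumptions standing in for the real Assistant API call and its server_error/timeout detection.

```python
import logging
import random

# Model names and canned replies come from the PR description;
# everything else here is a hypothetical sketch.
DEFAULT_MODEL = "gpt-4o-mini"
FALLBACK_MODEL = "gpt-4.1-mini"
FALLBACK_RESPONSES = [
    "Agreed.",
    "Please continue sharing.",
    "That's an interesting point, tell me more.",
]


def get_default_model(info_doc: dict) -> str:
    """Read default_model from the Firestore info document,
    falling back to gpt-4o-mini when the field is absent."""
    return info_doc.get("default_model", DEFAULT_MODEL)


def reply_with_fallback(prompt: str, info_doc: dict, run_model) -> str:
    """Two-tier fallback: primary model, then gpt-4.1-mini,
    then a random predefined conversational response.

    `run_model(prompt, model)` is a stand-in for the real Assistant
    run; it is assumed to raise on server_error or timeout.
    """
    primary = get_default_model(info_doc)
    for model in (primary, FALLBACK_MODEL):
        try:
            logging.info("LLM Run: trying model=%s", model)
            return run_model(prompt, model)
        except Exception as err:  # server_error / timeout in the real handler
            logging.warning("LLM Run failed on model=%s: %s", model, err)
    logging.info("Fallback flow: returning predefined response")
    return random.choice(FALLBACK_RESPONSES)
```

With this shape, the user always receives some reply: a model answer when either tier succeeds, or one of the short canned responses when both fail.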
Follow-Up Mode (Planned next):
• To be extended with similar fallback handling + user-facing notice message.
• Will include event logging for error tracking.
Infrastructure:
• Structured logs for LLM Config, LLM Run, and Fallback flow to improve debugging.
• Keeps user interaction flow uninterrupted even when LLMs are unavailable.
Testing
Tested Listener Mode locally against multiple events, including 2nd-round deliberation:
• Verified that Listener Mode returns fallback messages when both the primary and fallback models fail.
• Confirmed that the dynamic model configuration fetch from Firestore works as expected.
• Confirmed that logs record the full fallback sequence for audit review.