forked from Significant-Gravitas/AutoGPT
[CORRUPTED] Synthetic Benchmark PR #11326 - hotfix(backend): fix rate-limited messages blocking queue by republishing to back #8
Open · tomerqodo wants to merge 6 commits into base_pr_11326_20251231_6633 from corrupted_pr_11326_20251231_6633
+390 −4
Conversation
…hing to back

## Summary
Fix critical queue blocking issue where rate-limited user messages prevent other users' executions from being processed.

## Root Cause
When a user exceeds `max_concurrent_graph_executions_per_user` (25), RabbitMQ's `basic_nack(requeue=True)` sends the message to the FRONT of the queue, creating an infinite blocking loop that prevents other users' messages from being processed.

## Solution
Add configurable `requeue_by_republishing` behavior that sends rate-limited messages to the BACK of the queue instead of the front:

### Key Changes
- **New setting**: `requeue_by_republishing` (default: `True`) in `backend/util/settings.py`
- **Smart `_ack_message`**: Automatically uses republishing when `requeue=True` and the setting is enabled
- **Efficient implementation**: Uses the existing `self.run_client` connection instead of creating new ones
- **Integration test**: Real RabbitMQ test validates queue ordering behavior

### Implementation Details
- Rate-limited messages: `publish_message()` then `basic_nack(requeue=False)`
- Pool-full messages: Same treatment for fair distribution
- Backward compatible: Can disable with `requeue_by_republishing=False`
- Clean code: Single logic path in the `_ack_message` method

## Impact
- ✅ Other users' executions no longer blocked by rate-limited users
- ✅ Fair queue processing: FIFO behavior maintained
- ✅ Rate limiting still works; it just no longer blocks others
- ✅ Configurable: can revert to the old behavior if needed

## Testing
- Integration test validates real RabbitMQ queue ordering
- Tests confirm rate-limited messages go to the back of the queue
- Verifies other users' messages process correctly

Fixes the 135 late executions issue reported in production.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
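The front-of-queue vs back-of-queue behavior described in this commit can be illustrated without a broker. The sketch below models the queue with a plain `deque`; it is an illustration of the ordering semantics only, not the PR's actual code, and all names are made up:

```python
from collections import deque

def process(queue, is_rate_limited, requeue_to_front, max_steps=10):
    """Simulate a consumer draining `queue`; rate-limited messages are
    requeued either to the front (basic_nack(requeue=True) semantics)
    or to the back (the republish-to-back fix). Returns the messages
    processed within `max_steps` delivery attempts."""
    processed = []
    for _ in range(max_steps):
        if not queue:
            break
        msg = queue.popleft()
        if is_rate_limited(msg):
            if requeue_to_front:
                queue.appendleft(msg)  # old behavior: back to the FRONT
            else:
                queue.append(msg)      # fix: republished to the BACK
        else:
            processed.append(msg)
    return processed

# user1 is over its concurrency limit; user2 is not
limited = lambda m: m.startswith("user1")
msgs = ["user1-a", "user2-a", "user2-b"]

front = process(deque(msgs), limited, requeue_to_front=True)
back = process(deque(msgs), limited, requeue_to_front=False)
print(front)  # [] -- user1-a loops at the front, blocking user2 entirely
print(back)   # ['user2-a', 'user2-b'] -- user2's messages still process
```

With front-requeue the simulated consumer makes no progress at all, matching the blocking loop described in the root cause; with back-requeue, user2's messages are processed while user1's message cycles harmlessly at the tail.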
## Summary
Fix the remaining issues with the rate-limited message requeue implementation:

### Type Error Fix
- Change `self.config.requeue_by_republishing` to `settings.config.requeue_by_republishing` in `manager.py:1493`
- `ExecutionManager` accesses settings through the global `settings` object, not `self.config`

### Integration Test Improvements
- Use a dedicated test queue name (`test_requeue_ordering`) instead of the production queue to avoid conflicts
- Create a separate test exchange and routing key for isolation
- Add proper cleanup with queue/exchange deletion in a `finally` block
- Remove the unused `pytest` import and unnecessary marker configuration

## Testing Results
- ✅ `poetry run pyright`: 0 errors, 0 warnings
- ✅ `poetry run pytest test_requeue_integration.py`: All tests pass
- ✅ `poetry run format`: Clean formatting

## Validation
Integration test confirms:
- FIFO queue ordering works correctly
- Rate-limited messages go to the back of the queue (not the front)
- Other users' executions are NOT blocked by rate-limited users
- The republishing method behaves exactly as expected

Fixes the 135 late executions issue in production.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
…handling

## Summary
Simplify the requeue implementation by inlining the `_requeue_message_to_back` method directly into the `_ack_message` callback.

## Changes Made
- Remove the separate `_requeue_message_to_back` method
- Inline the publish logic directly in the `_republish_to_back` callback
- Include the nack operation inside the try/except block for better error handling
- Add a fallback to traditional requeue if republishing fails

## Benefits
- Cleaner code with fewer method calls
- Better error handling: the nack is now protected by the try/except
- Fallback mechanism: if republishing fails, falls back to traditional requeue
- Same functionality with a simpler implementation

## Testing
- ✅ `poetry run pyright`: 0 errors, 0 warnings
- ✅ `poetry run pytest test_requeue_integration.py`: All tests pass
- ✅ Integration test confirms the rate limiting fix still works correctly

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
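The inlined callback with fallback can be sketched as follows. This is a hypothetical illustration of the pattern this commit describes, not the actual `manager.py` code: a fake channel stands in for a pika channel so the fallback path can be exercised without a broker.

```python
class FakeChannel:
    """Minimal stand-in for a pika channel, recording calls so the
    fallback path can be exercised without a broker."""
    def __init__(self, publish_fails=False):
        self.publish_fails = publish_fails
        self.calls = []

    def basic_publish(self, exchange, routing_key, body):
        if self.publish_fails:
            raise ConnectionError("broker unavailable")
        self.calls.append(("publish", body))

    def basic_nack(self, delivery_tag, requeue):
        self.calls.append(("nack", requeue))

def republish_to_back(channel, exchange, routing_key, body, delivery_tag):
    """Requeue by republishing to the back; both the publish and the nack
    sit inside the try block, with traditional requeue as the fallback."""
    try:
        channel.basic_publish(exchange, routing_key, body)
        channel.basic_nack(delivery_tag, requeue=False)  # drop the old copy
    except Exception:
        # Republishing failed: fall back to the traditional front-requeue
        channel.basic_nack(delivery_tag, requeue=True)

ok = FakeChannel()
republish_to_back(ok, "exec", "run", b"msg", 1)
print(ok.calls)   # [('publish', b'msg'), ('nack', False)]

bad = FakeChannel(publish_fails=True)
republish_to_back(bad, "exec", "run", b"msg", 1)
print(bad.calls)  # [('nack', True)]
```

Note the sketch ignores one edge case the real implementation would have to consider: if the publish succeeds but the subsequent nack fails, the fallback requeue would leave two copies of the message in the queue.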
## Summary
Fix the message consumption logic in the integration test to reliably receive and process messages from RabbitMQ.

## Changes Made
- Improved the `consume_messages` method with better queue checking and synchronous consumption
- Added proper error handling and consumer cancellation
- Removed debug prints for cleaner test output
- Use `basic_get` to check for messages before setting up a consumer
- Fixed a bare `except` clause for proper exception handling

## Testing Results
- ✅ All 3 test scenarios pass consistently:
  1. Normal FIFO queue behavior: A → B → C
  2. Rate limiting fix: user2 executions NOT blocked by user1
  3. Republishing sends messages to the back of the queue

## Performance
- Test completes in ~2.3 seconds consistently
- No more timeout or message consumption failures
- Proper cleanup of test resources

The integration test now reliably validates that the rate limiting queue blocking fix works correctly with real RabbitMQ.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
…e test file

## Summary
Consolidate all requeue behavior validation into a single test file with two test functions, eliminating the need for separate test files.

## Test Coverage

### test_queue_ordering_behavior()
- ✅ Normal FIFO queue behavior validation
- ✅ Rate limiting fix: user2 executions NOT blocked by user1
- ✅ Republishing sends messages to the BACK of the queue (our fix)

### test_traditional_requeue_behavior()
- ✅ **HYPOTHESIS CONFIRMED**: Traditional requeue sends messages to the FRONT
- ✅ Validates the root cause of the queue blocking issue
- ✅ Proves why rate-limited messages block other users

## Key Validation Results
**Our Fix (Republishing):**
- Messages go to the BACK of the queue → no blocking ✅

**Original Problem (Traditional Requeue):**
- Messages go to the FRONT of the queue → causes blocking ✅
- Order: A (requeued to front) → B
- Explains the 135 late executions issue

## Commands
- `poetry run pytest test_requeue_integration.py -s` runs both tests
- `test_queue_ordering_behavior`: tests our fix
- `test_traditional_requeue_behavior`: validates the hypothesis

Both tests use real RabbitMQ (no mocking) for authentic behavior validation.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Benchmark PR Significant-Gravitas#11326
Type: Corrupted (contains bugs)
Original PR Title: hotfix(backend): fix rate-limited messages blocking queue by republishing to back
Original PR Description:

## Summary
Fix critical queue blocking issue where rate-limited user messages prevent other users' executions from being processed, causing the 135 late executions reported in production.
## Root Cause Analysis
When a user exceeds `max_concurrent_graph_executions_per_user` (25), the executor uses `basic_nack(requeue=True)`, which sends the message to the FRONT of the RabbitMQ queue. This creates an infinite blocking loop in which the same rate-limited message is redelivered and re-nacked ahead of everyone else's messages.

## Solution Implementation

### 🔧 Core Changes
- **New setting**: `requeue_by_republishing` (default: `True`) in `backend/util/settings.py`
- **Smart `_ack_message`**: Automatically uses republishing when `requeue=True` and the setting is enabled
- **Efficient implementation**: Uses the existing `self.run_client` connection instead of creating new ones

### 🔄 Technical Implementation
Before (blocking):
After (non-blocking):
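The Before/After code blocks above were lost in extraction. Based on the implementation details stated in the commit messages on this PR, they plausibly contrasted something like the following (pseudocode; identifiers approximate):

```
# Before (blocking): rate-limited message returns to the FRONT of the queue
basic_nack(delivery_tag, requeue=True)

# After (non-blocking): republish a copy to the BACK, then drop the original
publish_message(message)
basic_nack(delivery_tag, requeue=False)
```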
### 📊 Impact
- Other users' executions are no longer blocked by rate-limited users
- Configurable: can revert to the old behavior with `requeue_by_republishing=False`

## Test Plan
- `test_requeue_integration.py` validates real RabbitMQ queue ordering

## Deployment Strategy
This is a hotfix that can be deployed immediately.

## Files Modified
- `backend/executor/manager.py`: Enhanced `_ack_message` logic and `_requeue_message_to_back` method
- `backend/util/settings.py`: Added `requeue_by_republishing` configuration field
- `test_requeue_integration.py`: Integration test for queue ordering validation

## Related Issues
Fixes the 135 late executions issue where messages were stuck in QUEUED state despite available executor capacity (583m/600m utilization).
🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Original PR URL: Significant-Gravitas#11326