[CLEAN] Synthetic Benchmark PR #11326 - hotfix(backend): fix rate-limited messages blocking queue by republishing to back #1

ofir-frd · 2025-12-15T09:47:33Z

Benchmark PR Significant-Gravitas#11326

Type: Clean (correct implementation)

Original PR Title: hotfix(backend): fix rate-limited messages blocking queue by republishing to back
Original PR Description: ## Summary
Fix critical queue blocking issue where rate-limited user messages prevent other users' executions from being processed, causing the 135 late executions reported in production.

Root Cause Analysis

When a user exceeds max_concurrent_graph_executions_per_user (25), the executor uses basic_nack(requeue=True) which sends the message to the FRONT of the RabbitMQ queue. This creates an infinite blocking loop where:

Rate-limited message goes to front of queue
Gets processed, hits rate limit again
Goes back to front of queue
Blocks all other users' messages indefinitely

Solution Implementation

🔧 Core Changes

New setting: requeue_by_republishing (default: True) in backend/util/settings.py
Smart _ack_message: Automatically uses republishing when requeue=True and setting enabled
Efficient implementation: Uses existing self.run_client connection instead of creating new ones
Integration test: Real RabbitMQ test validates queue ordering behavior

🔄 Technical Implementation

Before (blocking):

basic_nack(delivery_tag, requeue=True)  # Goes to FRONT of queue ❌

After (non-blocking):

if requeue and self.config.requeue_by_republishing:
    # First: Republish to BACK of queue
    self.run_client.publish_message(...)
    # Then: Reject without requeue
    basic_nack(delivery_tag, requeue=False)

📊 Impact

✅ Other users' executions no longer blocked by rate-limited users
✅ Fair queue processing - FIFO behavior maintained for all users
✅ Rate limiting still works - just doesn't block others
✅ Configurable - can revert to old behavior with requeue_by_republishing=False
✅ Zero performance impact - uses existing connections

Test Plan

Integration test: test_requeue_integration.py validates real RabbitMQ queue ordering
Scenario testing: Confirms rate-limited messages go to back of queue
Cross-user validation: Verifies other users' messages process correctly
Setting test: Confirms configuration loads with correct defaults

Deployment Strategy

This is a hotfix that can be deployed immediately:

Backward compatible: Old behavior available via config
Safe default: New behavior is safer than current state
No breaking changes: All existing functionality preserved
Immediate relief: Resolves production queue blocking

Files Modified

backend/executor/manager.py: Enhanced _ack_message logic and _requeue_message_to_back method
backend/util/settings.py: Added requeue_by_republishing configuration field
test_requeue_integration.py: Integration test for queue ordering validation

Related Issues

Fixes the 135 late executions issue where messages were stuck in QUEUED state despite available executor capacity (583m/600m utilization).

🤖 Generated with Claude Code

Co-Authored-By: Claude noreply@anthropic.com
Original PR URL: Significant-Gravitas#11326

Apply changes for benchmark PR

7c32aaf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[CLEAN] Synthetic Benchmark PR #11326 - hotfix(backend): fix rate-limited messages blocking queue by republishing to back #1

[CLEAN] Synthetic Benchmark PR #11326 - hotfix(backend): fix rate-limited messages blocking queue by republishing to back #1

Uh oh!

ofir-frd commented Dec 15, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

[CLEAN] Synthetic Benchmark PR #11326 - hotfix(backend): fix rate-limited messages blocking queue by republishing to back #1

Are you sure you want to change the base?

[CLEAN] Synthetic Benchmark PR #11326 - hotfix(backend): fix rate-limited messages blocking queue by republishing to back #1

Uh oh!

Conversation

ofir-frd commented Dec 15, 2025

Benchmark PR Significant-Gravitas#11326

Root Cause Analysis

Solution Implementation

🔧 Core Changes

🔄 Technical Implementation

📊 Impact

Test Plan

Deployment Strategy

Files Modified

Related Issues

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants