feat(conductor): queue messages when conductor is busy, deliver with reply callback#452
Open
nlenepveu wants to merge 8 commits intoasheshgoplani:mainfrom
Open
feat(conductor): queue messages when conductor is busy, deliver with reply callback#452nlenepveu wants to merge 8 commits intoasheshgoplani:mainfrom
nlenepveu wants to merge 8 commits intoasheshgoplani:mainfrom
Conversation
When the conductor Claude session is busy (running/active), the bridge was blocking on 'agent-deck session send' for 80s then dropping the message with a failure notification. Changes: - send_to_conductor() now checks session status before sending. If busy, messages are queued in-memory and delivered by a background drain task that polls every 5s until the conductor is idle/waiting again. - Race condition handled: if the conductor becomes busy between the status check and the actual send, the timeout/not-ready error triggers queueing instead of dropping. - Telegram/Slack handlers show '⏳ Conductor busy — message queued' to the user instead of a failure message. - Heartbeats skip the cycle when conductor is busy (they're periodic, no point queueing them). Also syncs bridge.py with the installed version (multi-conductor discovery, Slack support, heartbeat rules injection).
- Fix crash: config["user_id"] → config["telegram"]["user_id"] - Fix Slack: unpack tuple from send_to_conductor (was always truthy bool check) - Fix Slack: remove wait_for_response() call (function no longer exists) - Add reply_callback to queue infrastructure so users get deferred responses - Thread _tg_reply / _slack_reply closures through was_busy paths - Fix _drain_queue: run_in_executor for blocking subprocess calls (unfreeze loop) - Fix _drain_queue: notify all queued callbacks on error drop, not just first - Fix _drain_queue: guard against empty stdout to avoid aiogram crash - Fix _drain_queue: use --wait to capture conductor response for callback - Fix ensure_future → create_task (deprecated API) - Remove dead _queue_lock (never acquired) - Promote get_default_conductor to module level (was inaccessible across scopes) - Move typing indicators before blocking send_to_conductor calls (Telegram + Slack) - Fix _tg_reply: always include "Queued response:" label regardless of profile_tag - Remove unnecessary say_ref/thread_ts_ref aliases in _slack_reply closure - Add functools import for run_in_executor partial calls
Event loop / blocking (M1, M2, Q9): - make ensure_conductor_running async; time.sleep(5) → await asyncio.sleep(5) - wrap get_session_status + send_to_conductor(wait_for_reply=True) in run_in_executor in Telegram and Slack handlers (up to 330s block eliminated) - get_event_loop() → get_running_loop() throughout drain queue Subprocess (M8, Q6): - run_cli: kill + communicate orphaned subprocess on TimeoutExpired - get_session_status: return "unknown" on CLI failure (not "error") to prevent premature queue drop on transient subprocess issues; drain skips "unknown" Queue type + bounds (M3, Q1, Q4): - add ReplyCallback = Callable[[str], Coroutine[Any, Any, None]] type alias - switch _message_queue values from list to collections.deque (O(1) popleft) - enforce MAX_QUEUE_DEPTH=20 per session; drop oldest on overflow Queue infrastructure (M4, M5, M6, M7, M10, M11, Q2, Q5): - add force_queue param to send_to_conductor: skip redundant inner status check and enqueue immediately (eliminates was_busy TOCTOU + redundant subprocess call) - add _drain_queue_supervised wrapper: restarts drain on unexpected crash - _ensure_drain_task: retrieve + log exception before restarting; use get_running_loop().create_task instead of bare create_task - add _fire_callback helper: invoke callbacks via create_task (fire-and-forget) so drain loop is not blocked by slow Telegram/Slack API calls - move drain exit check to after the session loop (Q2: avoids missing items) - log delivered count after pop with actual remaining (M10: was pre-pop estimate) - differentiate error messages: error-state drop vs send-failure drop (M11) UX (M9, Q7): - Slack _slack_reply: add "Queued response (waited Xs):" prefix matching Telegram - both _tg_reply and _slack_reply: show elapsed wait time via time.monotonic() closure
ReplyCallback was defined after send_to_conductor used it in a function signature, causing NameError at import time. Move it to just before the first use.
…files() The load_config() refactor removed 'profiles' from the config dict in favour of discover_conductors()/get_unique_profiles(), but create_telegram_bot still referenced the old key, causing KeyError on startup.
Two fixes: 1. run_cli: replace subprocess.run() with Popen + communicate(timeout=) so the proc object is available on TimeoutExpired — subprocess.run() does not set exc.proc. Also adds start_new_session=True + os.killpg() to kill the entire process group (including tmux send-keys grandchildren) on timeout. 2. heartbeat_loop: wrap get_session_status and send_to_conductor(wait_for_reply=True) in run_in_executor — both are blocking subprocess calls that were freezing the event loop for up to RESPONSE_TIMEOUT (300s), preventing Telegram/Slack message processing during heartbeat sends.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
running/active/starting, messages are held in an in-memory queue (_message_queue) and delivered once the conductor becomes freeReplyCallback) is stored with each queued item and invoked via fire-and-forget task after drain delivery — the deferred response is pushed back to the user in the same Telegram chat / Slack threadsubprocess.runcalls in async handlers are wrapped inrun_in_executor;ensure_conductor_runningis nowasyncwithawait asyncio.sleepinstead oftime.sleepMAX_QUEUE_DEPTH=20per session,deque/popleft()(O(1)), drain task supervisor with crash recovery,get_running_loop()throughout,force_queueparam to skip redundant status checks,get_session_statusreturns"unknown"on CLI failure (not"error") to prevent premature queue drops"Queued response (waited Xs):"prefix with elapsed wait time; orphaned subprocesses killed onTimeoutExpired; differentiated error messages for error-state drop vs send failureCommits
09217e6— core queue infrastructure:_message_queue,_enqueue_message,_drain_queue,_ensure_drain_task;was_busydetection in Telegram and Slack handlers8a0172b— post-rebase bug fixes: Slack tuple check,wait_for_responseNameError,reply_callbackthreading,run_in_executorfor drain,create_taskover deprecatedensure_future,get_default_conductorpromotion, typing indicators before blocking sends69d3ae1— robustness pass: all MUST FIX + QUICK WIN items from 4-reviewer audit (async handlers, drain supervisor,deque,MAX_QUEUE_DEPTH,force_queue,get_running_loop, orphan subprocess kill, "Queued response" prefix on Slack, elapsed time in deferred replies)Test plan
"⏳ Conductor busy — message queued, will reply here when done."; when conductor finishes, expect follow-up"Queued response (waited Xs):\n<response>""[Queued message could not be delivered — conductor is in error state.]"per queued itempython3 -c "import ast; ast.parse(open('conductor/bridge.py').read()); print('OK')"