[Epic] Inter-Agent Task Handoff #164

@raykao

Overview

Inter-Agent Task Handoff adds an async task delegation system to copilot-bridge. It lets Agent A hand a long-running task to Agent B in fire-and-forget fashion and receive the result via callback, without blocking Agent A's session or requiring a human intermediary. The current ask_agent tool is synchronous and blocks for up to 300 seconds, which makes it unsuitable for long-running tasks, parallel multi-agent workflows, or autonomous pipeline chains. This feature adds two new tools -- delegate_task and check_task -- backed by a new delegated_tasks SQLite table, an asynchronous execution loop, and a dual-path result delivery system (channel callback + task store polling).
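
To make the fire-and-forget contract concrete, here is a minimal in-memory sketch of the intended call pattern. It is not the copilot-bridge implementation: InMemoryTaskStore, runNext, and the record fields are hypothetical names chosen for illustration, and the background executor is replaced by an explicit runNext() call.

```typescript
// Hypothetical in-memory model of the delegate_task / check_task flow.
// None of these names are taken from copilot-bridge; they only show the semantics.
type Status = "queued" | "running" | "completed" | "failed";

interface TaskRecord {
  taskId: string;
  targetBot: string;
  prompt: string;
  status: Status;
  result?: string;
}

class InMemoryTaskStore {
  private tasks = new Map<string, TaskRecord>();
  private nextId = 1;

  // Returns immediately with a handle; no work is performed here.
  delegateTask(targetBot: string, prompt: string): { taskId: string; status: "queued" } {
    const taskId = `task-${this.nextId++}`;
    this.tasks.set(taskId, { taskId, targetBot, prompt, status: "queued" });
    return { taskId, status: "queued" };
  }

  // Stand-in for the background executor: runs the oldest queued task.
  runNext(worker: (prompt: string) => string): void {
    for (const task of Array.from(this.tasks.values())) {
      if (task.status !== "queued") continue;
      task.status = "running";
      try {
        task.result = worker(task.prompt);
        task.status = "completed";
      } catch {
        task.status = "failed";
      }
      return;
    }
  }

  // Polling path: the caller can ask for the current state at any time.
  checkTask(taskId: string): TaskRecord | undefined {
    return this.tasks.get(taskId);
  }
}

const store = new InMemoryTaskStore();
const handle = store.delegateTask("agent-b", "summarize the build logs");
const before = store.checkTask(handle.taskId)?.status; // still "queued"
store.runNext((prompt) => `done: ${prompt}`);
const after = store.checkTask(handle.taskId); // now "completed" with a result
```

The point of the sketch is that delegateTask never runs the worker itself: the caller gets a handle back right away and learns the outcome later, either by polling or (in the real design) through the channel callback.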

Spec Reference

https://github.com/raykao/dark-factory/tree/speckit/inter-agent-task-handoff/specs/inter-agent-task-handoff


Phases and Tasks

Phase 0 - Pre-Phase Spike

  • [T000] Spike: trace withWorkspaceEnv() mutex scope to determine effective concurrency ceiling and set maxConcurrentDelegations default accordingly

Phase 1 - Foundation + delegate_task Skeleton

  • [T001] Define all TypeScript interfaces and union types in src/types/task-delegator.ts (TaskStatus, DelegatedTask, NewDelegatedTask, DelegateTaskInput/Output/Error, CheckTaskInput/Output, DelegationAllowEntry)
  • [T002] Add delegated_tasks DDL migration with CREATE TABLE IF NOT EXISTS guard and 5 named indexes in src/state/store.ts
  • [T003] Add 7 new interAgent delegation config fields with startup-time validation in src/config/config.ts (maxConcurrentDelegations, maxDelegationTimeout, maxDelegationTimeoutCeiling, maxTasksPerChain, deduplicationWindowSecs, requeueOnRestart, delegationAllow)
  • [T004] Implement 8 store functions in src/state/store.ts: insertDelegatedTask, updateTaskStatus, getTask, listTasks, countRunningTasks, countChainTasks (queries both delegated_tasks AND agent_calls), findDuplicateTask, getTasksByStatus
  • [T005] Extend src/core/inter-agent.ts with delegationDepth on InterAgentContext and add canDelegateTo / canReceiveDelegationFrom exported functions
  • [T006] Implement TaskDelegator class in src/core/task-delegator.ts with 7-step synchronous validation and handleDelegateTask() returning { taskId, status: "queued" } within 2s (no async execution yet)
  • [T007] Register delegate_task tool in src/core/session-manager.ts (add to BRIDGE_CUSTOM_TOOLS, wire handler to TaskDelegator)
  • [T008] Write Phase 1 unit tests in test/unit/task-delegator.test.ts covering all 7 rejection error codes, deduplication, countChainTasks dual-table query, and zero regressions in existing ask_agent tests
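
As a rough illustration of the T001 types, the sketch below shows one possible shape. The type names come from the task list, but every field and the "failed" status are assumptions; the field list follows AC-003, and only the three error codes named in this issue are included (the remaining four are not listed here).

```typescript
// Sketch only: "failed" is an assumed status name; "queued", "running",
// "completed", and "timed_out" appear in the issue text.
type TaskStatus = "queued" | "running" | "completed" | "failed" | "timed_out";

// Field names follow AC-003; the real interface may carry more columns.
interface DelegatedTask {
  id: string;
  status: TaskStatus;
  callerBot: string;
  targetBot: string;
  createdAt: string; // ISO-8601 string (assumed representation)
  result?: string;
}

// Only the error codes named in this issue; the full set has seven.
type ErrorCode =
  | "CONCURRENCY_LIMIT_EXCEEDED"
  | "CHAIN_BUDGET_EXCEEDED"
  | "DELEGATION_NOT_PERMITTED";

type DelegateTaskOutput = { taskId: string; status: "queued" };
type DelegateTaskError = { error: ErrorCode; message: string };

// Type guard so callers can branch on success vs. rejection.
function isDelegateError(
  r: DelegateTaskOutput | DelegateTaskError
): r is DelegateTaskError {
  return "error" in r;
}

// check_task view: expose the result only once the task has completed.
function toCheckTaskOutput(task: DelegatedTask): DelegatedTask {
  const { result, ...rest } = task;
  return task.status === "completed" ? { ...rest, result } : rest;
}
```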

Phase 2 - Async Executor + check_task

  • [T009] Implement start(), tick(), and executeTask() in src/core/task-delegator.ts -- 2s poll loop, priority-ordered dispatch, detached async execution via executeEphemeralCall(), exponential backoff retry
  • [T010] Implement recoverStaleTasks() in src/core/task-delegator.ts -- on startup, transitions running tasks to timed_out or re-queues per requeueOnRestart config, completes within 5s
  • [T011] Wire TaskDelegator in bridge startup: call recoverStaleTasks() before accepting connections, call start() after full initialization
  • [T012] Implement handleCheckTask() in src/core/task-delegator.ts -- single-task mode (full untruncated result) and list mode (up to 10, result excluded), access-scoped to calling bot (admin sees all)
  • [T013] Register check_task tool in src/core/session-manager.ts (add to BRIDGE_CUSTOM_TOOLS, wire handler)
  • [T014] Write Phase 2 unit + integration tests: tick() lifecycle, full/failure result paths, retry + exhaustion, recoverStaleTasks() timing, check_task all lifecycle states, admin visibility, end-to-end poll-to-completion, no regression on ask_agent/schedule
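
The dispatch and retry rules in T009 can be sketched as two pure functions. This is a sketch under stated assumptions, not the planned tick() implementation: the base delay, the cap, the notBefore field, and the "higher number means higher priority" convention are all hypothetical.

```typescript
// Hypothetical queue row; field names are illustrative only.
interface QueuedTask {
  id: string;
  priority: number;  // assumed: larger value dispatches first
  createdAt: number; // epoch ms, used as a FIFO tie-breaker
  notBefore: number; // epoch ms; a retry is not eligible before this time
}

// Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... capped at capMs.
// attempt is 1-based (the delay before the first retry uses attempt = 1).
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 60000): number {
  return Math.min(capMs, baseMs * 2 ** (attempt - 1));
}

// One tick: pick as many eligible tasks as there are free slots,
// highest priority first, oldest first within a priority.
function selectForDispatch(
  queued: QueuedTask[],
  runningCount: number,
  maxConcurrent: number,
  now: number
): QueuedTask[] {
  const freeSlots = Math.max(0, maxConcurrent - runningCount);
  return queued
    .filter((t) => t.notBefore <= now) // filter() copies, so sort() is safe
    .sort((a, b) => b.priority - a.priority || a.createdAt - b.createdAt)
    .slice(0, freeSlots);
}
```

Keeping selection separate from execution makes the tick logic testable without timers; the real loop would call something like selectForDispatch every 2s and start each selected task without awaiting it.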

Phase 3 - Callback Delivery + Safety Rails

  • [T015] Implement formatCallbackMessage() in src/core/task-delegator.ts -- completion template (taskId[0:8], targetBot, duration, 500-char result summary, check_task hint) and failure template (status, attempt/max, retry countdown)
  • [T016] Implement deliverCallback() in src/core/task-delegator.ts and wire into executeTask() success/error paths -- resolves callbackChannel ?? callerChannel, posts via existing channel adapter, silent failure with WARN log
  • [T017] Integrate delegate_task calls into the existing loop detector in src/core/loop-detector.ts -- reuse existing tool-name + args-hash pattern; reject at CRITICAL threshold with CONCURRENCY_LIMIT_EXCEEDED
  • [T018] Add INFO-level lifecycle transition logs and agent_calls audit rows on task completion in src/core/task-delegator.ts
  • [T019] Write Phase 3 integration tests covering all 12 named acceptance criteria (AC-001 through AC-012) plus manual smoke test (3 parallel delegations, callbacks, check_task full result)
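
For the completion template in T015, a minimal sketch might look like the following. The exact wording and layout of the message are assumptions; only the listed ingredients (first 8 characters of the task ID, target bot, duration, a summary of at most 500 characters, and a check_task hint) come from the task description.

```typescript
// Hypothetical input shape for the formatter.
interface CompletedTaskView {
  taskId: string;
  targetBot: string;
  durationMs: number;
  result: string;
}

const SUMMARY_LIMIT = 500;

function formatCompletionMessage(t: CompletedTaskView): string {
  const shortId = t.taskId.slice(0, 8);             // taskId[0:8]
  const summary = t.result.slice(0, SUMMARY_LIMIT); // never more than 500 chars
  const seconds = (t.durationMs / 1000).toFixed(1);
  return [
    `Task ${shortId} completed by ${t.targetBot} in ${seconds}s`,
    summary,
    `Full result: check_task({ taskId: "${t.taskId}" })`,
  ].join("\n");
}

const message = formatCompletionMessage({
  taskId: "3f2b8c1a-0000-4000-8000-000000000000",
  targetBot: "agent-b",
  durationMs: 12345,
  result: "x".repeat(800),
});
```

The full, untruncated result is deliberately left out of the message; the hint on the last line tells the reader how to fetch it through check_task.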

Polish

  • [T020] Add README / docs/ documentation for delegate_task and check_task: parameter tables, output schema, all 7 error codes, fire-and-forget + poll-loop code examples, safety limits table, T000 mutex finding note
  • [T021] Verify TypeScript strict-mode compilation across all new and modified files; fix implicit any, missing return types, unchecked null dereferences
  • [T022] Run final backward-compatibility regression: full ask_agent and schedule integration test suites against the Phase 1-3 build; confirm no table schema or tool resolution regressions

Acceptance Criteria

  • AC-001 - Fire-and-Forget Semantics: delegate_task returns within 2s with a valid UUID taskId and status: "queued"; Agent A's session remains interactive during the full execution window.
  • AC-002 - Task Execution and Completion: on executeEphemeralCall() success, status = "completed", result is full untruncated text, completed_at is set, and a completion notification is posted to the callback channel.
  • AC-003 - Status Polling: check_task({ taskId }) returns id, status, callerBot, targetBot, createdAt, and (if completed) the full result.
  • AC-004 - Concurrency Limit: when maxConcurrentDelegations = 2 and 2 tasks are running, a third call returns CONCURRENCY_LIMIT_EXCEEDED and creates no DB row.
  • AC-005 - Chain Budget: when maxTasksPerChain = 5 and a chain_id has 5 tasks (across ask_agent + delegate_task), a 6th call returns CHAIN_BUDGET_EXCEEDED.
  • AC-006 - Timeout Capping: timeout: 7200 with maxDelegationTimeoutCeiling: 3600 results in timeout_ms = 3600000.
  • AC-007 - Allowlist Enforcement: unauthorized caller returns DELEGATION_NOT_PERMITTED without creating a DB row.
  • AC-008 - Deduplication: identical task resubmitted within the deduplication window returns the existing taskId and current status; no new row created.
  • AC-009 - Retry on Failure: failed task with maxAttempts: 3 re-queues with exponential backoff and posts a retry-scheduled failure notification.
  • AC-010 - Startup Recovery: status = "running" tasks at bridge startup are transitioned to "timed_out" within 5s when requeueOnRestart = false.
  • AC-011 - Backward Compatibility: ask_agent, schedule, and all existing bridge tools are unaffected after deployment.
  • AC-012 - Callback Delivery: completed task with callbackChannel set results in a message posted to that channel containing task ID, target bot, duration, result summary (<= 500 chars), and a check_task usage hint.
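
Several of these criteria are simple arithmetic checks, so they can be sketched as pure functions. The function names and row shape below are hypothetical; the numbers in the usage lines mirror AC-004, AC-005, and AC-006.

```typescript
// AC-006: clamp the requested timeout (seconds) to the ceiling, return ms.
function capTimeoutMs(requestedSecs: number, ceilingSecs: number): number {
  return Math.min(requestedSecs, ceilingSecs) * 1000;
}

// AC-004: reject when every concurrency slot is already taken.
function checkConcurrency(
  runningCount: number,
  maxConcurrentDelegations: number
): "ok" | "CONCURRENCY_LIMIT_EXCEEDED" {
  return runningCount >= maxConcurrentDelegations
    ? "CONCURRENCY_LIMIT_EXCEEDED"
    : "ok";
}

// Hypothetical row shape: only the chain id matters for the budget.
interface ChainRow {
  chainId: string;
}

// AC-005: the budget counts rows from BOTH sources for the same chain.
function countChainTasks(
  chainId: string,
  delegatedTasks: ChainRow[],
  agentCalls: ChainRow[]
): number {
  const inChain = (row: ChainRow) => row.chainId === chainId;
  return delegatedTasks.filter(inChain).length + agentCalls.filter(inChain).length;
}

function checkChainBudget(
  chainCount: number,
  maxTasksPerChain: number
): "ok" | "CHAIN_BUDGET_EXCEEDED" {
  return chainCount >= maxTasksPerChain ? "CHAIN_BUDGET_EXCEEDED" : "ok";
}

const timeoutMs = capTimeoutMs(7200, 3600);       // 3600000
const thirdCall = checkConcurrency(2, 2);         // rejected
const used = countChainTasks(
  "chain-1",
  [{ chainId: "chain-1" }, { chainId: "chain-1" }, { chainId: "chain-2" }],
  [{ chainId: "chain-1" }, { chainId: "chain-1" }, { chainId: "chain-1" }]
);                                                // 2 delegated + 3 ask_agent = 5
const sixthCall = checkChainBudget(used, 5);      // rejected
```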

Notes

Key Design Decisions

  • SQLite-only persistence: No external message broker (Redis, NATS, etc.). delegated_tasks table in the existing SQLite state DB. Transactions used for all state transitions to prevent partial writes.
  • Reuse executeEphemeralCall(): Task execution reuses the existing ephemeral session engine unchanged. Permission inheritance via buildEphemeralPermissionHandler() bounds grantTools to the caller's own permissions -- no new privilege escalation vector.
  • delegationAllow overlays canCall: Admins can permit sync Q&A (ask_agent) while restricting async delegation (delegate_task) separately. If delegationAllow is not configured for a bot pair, the check falls back to the existing canCall/canBeCalledBy rules.
  • countChainTasks spans both tables: Chain budget enforcement queries both delegated_tasks AND agent_calls by chain_id to prevent budget bypass when mixing ask_agent and delegate_task in the same chain.
  • Dual-path result access: Channel callback (pushed on completion) and check_task polling (structured query) are both always active. A callback delivery failure is silent; the result remains available via polling.
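
The delegationAllow overlay described above could be sketched as follows. The entry shape ({ from, to, allow }) and the policy record are assumptions made for illustration; the issue only states that a configured pair takes precedence and that an unconfigured pair falls back to canCall/canBeCalledBy.

```typescript
// Hypothetical config shapes; the real DelegationAllowEntry may differ.
interface BotPolicy {
  canCall: string[];
  canBeCalledBy: string[];
}

interface DelegationAllowEntry {
  from: string;
  to: string;
  allow: boolean;
}

// Existing sync rule: both sides must agree.
function canCallSync(
  caller: string,
  target: string,
  policies: Record<string, BotPolicy>
): boolean {
  const c: BotPolicy | undefined = policies[caller];
  const t: BotPolicy | undefined = policies[target];
  if (c === undefined || t === undefined) return false;
  return c.canCall.indexOf(target) !== -1 && t.canBeCalledBy.indexOf(caller) !== -1;
}

// Async rule: an explicit delegationAllow entry wins; otherwise fall back.
function canDelegate(
  caller: string,
  target: string,
  policies: Record<string, BotPolicy>,
  delegationAllow: DelegationAllowEntry[]
): boolean {
  const entry = delegationAllow.find((e) => e.from === caller && e.to === target);
  if (entry !== undefined) return entry.allow;
  return canCallSync(caller, target, policies);
}

const policies: Record<string, BotPolicy> = {
  a: { canCall: ["b", "c"], canBeCalledBy: [] },
  b: { canCall: [], canBeCalledBy: ["a"] },
  c: { canCall: [], canBeCalledBy: ["a"] },
};
// a may ask b questions synchronously, but may not delegate async work to b.
const overlay: DelegationAllowEntry[] = [{ from: "a", to: "b", allow: false }];
```

With this configuration, a can still use ask_agent against b while delegate_task to b is rejected, and a-to-c delegation is allowed purely through the fallback.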

Risks

  • withWorkspaceEnv() mutex scope (Risk 1 / T000): If the mutex in executeEphemeralCall() wraps the entire call duration (not just env setup), effective concurrency per bot is 1 regardless of maxConcurrentDelegations. The spike (T000) must measure this before Phase 2 ships. Default is tentatively 3 (narrow mutex assumed).
  • SQLite write contention (Risk 3): At maxConcurrentDelegations = 10, concurrent BEGIN IMMEDIATE transactions on task state transitions could cause brief lock contention. Mitigated by the bounded concurrency ceiling and by SQLite WAL mode, if it is enabled on the state DB.
  • Callback channel availability (Risk 4): If callbackChannel does not exist or the bot has no access to it, delivery fails silently. Result remains accessible via check_task. A WARN is logged.
  • Nested async chain depth semantics (Open Question 4): The recommendation is that an async delegation consumes a depth slot exactly as ask_agent does, but how this interacts with non-linear async chains needs review during T005/T006 implementation.

Out of Scope (v1)

External message brokers, task cancellation (cancel_task), task DAGs, streaming results, cross-bridge delegation, session checkpointing, human-in-the-loop approval, admin UI, and delegated_tasks encryption at rest.
