feat: cross-instance peer delegation (hierarchical swarm) #409

Open
furukama wants to merge 3 commits into main from feat/peer-delegation

@furukama
Contributor

Summary

  • Adds a P2P delegation mechanism so an HQ HybridClaw instance can dispatch tasks to per-client instances over HTTP — bearer auth, agent allowlists, off by default.
  • Three new gateway endpoints: GET /.well-known/hybridclaw-peer.json (public agent card), POST /api/peer/delegate (inbound, bearer-auth from peer config), POST /api/peer/proxy (outbound, container → gateway → peer).
  • New container tool delegate_to_peer that returns the peer's final answer synchronously. Intentionally absent from the sub-agent allowlist to prevent unbounded fan-out.
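
For orientation, the inbound call in the second bullet could be assembled roughly as follows. This is a minimal sketch, not code from this PR: the endpoint path and bearer auth come from the summary, and the `taskId`, `agentId`, `parentRunId`, and `parentSessionId` names appear elsewhere in this PR, while `PeerTarget`, `DelegateBody`, `prompt`, and `buildDelegateRequest` are hypothetical names chosen for illustration.

```typescript
// Hypothetical shapes: only the endpoint path, bearer auth, and the field
// names taskId/agentId/parentRunId/parentSessionId come from the PR text.
interface PeerTarget {
  baseUrl: string; // assumed, e.g. "https://client-a.example.com"
  token: string; // assumed name for the bearer token configured for this peer
}

interface DelegateBody {
  taskId: string;
  prompt: string; // assumed name for the task content
  agentId?: string;
  parentRunId?: string;
  parentSessionId?: string;
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildDelegateRequest(peer: PeerTarget, body: DelegateBody): PreparedRequest {
  // Strip trailing slashes so the joined URL has exactly one separator.
  const base = peer.baseUrl.replace(/\/+$/, '');
  return {
    url: `${base}/api/peer/delegate`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${peer.token}`,
      },
      body: JSON.stringify(body),
    },
  };
}
```

The prepared request can be handed to `fetch(req.url, req.init)`; keeping construction separate from sending makes the auth header easy to check in a unit test.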

Why

Roadmap item #9 (hierarchical swarm). Network architecture nobody else attempts; unlocks agency / multi-tenant deals where HQ holds orchestration and per-client instances hold the client's own credentials, files, and audit log.

Reviewed three prior-art designs (openfang A2A, hiclaw team-leader, deer-flow subagents) before settling on P2P with config-based peer lists. A central registry in ~/src/chat would be a new single point of failure (SPOF) and a separate deployment to operate, whereas each instance already has a gateway HTTP server, HMAC bearer auth, and an audit chain; peer delegation reuses all of that.

Audit linkage

Both ends record the round trip:

  • Dispatcher: peer.delegate.sent → peer.delegate.acknowledged (with peerInstanceId, peerRunId)
  • Receiver: peer.delegate.received → peer.delegate.completed (with parentRunId, parentSessionId, parentInstanceId)

No shared hash chain — each instance keeps its own integrity. The taskId ties the two halves when replaying an incident.
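
As an illustration of how the taskId could tie the two halves together during incident replay, the sketch below groups events from both logs by taskId. The event names and the `taskId`, `peerRunId`, and `parentRunId` fields come from this description; the `AuditEvent` shape, `DelegationTrace`, and `correlateByTaskId` are assumptions and do not reflect the real audit record format.

```typescript
// Hypothetical event shape; only the event names and the taskId,
// peerRunId, and parentRunId fields are taken from the PR description.
interface AuditEvent {
  type: string; // e.g. "peer.delegate.sent"
  taskId: string;
  peerRunId?: string;
  parentRunId?: string;
}

interface DelegationTrace {
  taskId: string;
  dispatcher: AuditEvent[];
  receiver: AuditEvent[];
}

function correlateByTaskId(
  dispatcherLog: AuditEvent[],
  receiverLog: AuditEvent[],
): DelegationTrace[] {
  const traces = new Map<string, DelegationTrace>();
  // Look up the trace for a task id, creating an empty one on first sight.
  const traceFor = (taskId: string): DelegationTrace => {
    let trace = traces.get(taskId);
    if (!trace) {
      trace = { taskId, dispatcher: [], receiver: [] };
      traces.set(taskId, trace);
    }
    return trace;
  };
  for (const event of dispatcherLog) traceFor(event.taskId).dispatcher.push(event);
  for (const event of receiverLog) traceFor(event.taskId).receiver.push(event);
  return Array.from(traces.values());
}
```

A trace with dispatcher events but no receiver events would suggest that the request never reached the peer or was rejected before the receiver wrote its first audit entry.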

Out of scope (explicit follow-ups)

  • Streaming over the peer link (synchronous request/response only for v1)
  • Approval forwarding — peer-side approval prompts surface as pendingApprovalSummary on the dispatcher; the dispatching agent must escalate to its own operator
  • Dynamic discovery / registry — peer list lives in ~/.hybridclaw/config.json

Files

  • New: src/peers/{peer-types,peer-registry,peer-client,peer-handlers}.ts
  • New: tests/peer-delegation.integration.test.ts (6 tests — agent card, missing/wrong/valid bearer, end-to-end proxy round trip, disabled-state 503)
  • New: docs/content/guides/peer-delegation.md (operator guide with HQ + client config snippets)
  • Modified: src/config/runtime-config.ts (peers schema + normalizer with dedup), src/config/config.ts (PEERS_CONFIG export), src/gateway/gateway-http-server.ts (route wiring), container/src/tools.ts (delegate_to_peer tool definition + dispatch)

Test plan

  • npm run lint (tsc --noUnusedLocals + console typecheck) — clean
  • npm run check (biome on src) — clean
  • npm run test:integration — all 64 tests pass including 6 new ones
  • npm run test:unit — same 9 pre-existing failures as main, no regressions
  • Manual smoke: stand up two gateways on localhost with the config snippets from docs/content/guides/peer-delegation.md, exercise delegate_to_peer from the dispatching TUI and confirm the result + audit entries on both sides

Benedikt Koehler added 2 commits April 26, 2026 11:49
Adds a P2P delegation mechanism so an HQ HybridClaw instance can dispatch
tasks to per-client instances over HTTP, with bearer auth, agent allowlists,
and audit linkage that ties parent and child runs across the boundary.

- Three new endpoints: GET /.well-known/hybridclaw-peer.json (public agent
  card), POST /api/peer/delegate (inbound, bearer-auth from peer config),
  POST /api/peer/proxy (outbound, container -> gateway -> peer).
- New container tool delegate_to_peer (synchronous; returns the peer's final
  answer as a tool result). Intentionally absent from the sub-agent allowlist
  to prevent unbounded fan-out.
- Audit chain on each side records peer.delegate.{sent,received,completed,
  acknowledged} with taskId + parentRunId / parentSessionId for forensic
  correlation.
- Off by default; integration test covers agent-card discovery, missing/wrong/
  valid bearer, end-to-end proxy round trip, and disabled-state 503.
- Documented in docs/content/guides/peer-delegation.md with config snippets
  for the agency-HQ -> per-client topology.

Use local variables to satisfy biome's noNonNullAssertion lint after server
creation, instead of asserting the module-scoped variables are non-null.
Copilot AI review requested due to automatic review settings April 26, 2026 09:50
Copilot AI left a comment

Pull request overview

Adds cross-instance “peer delegation” so one HybridClaw gateway can delegate tasks to another over HTTP using configured peers (bearer tokens + allowlists), including a container tool entrypoint and audit linkage.

Changes:

  • Introduces peer delegation types, registry helpers, HTTP handlers, and outbound client logic under src/peers/.
  • Wires new peer endpoints into the gateway HTTP server and adds runtime config normalization + exported PEERS_CONFIG.
  • Adds delegate_to_peer container tool, integration tests, and an operator guide.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| tests/peer-delegation.integration.test.ts | End-to-end integration tests for agent card, auth, inbound delegate, and proxy flow |
| src/peers/peer-types.ts | Defines peer config + request/response types and the public agent card schema |
| src/peers/peer-registry.ts | Reads runtime peer config, matches inbound tokens, enforces agent allowlists |
| src/peers/peer-handlers.ts | Implements /.well-known agent card, inbound delegation, and outbound proxy endpoints |
| src/peers/peer-client.ts | Implements outbound HTTP calls to peers (agent card + delegate) |
| src/gateway/gateway-http-server.ts | Adds route wiring for peer endpoints (and an additional /api/peers route) |
| src/config/runtime-config.ts | Adds peers runtime schema + normalization/deduping |
| src/config/config.ts | Exports and applies PEERS_CONFIG from runtime config |
| docs/content/guides/peer-delegation.md | Operator documentation for configuring and using peer delegation |
| container/src/tools.ts | Adds delegate_to_peer tool and gateway proxy dispatch + response formatting |

Comment thread container/src/tools.ts Outdated
}

if (pendingApprovalSummary) {
return `Peer ${peerInstanceId || 'delegation'} paused for approval (${pendingApprovalSummary}). Surface this to the operator; the peer cannot prompt our user.`;

Copilot AI Apr 26, 2026

formatPeerDelegateResponse() treats pendingApprovalSummary as a successful (non-failing) outcome, but the tool description and docs say approval-gated peer work should surface as a failure that the dispatcher must escalate. Align behavior by failing the tool when pendingApprovalSummary is present (or ensure the gateway returns a non-success status for approval-required delegations).

Suggested change
- return `Peer ${peerInstanceId || 'delegation'} paused for approval (${pendingApprovalSummary}). Surface this to the operator; the peer cannot prompt our user.`;
+ return failTool(
+ `Peer delegation requires approval${peerInstanceId ? ` on ${peerInstanceId}` : ''}: ${pendingApprovalSummary}. Surface this to the operator; the peer cannot prompt our user.`,
+ );

Copilot uses AI. Check for mistakes.
Comment thread docs/content/guides/peer-delegation.md Outdated
Comment on lines +140 to +141
- **No approval forwarding**: peer-side approval prompts surface as failures
on the dispatcher.

Copilot AI Apr 26, 2026

This section says peer-side approval prompts “surface as failures on the dispatcher”, but the current delegate_to_peer tool formatting returns a non-error message when pendingApprovalSummary is present. Update either the docs or the implementation so operators get consistent behavior.

Suggested change
- - **No approval forwarding**: peer-side approval prompts surface as failures
- on the dispatcher.
+ - **No approval forwarding**: peer-side approval prompts are returned to the
+ dispatcher as a pending-approval result (not a forwarded interactive prompt),
+ so the approval must be completed on the peer side before retrying or
+ continuing.

Comment on lines +152 to +156
const sessionId = buildPeerSessionId(inbound.id, body.taskId);
const peerRunId = createAuditRunId('peer');
const agentId = (body.agentId || '').trim() || DEFAULT_AGENT_ID;

safeAuditAppend({

Copilot AI Apr 26, 2026

isPeerDelegateRequestBody only validates the required fields, but later code assumes optional fields like agentId/model are strings (e.g. calling .trim() on them). A request with agentId: 123 will pass validation and then throw at runtime. Tighten validation for optional fields (or coerce with String(...)) so the handler reliably returns 400 instead of crashing.
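
One way to tighten the guard along these lines is sketched below. It is not the PR's implementation: the optional field names (`agentId`, `model`, `parentRunId`, `parentSessionId`, `timeoutMs`) are taken from this PR, while the required `prompt` field and the overall body shape are assumptions made for illustration.

```typescript
// Optional field names come from this PR; `prompt` and the overall body
// shape are assumptions made for this sketch.
interface PeerDelegateRequestBody {
  taskId: string;
  prompt: string;
  agentId?: string;
  model?: string;
  parentRunId?: string;
  parentSessionId?: string;
  timeoutMs?: number;
}

const OPTIONAL_STRING_FIELDS = ['agentId', 'model', 'parentRunId', 'parentSessionId'] as const;

function isPeerDelegateRequestBody(value: unknown): value is PeerDelegateRequestBody {
  if (typeof value !== 'object' || value === null) return false;
  const body = value as Record<string, unknown>;
  // Required fields must be non-empty strings.
  if (typeof body.taskId !== 'string' || body.taskId.trim() === '') return false;
  if (typeof body.prompt !== 'string' || body.prompt.trim() === '') return false;
  // Optional strings may be absent, but must be strings when present.
  for (const field of OPTIONAL_STRING_FIELDS) {
    if (body[field] !== undefined && typeof body[field] !== 'string') return false;
  }
  // timeoutMs may be absent, but must be a finite number when present.
  if (body.timeoutMs !== undefined) {
    if (typeof body.timeoutMs !== 'number' || !Number.isFinite(body.timeoutMs)) return false;
  }
  return true;
}
```

With a guard of this shape, a handler can answer 400 as soon as the guard returns false, so later `.trim()` calls only ever see strings.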

Comment thread src/peers/peer-handlers.ts Outdated
function buildPeerSessionId(peerInstanceLabel: string, taskId: string): string {
const safeLabel = peerInstanceLabel.replace(/[^a-zA-Z0-9_-]/g, '_') || 'peer';
const safeTaskId = taskId.replace(/[^a-zA-Z0-9_-]/g, '_').slice(0, 32);
return `peer:${safeLabel}:${safeTaskId}`;

Copilot AI Apr 26, 2026

When taskId contains only characters that get replaced by the sanitization regex, safeTaskId becomes an empty string and many delegations can collapse into the same sessionId (peer:<label>:), mixing audit/session history. Add a fallback when the sanitized task id is empty (e.g. use a UUID/hash, or the original taskId truncated after encoding).

Suggested change
- return `peer:${safeLabel}:${safeTaskId}`;
+ const fallbackTaskId =
+ encodeURIComponent(taskId).replace(/%/g, '_').slice(0, 32) || randomUUID();
+ return `peer:${safeLabel}:${safeTaskId || fallbackTaskId}`;
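
For reference, a simpler variant that falls back straight to a UUID might look like the sketch below. It reuses the function body quoted in this thread and assumes `randomUUID` comes from `node:crypto`; it is an illustration rather than the code that was merged.

```typescript
import { randomUUID } from 'node:crypto';

function buildPeerSessionId(peerInstanceLabel: string, taskId: string): string {
  // Replace anything outside [a-zA-Z0-9_-] with "_" so the id stays safe.
  const safeLabel = peerInstanceLabel.replace(/[^a-zA-Z0-9_-]/g, '_') || 'peer';
  const safeTaskId = taskId.replace(/[^a-zA-Z0-9_-]/g, '_').slice(0, 32);
  // Fall back to a fresh UUID so an empty task id never yields "peer:<label>:".
  return `peer:${safeLabel}:${safeTaskId || randomUUID()}`;
}
```

One caveat worth checking against the real code: because the regex substitutes `_` rather than deleting characters, `safeTaskId` is empty only when `taskId` itself is empty. Distinct ids such as `!!!` and `???` both sanitize to `___` and would still share a session id under this fallback.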

Comment on lines +194 to +198
const pendingApprovalSummary = result.pendingApproval
? buildPendingApprovalSummary(result.pendingApproval)
: null;

response = {

Copilot AI Apr 26, 2026

handleGatewayMessage() returns status: 'success' even when pendingApproval is present, but this handler forwards status: result.status unchanged while also setting pendingApprovalSummary. That makes approval-gated delegations look successful to callers. Consider mapping pendingApproval to status: 'rejected' (and typically result: null) so dispatchers can treat it as blocked work consistently with the docs/tooling.
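
The mapping proposed here could be sketched as a small pure function. The `status`, `result`, `pendingApproval`, and `pendingApprovalSummary` names come from this thread; the surrounding types, the `summary` field, and `toPeerResponse` are hypothetical stand-ins for whatever the handler actually uses.

```typescript
// Hypothetical shapes; only the `status`, `result`, `pendingApproval`, and
// `pendingApprovalSummary` names come from the review thread.
type WireStatus = 'success' | 'error' | 'rejected';

interface GatewayResult {
  status: 'success' | 'error';
  result: string | null;
  pendingApproval?: { summary: string }; // assumed structure
}

interface PeerDelegateResponse {
  status: WireStatus;
  result: string | null;
  pendingApprovalSummary: string | null;
}

function toPeerResponse(gateway: GatewayResult): PeerDelegateResponse {
  if (gateway.pendingApproval) {
    // Approval-gated work is blocked, not done: never report it as success.
    return {
      status: 'rejected',
      result: null,
      pendingApprovalSummary: gateway.pendingApproval.summary,
    };
  }
  return { status: gateway.status, result: gateway.result, pendingApprovalSummary: null };
}
```

Doing the translation in one place means the dispatcher only has to branch on `status`, and the summary remains available for escalation to the operator.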

Comment thread src/peers/peer-client.ts
Comment on lines +124 to +128
const timeoutHandle = setTimeout(() => controller.abort(), timeoutMs);
if (options.signal) {
if (options.signal.aborted) controller.abort();
else options.signal.addEventListener('abort', () => controller.abort());
}

Copilot AI Apr 26, 2026

options.signal.addEventListener('abort', ...) is registered without { once: true } and is never removed. If callers reuse an AbortSignal across many delegations, this can accumulate listeners unnecessarily. Use { once: true } and/or remove the listener after the request completes (similar to patterns elsewhere in the repo).
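
A minimal sketch of that pattern, assuming a hypothetical helper named `linkAbortSignal` (not part of this PR), registers the listener with `{ once: true }` and returns a cleanup function to call once the request settles:

```typescript
// Hypothetical helper: links a caller-supplied signal to an internal
// controller and returns a cleanup function for when the request settles.
function linkAbortSignal(
  parent: AbortSignal | undefined,
  controller: AbortController,
): () => void {
  if (!parent) return () => {};
  if (parent.aborted) {
    // The caller already gave up; abort immediately, nothing to clean up.
    controller.abort();
    return () => {};
  }
  const onAbort = (): void => controller.abort();
  // `once: true` drops the listener automatically after it fires.
  parent.addEventListener('abort', onAbort, { once: true });
  // Explicit removal covers the common case where the request finishes first.
  return () => parent.removeEventListener('abort', onAbort);
}
```

In the client, the returned function would typically run in a `finally` block next to `clearTimeout(timeoutHandle)`, so a signal reused across many delegations does not accumulate listeners.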

// Because PEERS_CONFIG was last set to dispatching (no inboundTokens),
// the receiving handler will reject. We verify that 502 surfaces correctly,
// and then re-run with receiving config to verify the success path.
expect([200, 502]).toContain(proxyResponse.status);

Copilot AI Apr 26, 2026

This expectation is overly permissive: after withDispatchingConfig(), the receiving handler will always see inboundTokens: [] in this in-process test and should reject the delegated call, so /api/peer/proxy should deterministically surface 502 here. Allowing 200 can hide a regression where auth is accidentally bypassed.

Suggested change
- expect([200, 502]).toContain(proxyResponse.status);
+ expect(proxyResponse.status).toBe(502);

Comment thread src/gateway/gateway-http-server.ts Outdated
Comment on lines +3724 to +3726
if (pathname === '/api/peers' && method === 'GET') {
sendJson(res, 200, buildPeerAgentCard());
return;

Copilot AI Apr 26, 2026

The PR description lists three new gateway endpoints, but this change also introduces GET /api/peers. Either document this additional endpoint (and its auth expectations) or remove it to avoid an undocumented surface area that duplicates the public agent card.

Address PR #409 review feedback:

- Map peer-side pending approvals to status:'rejected' on the wire (with
  result:null and pendingApprovalSummary populated), so the dispatching agent
  treats approval-gated work as blocked instead of silently succeeding. The
  container tool's formatter loses its now-unreachable "paused" branch.
- Tighten request body validation: reject non-string optional fields
  (agentId, model, parentRunId, parentSessionId) and non-finite timeoutMs
  with HTTP 400 instead of crashing inside .trim() at runtime.
- Fall back to a fresh UUID when a caller-supplied taskId sanitizes to an
  empty string, so two delegations can never collide on the same session id.
- AbortSignal listener in peer-client now registers with { once: true } and
  is removed on settle, preventing accumulation across reused signals.
- Tighten the end-to-end proxy test from expect([200, 502]).toContain to
  expect(...).toBe(502); the in-process shared PEERS_CONFIG makes the receiver
  deterministically reject the first call, and locking the status keeps an
  auth-bypass regression from sneaking past.
- Remove undocumented GET /api/peers endpoint that duplicated the public
  /.well-known/hybridclaw-peer.json agent card.
- Add two tests pinning the new behavior: optional-field validation and the
  pendingApproval -> 'rejected' mapping.
- Update docs/content/guides/peer-delegation.md to describe the rejected wire
  shape so operators see consistent behavior between docs and the tool output.