Skip to content

feat: add multi-claw instance creation flow and health updates#98

Open
meka-ichi wants to merge 25 commits intodaggerhashimoto:masterfrom
meka-ichi:multi-claw
Open

feat: add multi-claw instance creation flow and health updates#98
meka-ichi wants to merge 25 commits intodaggerhashimoto:masterfrom
meka-ichi:multi-claw

Conversation

@meka-ichi
Copy link
Copy Markdown

@meka-ichi meka-ichi commented Mar 6, 2026

This draft PR is opened on behalf of @tinco.

⚠️ This change has not been code reviewed yet.

Scope analyzed (branch point → now)

  • Branch point: 12cc93a (merge-base with origin/master)
  • Commits in scope: 21
  • Files changed: 34
  • Diff size: +4886 / -86

What this PR delivers

1) Multi-claw runtime image + local orchestration groundwork

  • Adds Dockerfile.multiclaw to run OpenClaw gateway + Nerve in a containerized multi-instance model.
  • Adds startup tooling (scripts/start-multiclaw.sh) and docs/research artifacts for the deployment model.

2) Docker-backed instance lifecycle management (server)

Introduces a full local instance management stack:

  • server/lib/docker-instances.ts
    • Discovers candidate containers and filters to Nerve-exposing OpenClaw-like instances.
    • Resolves published Nerve/Gateway ports.
    • Extracts/derives gateway token via env, mounted config, or runtime fallback.
    • Creates instances with isolated state dirs and copied allowlisted credentials/config.
    • Supports stop/remove flows (including managed state-dir cleanup).
    • Computes per-instance health (ready|initializing|unreachable, gateway reachability).
  • server/routes/instances.ts
    • New API surface:
      • GET /api/instances
      • GET /api/instances/credentials
      • POST /api/instances
      • POST /api/instances/:id/stop
      • DELETE /api/instances/:id
      • GET /api/instances/:id/token
      • ALL /api/instances/:id/proxy (+ wildcard)
    • Includes input validation, allowlist enforcement, and mapped error semantics.

3) Centralized instance-aware API routing + proxy hardening

  • Adds routing primitives in server/lib/instance-routing.ts and middleware in server/middleware/instance-routing.ts.
  • Ensures only eligible API paths are instance-routed; master-owned endpoints remain pinned.
  • server/lib/instance-proxy.ts adds guardrails:
    • Blocks traversal/scheme/protocol-relative unsafe paths.
    • Prevents proxying to master-only endpoints.
    • Restricts forwarded request/response headers.
    • Removes stale content-encoding forwarding to avoid client decode failures.

4) WebSocket instance routing correctness

  • server/lib/ws-proxy.ts now supports instance-specific WS targets by:
    • Resolving instance gateway port dynamically.
    • Overriding auth token per selected instance.
    • Using loopback origin for containerized gateway allowlists.
    • Disabling device identity injection for instance-routed WS to avoid pairing/identity mismatch issues.
  • Also keeps existing reconnect/challenge handling and restricted-method relay behavior.

5) Frontend multi-instance UX + request routing

  • Adds src/contexts/InstanceContext.tsx to own discovery, selection, and global fetch wrapping.
  • Adds src/lib/apiRouting.ts:
    • Appends instanceId query param for routed API endpoints.
    • Adds X-Instance-Id header only where applicable.
    • Forces cache: no-store for instance-routed fetches to prevent stale cross-instance data.
  • Adds CreateInstanceDialog and expands session UI (SessionList) for create/select/manage flows.
  • Updates file browser/session/dashboard/auth/websocket hooks to respect active instance.
  • Includes fixups for stale state on instance switch (file tree remount/state isolation, immediate selection ref sync, reconnect-loop mitigation).

6) Health/readiness contract improvements

  • server/routes/health.ts now exposes gateway probe state in:
    • /health
    • /healthcheck with explicit readiness shape (ready, status, gateway).

7) Documentation and project tracking

  • Updates API docs (docs/API.md) and README.
  • Adds Multi-Claw project tracking docs (GOALS.md, PROGRESS.md) and research notes.

Test coverage added/updated

  • New high-value test suites:
    • server/lib/docker-instances.test.ts
    • server/routes/instances.test.ts
    • server/middleware/instance-routing.test.ts
    • src/lib/apiRouting.test.ts
  • Updates existing tests:
    • server/lib/ws-proxy.test.ts
    • src/hooks/useWebSocket.test.ts
  • Commit history also includes targeted stabilizations for proxy guardrails and routing behavior.

Behavioral outcome

This PR evolves multi-claw from basic scaffolding into an end-to-end instance model where users can:

  • discover local Nerve-capable containers,
  • create/stop/delete instances,
  • select an active instance in UI,
  • route HTTP + WS traffic to that instance safely,
  • and avoid several stale-state/caching/reconnect pitfalls discovered during integration.

Risk notes / review focus

  • Large surface area across server routing, WS transport, and frontend state.
  • Most important review hotspots:
    1. credential/config copy boundaries in instance creation,
    2. proxy allow/deny routing rules,
    3. WS auth/origin behavior for instance mode,
    4. UI consistency when switching instances under load.

Summary by CodeRabbit

  • New Features

    • Multi-instance Docker deployment and startup script for local instances
    • UI: instance panel to create/start/stop/remove instances, instance switching, and per-instance file-tree state
    • Proxying: context-aware API and WebSocket routing to selected instances
    • Health: added /healthcheck readiness endpoint
  • Documentation

    • Docker deployment guide, goals/progress/research notes, and expanded API docs for instance management
  • Chores

    • Added Dockerignore to exclude unnecessary files from builds
  • Tests

    • Large suite covering instance lifecycle, routing, proxying, and WS flows

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 6, 2026

📝 Walkthrough

Walkthrough

Adds Multi-Claw local Docker instance support: build/runtime Docker assets and startup script, comprehensive server-side instance management, proxying and routing, extensive API/routes/tests, frontend instance context/UI integration, per-instance fetch/WS routing, and documentation and progress artifacts.

Changes

Cohort / File(s) Summary
Docker & Bootstrap
\.dockerignore, Dockerfile.multiclaw, scripts/start-multiclaw.sh
Adds Docker ignore, multi-stage Dockerfile for build/runtime, and start script that bootstraps gateway config, token handling, process orchestration, and graceful shutdown.
Project Docs & Planning
GOALS.md, PROGRESS.md, README.md, docs/API.md, docs/research-aramaki-multiclaw.md
New goal/progress docs, README Docker usage, expanded API docs (instance discovery/proxy/targeting), and research notes on deterministic container bootstrapping.
Docker instance runtime + tests
server/lib/docker-instances.ts, server/lib/docker-instances.test.ts
New runtime module for discovering/creating/managing Docker-backed instances, credential/template handling, token resolution, health probing, port mapping; large unit/integration-like test suite.
Proxying & Routing core
server/lib/instance-proxy.ts, server/lib/instance-routing.ts
Implements safe proxying (path validation, header/response filtering), error mapping, and instance-id routing helpers/constants.
Server routes, middleware & tests
server/routes/instances.ts, server/routes/instances.test.ts, server/middleware/instance-routing.ts, server/middleware/instance-routing.test.ts
New /api/instances REST surface (list/create/stop/remove/proxy/token), middleware to route eligible /api/* requests to instances, and comprehensive tests covering flows and error cases.
Health + WS proxy
server/routes/health.ts, server/routes/health.test.ts, server/lib/ws-proxy.ts, server/lib/ws-proxy.test.ts
Adds probeGateway and /healthcheck readiness endpoint; extends WebSocket proxy for instance-based routing, token override, loopback gateway targets, and updated connect/token injection with tests.
App wiring & server setup
server/app.ts
Mounts instance routing middleware and instances routes; adds CORS header allowance for X-Instance-Id.
Frontend: Instance context & Gateway
src/contexts/InstanceContext.tsx, src/contexts/GatewayContext.tsx
New InstanceProvider/hooks exposing discovered instances, active selection, polling and route helper; GatewayContext reconnects when active instance changes using saved credentials.
Frontend: UI components & dialogs
src/features/sessions/CreateInstanceDialog.tsx, src/features/sessions/SessionList.tsx, src/features/sessions/SessionList.test.tsx
CreateInstanceDialog for provisioning docker instances; SessionList extended with instances panel, create/refresh, stop/remove actions and tests for stop behavior.
Frontend: File tree & hooks
src/features/file-browser/hooks/useFileTree.ts, src/features/file-browser/FileTreePanel.tsx
Per-instance expanded-path persistence, debug state, remount FileTreePanel on instance change, and debug UI in header.
Frontend: API routing & fetch helpers
src/lib/apiRouting.ts, src/lib/apiRouting.test.ts, src/hooks/useServerEvents.ts, src/hooks/useDashboardData.ts, src/hooks/useWebSocket.ts, src/hooks/useWebSocket.test.ts
Centralizes routeApiPath and addInstanceHeaderToFetch (INSTANCE_ID header/query param), SSE/events and dashboard polling tied to active instance, WebSocket hook accepts activeInstanceId and appends instanceId query param with tests.
Server routes: instances export
server/routes/instances.ts
Exports Hono app handling instance management and proxy endpoints with mapped error handling.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant UI
    participant Server
    participant Docker
    participant OpenClaw

    User->>UI: Open CreateInstanceDialog, submit name+configs
    UI->>Server: POST /api/instances (create)
    Server->>Docker: docker run (selected multiclaw image, mounts/env)
    Docker->>OpenClaw: Start gateway + nerve inside container
    Server->>Server: Prepare instance workspace, copy creds, write openclaw.json
    Server->>OpenClaw: Probe health /token (via published ports)
    OpenClaw-->>Server: Ready / token
    Server-->>UI: 201 Created (instanceId)
    UI->>UI: refresh instances, set activeInstanceId
Loading
sequenceDiagram
    participant Client
    participant Middleware
    participant InstanceProxy
    participant LocalInstance

    Client->>Middleware: GET /api/files/read?instanceId=inst-1
    Middleware->>Middleware: validate instanceId & path eligibility
    Middleware->>InstanceProxy: proxyToInstance(instanceId, path, headers, body)
    InstanceProxy->>LocalInstance: fetch http://127.0.0.1:<published>/api/files/read (forwarded headers)
    LocalInstance-->>InstanceProxy: response (filtered headers)
    InstanceProxy-->>Middleware: proxied response
    Middleware-->>Client: return proxied response
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Poem

🐰 I hopped into containers, built from a clue,
Tokens and routes stitched tidy and true,
Instances bloom where Docker seeds lay,
Proxies and panels guide paths on their way,
A nimble rabbit cheers: "Multi-claw today!"

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description comprehensively covers scope, deliverables, test coverage, behavioral outcomes, and risk areas, but does not follow the required template structure with explicit 'What', 'Why', 'How', 'Type of Change', and 'Checklist' sections. Restructure the description to match the template: add explicit 'What', 'Why', 'How' sections; include 'Type of Change' checkbox selection; add 'Checklist' verifying lint/build/test and addressing reviewer-flagged issues before merge.
Docstring Coverage ⚠️ Warning Docstring coverage is 6.72% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main additions: multi-claw instance creation flow and health contract improvements, accurately reflecting the PR's primary deliverables.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

You can disable the changed files summary in the walkthrough.

Disable the reviews.changed_files_summary setting to disable the changed files summary in the walkthrough.

Codex and others added 22 commits March 6, 2026 09:16
@meka-ichi
Copy link
Copy Markdown
Author

Summary

Reviewed PR #98 against origin/master (this repo doesn’t have origin/main). The multi-instance architecture and coverage are strong, but I found one security-impacting proxy regression and one UI behavior regression that should be fixed before merge.

Major Issues

  • Proxy strips security-relevant response headers from instance APIs
    In server/lib/instance-proxy.ts:8, PROXY_RESPONSE_HEADERS only forwards a small allowlist and drops Content-Disposition.
    This conflicts with existing SVG safety behavior in server/routes/files.ts:73, where SVGs are forced to download (attachment) to mitigate stored-XSS risk.
    When routed through instance proxy (/api/files/* is proxy-eligible), that header is lost, so browser behavior and security posture can change.

Minor Issues

  • “Stop instance” button is incorrectly disabled for running instances
    In src/features/sessions/SessionList.tsx:256, status prefers instance.availability (ready|initializing|unreachable) over state, then uses /running/i on that value at line 257.
    Result: truly running instances with availability: "ready" are treated as not running, disabling stop action at line 283.

Suggested Fixes

  1. In instance-proxy, preserve needed safe response headers from upstream, especially Content-Disposition (and likely Content-Length for correctness), while still excluding problematic Content-Encoding.
  2. In SessionList, derive actionability from container runtime state (instance.state === 'running' || 'restarting') instead of availability.
  3. Add regression tests:
    • Proxy header passthrough test asserting Content-Disposition: attachment survives instance proxy for SVG responses.
    • SessionList unit test ensuring stop button is enabled when state: "running" even if availability: "ready".

@meka-ichi
Copy link
Copy Markdown
Author

Follow-up Summary

Second pass over origin/master...HEAD found strong coverage around Docker instance routes/proxying, but one functional risk remains in WS instance auth, plus a few checklist/test evidence gaps before merge.

Blockers

  • Fail-fast missing when instance token lookup fails for WS routing
    In server/lib/ws-proxy.ts:143, instance mode sets authTokenOverride = tokenResult?.token || null and still proceeds. If token discovery fails, proxy falls back to client/master token and attempts connect anyway, which can produce auth-fail loops instead of a deterministic close/error for the selected instance.
    Action: If requestedInstanceId is set and token is missing, close with explicit reason (e.g. 1013 + Instance token unavailable) and stop relay setup.

Test/CI Gaps

  • Add WS regression test for “instance selected + token missing” path in server/lib/ws-proxy.test.ts.
  • Add coverage for new GET /healthcheck behavior in server/routes/health.test.ts and docs mention in docs/API.md.
  • No frontend tests yet for instance UX state flows (InstanceProvider fetch patching / instance switch effects in SessionList + CreateInstanceDialog).

Checklist Readiness

  • lint/build/test: Not verified in this review environment (tooling/deps unavailable here); rely on PR CI status.
  • New feature tests: Partially ready (good backend additions; frontend instance flows still thin).
  • Screenshots/screen recording for UI changes: Not evidenced in diff.
  • Docs: Mostly updated (README.md, docs/API.md), but GET /healthcheck contract should be documented.

Recommendation

Request changes before merge: implement WS token fail-fast + regression test, document/test healthcheck, and attach UI screenshot/recording + green CI evidence for lint/build/test.

@meka-ichi meka-ichi marked this pull request as ready for review March 7, 2026 21:46
@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Mar 7, 2026

Greptile Summary

This PR introduces a full multi-claw instance management stack, evolving the application from single-instance to a multi-container model. It adds Docker lifecycle management (create/stop/delete), centralized instance-aware HTTP and WebSocket routing, a credential copy mechanism for bootstrapping new instances, and a frontend context (InstanceProvider) for discovery, selection, and transparent request routing to the active instance.

Key concerns identified:

  • Biased gateway token generation (server/lib/docker-instances.ts, line 940): bytes[i] % 62 produces a non-uniform distribution because 256 is not divisible by 62. The first 8 alphabet characters are slightly over-represented. For a security-sensitive gateway auth token, a 64-char alphabet or rejection sampling should be used.
  • Unescaped dotenv value serialization (server/lib/docker-instances.ts, line 1025): Credential values are written to the .env file without quoting or escaping. Values containing embedded newlines will be silently truncated or corrupted on the next read.
  • docker exec side effect on a read-only endpoint (server/lib/docker-instances.ts, line 683): GET /api/instances/:id/token can silently trigger docker exec <containerId> node -e <script> on a running container when the env/config token fallbacks fail. This is an unexpected privilege for a GET endpoint and warrants either documentation or an explicit opt-in mechanism.
  • Global fetch double-wrap risk (src/contexts/InstanceContext.tsx, line 71): The monkey-patch captures originalFetch at mount time. In React Strict Mode (or on remount), a second mount captures the already-patched function, creating a permanently nested wrapper after cleanup.
  • Duplicated prefix lists (server/lib/instance-routing.ts and src/lib/apiRouting.ts): MASTER_ONLY_API_PREFIXES and INSTANCE_PROXYABLE_API_PREFIXES are copied verbatim between server and client with no shared source-of-truth. A future route addition in only one place silently breaks routing for that endpoint.
  • NERVE_ALLOW_INSECURE=true hardcoded (server/lib/docker-instances.ts, line 1046): All created instances inherit this flag unconditionally, which disables gateway security checks. This is appropriate for localhost-only use, but should be conditioned on configuration or at least explicitly documented.

Confidence Score: 2/5

  • Not safe to merge without addressing the token generation bias, dotenv escaping bug, and the docker exec side effect on the token endpoint.
  • The PR is well-structured with good test coverage and thorough proxy/routing guardrails. However, two correctness bugs (biased token generation for a security credential, and unescaped dotenv writes that can silently corrupt copied credentials) and one surprising privilege escalation (GET endpoint triggering docker exec) are significant enough to block merge. The global fetch double-wrap is a latent bug that could surface in development or on remounts. The duplicated prefix lists create an ongoing maintenance hazard.
  • Pay close attention to server/lib/docker-instances.ts (token generation, dotenv writing, docker exec side effect) and src/contexts/InstanceContext.tsx (fetch monkey-patching lifecycle).

Important Files Changed

Filename Overview
server/lib/docker-instances.ts Core instance lifecycle logic — has biased gateway token generation (modulo 62 on 256-byte values), unescaped dotenv value writing that can corrupt credentials, and a side-effectful docker exec triggered by the read-only token endpoint.
server/lib/instance-proxy.ts Proxy validation and header filtering look solid; traversal, scheme, and double-decode checks are thorough. Intentional omission of content-encoding forwarding is well-documented.
server/routes/instances.ts Route definitions and input validation are thorough; allowlist enforcement is correct for env/JSON keys. GET /api/instances/:id/token silently triggers docker exec on running containers as a side effect of a read-only call.
server/lib/instance-routing.ts Routing predicates are clean and well-tested, but MASTER_ONLY_API_PREFIXES and INSTANCE_PROXYABLE_API_PREFIXES are duplicated in the frontend with no shared source-of-truth.
server/lib/ws-proxy.ts Instance-aware WS routing looks correct; auth token override, loopback origin, and device-identity disable for instance paths are all handled. Keepalive and reconnect logic preserved from the original.
src/contexts/InstanceContext.tsx Global fetch monkey-patch can double-wrap in React Strict Mode if InstanceProvider ever remounts — the second mount captures the already-patched function as originalFetch. Instance polling and ref-sync logic are otherwise sound.
src/lib/apiRouting.ts Client-side routing logic is correct; no-store cache override for instance-routed fetches is a good guard against cross-instance stale data. Prefix lists are duplicated from the server with no sync mechanism.
server/middleware/instance-routing.ts Middleware correctly gates on isProxyEligibleApiPath before proxying and strips the instanceId query param before forwarding. No issues found.
server/routes/health.ts Health endpoints now expose gateway probe state with a clear readiness shape. Timeout on the gateway probe (3 s) is reasonable. No issues found.
server/app.ts Middleware stack ordering is correct — authMiddleware runs before instanceRoutingMiddleware, so instance proxy calls are authenticated. New routes are mounted at the expected position.

Sequence Diagram

sequenceDiagram
    participant UI as Browser UI
    participant IC as InstanceContext
    participant Nerve as Nerve Server
    participant Docker as Docker Daemon
    participant Inst as Instance Container

    Note over UI,IC: Instance Discovery (every 5s)
    UI->>Nerve: GET /api/instances
    Nerve->>Docker: docker ps -a + docker inspect
    Docker-->>Nerve: Container metadata
    Nerve-->>UI: LocalInstanceSummary[]

    Note over UI,Nerve: Instance Creation
    UI->>Nerve: POST /api/instances
    Nerve->>Nerve: prepareInstanceWorkspace()<br/>writes .openclaw state dir,<br/>openclaw.json with generated gateway credential
    Nerve->>Docker: docker run -d -v stateDir:/home/node/.openclaw
    Docker-->>Nerve: containerId
    Nerve-->>UI: CreateDockerInstanceResult

    Note over UI,IC: Instance-Routed API Call
    UI->>IC: fetch('/api/sessions')
    IC->>IC: addInstanceHeaderToFetch()<br/>adds X-Instance-Id header
    IC->>Nerve: GET /api/sessions (X-Instance-Id: id)
    Nerve->>Nerve: instanceRoutingMiddleware<br/>resolveRequestedInstanceId()
    Nerve->>Docker: docker inspect id
    Docker-->>Nerve: published ports
    Nerve->>Inst: GET http://127.0.0.1:nervePort/api/sessions
    Inst-->>Nerve: upstream response
    Nerve-->>UI: filtered proxied response

    Note over UI,Inst: Instance-Routed WebSocket
    UI->>Nerve: WS /ws?instanceId=id
    Nerve->>Docker: inspect + resolve gateway port + get credential
    Docker-->>Nerve: gatewayPort + gateway credential
    Nerve->>Inst: WS ws://127.0.0.1:gatewayPort/ws
    Inst-->>Nerve: connect.challenge nonce
    Nerve->>Inst: connect with instance credential (no device identity)
    Inst-->>UI: relay messages bidirectionally
Loading

Last reviewed commit: c2a2886

Comment on lines +940 to +948
function generateInstanceGatewayToken(): string {
const alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
const bytes = randomBytes(32);
let token = '';
for (let i = 0; i < bytes.length; i += 1) {
token += alphabet[bytes[i] % alphabet.length];
}
return token;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Biased modulo reduction in gateway token generation

bytes[i] % alphabet.length introduces a statistical bias because 256 is not evenly divisible by 62 (256 = 62 × 4 + 8). The first 8 characters in the alphabet (AH) are selected with probability 5/256 ≈ 1.95% instead of the uniform 1/62 ≈ 1.61%. While the bias is small (~0.4%), this is a security-sensitive token used for gateway authentication, and best practice is to avoid any non-uniform distribution.

Switching to a 64-character alphabet (e.g., base64url), or using rejection sampling, eliminates the bias:

function generateInstanceGatewayToken(): string {
  // base64url alphabet has 64 chars — 256 / 64 = 4 exactly, no bias
  const alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_';
  const bytes = randomBytes(32);
  let token = '';
  for (let i = 0; i < bytes.length; i += 1) {
    token += alphabet[bytes[i] % alphabet.length];
  }
  return token;
}

Comment on lines +683 to +692
async function resolveRuntimeGatewayTokenViaExec(containerId: string): Promise<string | null> {
try {
const script = "fetch('http://127.0.0.1:3080/api/connect-defaults').then(r=>r.json()).then(j=>process.stdout.write((j&&typeof j.token==='string')?j.token:''))";
const stdout = await runDocker(['exec', containerId, 'node', '-e', script]);
const token = stdout.trim();
return token || null;
} catch {
return null;
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GET /api/instances/:id/token silently triggers docker exec as a side effect

resolveRuntimeGatewayTokenViaExec is called from resolveInstanceToken, which is called from getInstanceToken, which backs the GET /api/instances/:id/token endpoint. This means issuing a simple read-only token lookup can cause the server to execute a Node.js script inside a running container via docker exec.

This is a surprising side effect for a GET endpoint and poses two concerns:

  1. Unexpected privilege escalation scope: Any authenticated caller of GET /api/instances/:id/token implicitly gains the ability to trigger code execution in a matched container without the API signature suggesting this.
  2. Race condition on container ID: The containerId is obtained from inspect.Id after a preceding inspectLocalOpenClawInstance call. Between the inspect and the docker exec, the container could be stopped/replaced. Docker will reject the exec in that case, but the attempt still occurs.

Consider separating the runtime fallback into an explicit probe operation (e.g., a POST endpoint or a dedicated ?probe=true query parameter) so that callers can opt into the exec-based lookup intentionally.

Comment on lines +71 to +86
useEffect(() => {
const originalFetch = globalThis.fetch.bind(globalThis);
globalThis.fetch = ((input: RequestInfo | URL, init?: RequestInit) => {
const routed = addInstanceHeaderToFetch(
input,
init,
activeInstanceIdRef.current,
window.location.origin,
);
return originalFetch(routed.input, routed.init);
}) as typeof fetch;

return () => {
globalThis.fetch = originalFetch;
};
}, []);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Global fetch monkey-patch can double-wrap in React Strict Mode

The useEffect with [] dependencies captures originalFetch = globalThis.fetch.bind(globalThis) at mount time. In React 18 Strict Mode (development), effects are intentionally run twice: mounted → cleanup → remounted. On the second mount, globalThis.fetch is already the patched wrapper from the first mount (cleanup restored it, but only if the return cleanup ran correctly). However, if there is any async gap between cleanup and re-run, or if InstanceProvider is used higher than Strict Mode boundaries, the original reference could already be a wrapper.

More practically: if InstanceProvider is ever unmounted and remounted (e.g., session expiry, hot-reload in development), originalFetch in the second lifecycle captures the patched version from the first lifecycle, creating a permanently nested wrapper. After the second unmount's cleanup, globalThis.fetch is restored to the first lifecycle's patched version rather than the browser native.

A safer pattern uses a stable outer ref:

  const nativeFetchRef = useRef<typeof fetch | null>(null);

  useEffect(() => {
    // Only capture the native fetch once, on first mount
    if (!nativeFetchRef.current) {
      nativeFetchRef.current = globalThis.fetch.bind(globalThis);
    }
    const originalFetch = nativeFetchRef.current;
    globalThis.fetch = ((input, init) => {
      const routed = addInstanceHeaderToFetch(input, init, activeInstanceIdRef.current, window.location.origin);
      return originalFetch(routed.input, routed.init);
    }) as typeof fetch;
    return () => {
      globalThis.fetch = originalFetch;
    };
  }, []);

const copiedCredentialKeys = [...selectedConfigurations.keys(), ...selectedProfileKeys];
const preparedStateDir = prepareInstanceWorkspace(input.name, effectiveCredentialKeys);

const args = ['run', '-d', '--name', input.name, '-p', '0:3080', '-p', '0:3181', '-v', `${preparedStateDir}:/home/node/.openclaw`, '-e', 'NERVE_ALLOW_INSECURE=true'];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NERVE_ALLOW_INSECURE=true is hardcoded for every created instance

Every instance is started with -e NERVE_ALLOW_INSECURE=true regardless of context. This disables certain security checks in the Nerve gateway and becomes permanently baked into all managed containers. If a deployment ever moves these containers from localhost to a networked environment, the insecure flag remains silently active.

Consider making this conditional on a server-side config flag or at least documenting in a comment why it's unconditionally set. It could also be exposed as a CreateDockerInstanceInput option so operators can opt in explicitly.

Comment on lines +6 to +34
export const MASTER_ONLY_API_PREFIXES = [
'/api/instances',
'/api/auth',
] as const;

export const INSTANCE_PROXYABLE_API_PREFIXES = [
'/api/agentlog',
'/api/channels',
'/api/claude-code-limits',
'/api/codex-limits',
'/api/connect-defaults',
'/api/crons',
'/api/events',
'/api/files',
'/api/gateway',
'/api/keys',
'/api/language',
'/api/memories',
'/api/server-info',
'/api/sessions',
'/api/skills',
'/api/tokens',
'/api/transcribe',
'/api/tts',
'/api/version',
'/api/version/check',
'/api/voice-phrases',
'/api/workspace',
] as const;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicated prefix lists between server and client with no shared source-of-truth

MASTER_ONLY_API_PREFIXES and INSTANCE_PROXYABLE_API_PREFIXES are defined here in server/lib/instance-routing.ts (lines 6–34) and duplicated verbatim in src/lib/apiRouting.ts (lines 1–29). The two lists are currently identical, but there is no compile-time or test-time enforcement that they stay in sync.

If a new route is added to the server's INSTANCE_PROXYABLE_API_PREFIXES without updating the client list (or vice versa), the frontend will silently fail to attach the X-Instance-Id header for that route, resulting in requests unexpectedly hitting the master rather than the selected instance — a subtle routing regression that could be hard to debug.

Consider extracting the shared constants into a single generated artifact (e.g., a JSON file read by both sides during build, or a server/lib/routing-constants.ts that the client imports via a workspace package) so that additions in one place automatically propagate to the other.

Comment on lines +1025 to +1029
if (finalEnvEntries.size > 0) {
const envContent = [...finalEnvEntries.entries()]
.map(([key, value]) => `${key}=${value}`)
.join('\n');
fs.writeFileSync(path.join(stateDir, '.env'), `${envContent}\n`, 'utf8');
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unescaped values can corrupt the written dotenv file

The dotenv file is written with no quoting or escaping of the value portion of each entry. If any credential value contains a newline character, the resulting file will have a raw newline mid-value. When the file is later read by parseDotEnvShallow, content after the embedded newline is treated as a separate line — either silently dropped or mis-parsed as a different entry, causing the credential to be silently truncated or corrupted.

The serialization block should wrap each value in double-quotes and escape any embedded double-quotes, backslashes, and newlines before writing, consistent with the quoting that parseDotEnvShallow already handles on the read side.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (16)
PROGRESS.md (1)

64-74: Give each validation section a unique heading.

The repeated ## Validation notes headers create duplicate anchors and already trigger MD024. Renaming them per milestone keeps the document navigable.

Also applies to: 102-109, 140-147

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@PROGRESS.md` around lines 64 - 74, Replace duplicate "## Validation notes"
headers with unique headings per section (for example "## Validation notes —
Milestone 1", "## Validation notes — Milestone 2", etc.) so each anchor is
unique and MD024 is avoided; update the three occurrences that currently use the
same "## Validation notes" heading (the block shown in the diff and the other
two similar sections) and keep the existing bullet content unchanged.
server/lib/instance-routing.ts (1)

1-4: Instance ID regex allows up to 128 characters.

The regex {0,127} after the first character allows 1-128 total characters. This is more permissive than Docker container name limits (typically 63 chars as used in CreateInstanceDialog). Consider aligning these limits for consistency, or document why they differ.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/lib/instance-routing.ts` around lines 1 - 4, The INSTANCE_ID_RE
currently allows 1–128 characters which exceeds the Docker/container name limit
used elsewhere (e.g., CreateInstanceDialog); update INSTANCE_ID_RE to enforce
the same max length as the UI by changing the quantifier so the total length is
1–63 characters (adjust the `{0,127}` to `{0,62}`) or, if 128 is intentional,
add a clear comment explaining the deliberate mismatch and reference
CreateInstanceDialog; ensure the change or comment is made at the INSTANCE_ID_RE
declaration so all validations remain consistent.
src/features/sessions/CreateInstanceDialog.tsx (1)

239-244: Validation message shown even when form is untouched.

Lines 239-241 display validationMessage (e.g., "Name is required.") immediately when the dialog opens with an empty name. This might feel intrusive for users who haven't interacted with the form yet. Consider showing validation only after the first blur or submit attempt.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/features/sessions/CreateInstanceDialog.tsx` around lines 239 - 244, The
validation message is shown immediately on open; modify CreateInstanceDialog to
only show validationMessage after the user has interacted or tried to submit:
add a local boolean state (e.g., touched or attemptedSubmit) and set it to true
on the name input's onBlur (and also set attemptedSubmit true in the submit
handler), then change the rendering condition from "validationMessage && !error"
to "validationMessage && !error && (touched || attemptedSubmit)". Update the
name input component to call the onBlur handler that flips touched true and
ensure the submit path sets attemptedSubmit so validation appears after submit.
Dockerfile.multiclaw (1)

21-24: Pin openclaw to a specific version for reproducibility.

The global npm install -g openclaw installs the latest version at build time, which could lead to inconsistent behavior across builds. Pin it explicitly to ensure consistency.

♻️ Proposed fix
 RUN apt-get update \
   && apt-get install -y --no-install-recommends ca-certificates bash git \
   && rm -rf /var/lib/apt/lists/* \
-  && npm install -g openclaw
+  && npm install -g openclaw@2026.3.2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile.multiclaw` around lines 21 - 24, The Dockerfile currently installs
openclaw without a version which leads to non-reproducible builds; update the
RUN that calls npm install -g openclaw to pin a specific version (e.g., npm
install -g openclaw@<version>) or add a build ARG like OPENCLAW_VERSION with a
sensible default and use npm install -g openclaw@$OPENCLAW_VERSION so builds are
deterministic; modify the RUN line containing npm install -g openclaw to
reference that pinned version/ARG and document/update any related build
instructions.
src/features/sessions/SessionList.test.tsx (1)

70-86: Test coverage is good, but consider strengthening the disabled button assertion.

The test correctly verifies the button is disabled when state: 'exited'. However, clicking a disabled button in the DOM typically doesn't fire click handlers, so line 84-85 may not be testing the actual guard logic—it's testing browser behavior.

Consider adding a test that verifies the component's internal logic doesn't call fetch even if somehow triggered, or simply remove the click attempt since expect(stopButton).toBeDisabled() already validates the intended behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/features/sessions/SessionList.test.tsx` around lines 70 - 86, The test
asserts the stop button is disabled but then attempts a userEvent.click which
only verifies browser behavior, so either remove the click and associated
fetchMock assertion or strengthen the test to verify component logic: update the
test in SessionList.test.tsx to keep the renderList call and the
expect(stopButton).toBeDisabled() assertion, then either (A) delete the
userEvent.click(stopButton) and the expect(fetchMock).not.toHaveBeenCalled()
lines, or (B) instead of simulating a DOM click, directly call the component's
stop handler (or trigger the same action via the instance method used by the
Stop button) and assert fetchMock was not invoked; reference the stopButton
variable, renderList helper, and fetchMock when making the change.
src/features/file-browser/FileTreePanel.tsx (1)

576-578: Consider conditionalizing the debug status line for production.

The debug line showing inst=... and status is useful for development but may clutter the UI for end users. Consider hiding it behind an environment flag or removing it before the PR leaves draft status.

♻️ Optional: Conditionalize debug UI
-        <div className="mt-1 text-[9px] font-mono text-muted-foreground truncate" title={`${debug.lastPath} ${debug.error || ''}`}>
-          inst={debug.instanceId ? debug.instanceId.slice(0, 8) : 'master'} · {debug.ok ? `ok:${debug.status ?? '-'}` : `err:${debug.error || debug.status || 'unknown'}`}
-        </div>
+        {import.meta.env.DEV && (
+          <div className="mt-1 text-[9px] font-mono text-muted-foreground truncate" title={`${debug.lastPath} ${debug.error || ''}`}>
+            inst={debug.instanceId ? debug.instanceId.slice(0, 8) : 'master'} · {debug.ok ? `ok:${debug.status ?? '-'}` : `err:${debug.error || debug.status || 'unknown'}`}
+          </div>
+        )}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/features/file-browser/FileTreePanel.tsx` around lines 576 - 578, The
debug status line (the div rendering debug.lastPath, debug.instanceId,
debug.ok/status/error) should be hidden in production; wrap that debug rendering
in a conditional check (e.g., only render when a debug flag is true or when
NODE_ENV !== 'production' or when a prop like showDebug/isDraft is set) so end
users don't see the inst=... status; locate the div that references
debug.lastPath, debug.instanceId and debug.ok and gate its rendering behind the
chosen environment/feature flag.
src/hooks/useDashboardData.ts (1)

140-158: Consider clearing state on instance switch to avoid stale data flash.

When activeInstanceId changes, the effect restarts and sets memoriesLoading to true, but memories and tokenData retain their previous values until the fetch completes. Users may briefly see data from the previous instance.

♻️ Optional: Clear state on instance change
   // Initial fetch and polling (restarts when active instance changes)
   useEffect(() => {
     const controller = new AbortController();

     // Fetch immediately for current instance context.
     setMemoriesLoading(true);
+    setMemories([]);
+    setTokenData(null);
     refreshMemories(controller.signal);
     refreshTokens(controller.signal);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/hooks/useDashboardData.ts` around lines 140 - 158, When activeInstanceId
changes the effect restarts but previous memories/tokenData remain until fetch
completes, causing a stale flash; update the start of the useEffect (the block
that creates controller and calls refreshMemories/refreshTokens) to clear the
previous instance state: call setMemories([]) (or appropriate empty shape) and
reset tokenData (e.g., setTokenData(null) or []) and setTokensLoading(true)
alongside setMemoriesLoading(true) before invoking refreshMemories/refreshTokens
so the UI shows cleared loading state for the new instance; keep the existing
abort controller/interval logic unchanged and ensure you reference useEffect,
refreshMemories, refreshTokens, setMemoriesLoading, setMemories,
tokenData/setTokenData, and activeInstanceId when making the change.
src/contexts/GatewayContext.tsx (1)

146-163: Logic is correct; consider simplifying the credential retrieval for readability.

The reconnection logic correctly handles instance switches by disconnecting and reconnecting with saved credentials. The early return on line 148 prevents re-entry issues.

The IIFE on lines 151-155 works but is slightly harder to parse at a glance.

♻️ Optional: Simplify credential retrieval
-    const saved = lastConnectRef.current || (() => {
-      const loaded = loadConfig();
-      if (loaded.url && loaded.token) return { url: loaded.url as string, token: loaded.token as string };
-      return null;
-    })();
+    let saved = lastConnectRef.current;
+    if (!saved) {
+      const loaded = loadConfig();
+      if (loaded.url && loaded.token) {
+        saved = { url: loaded.url as string, token: loaded.token as string };
+      }
+    }
     if (!saved) return;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/contexts/GatewayContext.tsx` around lines 146 - 163, Replace the IIFE
credential retrieval with a clearer sequence: read lastConnectRef.current into a
variable (e.g., const last = lastConnectRef.current), if last exists use it as
saved, otherwise call loadConfig() into a variable (e.g., const loaded =
loadConfig()) and derive saved from loaded.url/loaded.token; keep the rest of
the effect logic using previousInstanceIdRef, connectionState, disconnect, and
connect unchanged so reconnect uses the same saved {url, token} structure.
src/contexts/InstanceContext.tsx (1)

71-86: Global fetch override pattern requires careful coordination.

The global fetch override is captured once on mount with an empty dependency array. This works correctly for this use case, but be aware that if other code also overrides globalThis.fetch, the order of mount/unmount could cause unexpected behavior. Consider documenting this constraint in a comment.

📝 Suggested documentation
   useEffect(() => {
+    // Note: We override globalThis.fetch once on mount to inject instance routing headers.
+    // This assumes no other code overrides fetch after this provider mounts.
+    // If multiple fetch overrides are needed, consider a different pattern (e.g., custom fetch wrapper).
     const originalFetch = globalThis.fetch.bind(globalThis);
     globalThis.fetch = ((input: RequestInfo | URL, init?: RequestInit) => {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/contexts/InstanceContext.tsx` around lines 71 - 86, The current useEffect
replaces globalThis.fetch on mount (and restores it on unmount) which can
conflict if other modules also override fetch; add a concise comment above this
useEffect explaining the coordination constraint: that originalFetch is captured
on mount, restoration happens on unmount, and that consumers must ensure
mount/unmount ordering (or avoid other global fetch overrides) to prevent
unexpected behavior; reference the useEffect block, globalThis.fetch, and
addInstanceHeaderToFetch so future maintainers know why the override is safe
here and what to watch for.
src/features/file-browser/hooks/useFileTree.ts (1)

130-135: Consider making the retry delay configurable or documenting the rationale.

The 200ms retry delay for handling instance switch race conditions is reasonable, but consider adding a brief comment explaining why this specific delay was chosen, or extracting it as a named constant for clarity.

💡 Suggested improvement
+  const INSTANCE_SWITCH_RETRY_DELAY_MS = 200;
+
   // Initial load
   const loadRoot = useCallback(async () => {
     setLoading(true);
     setError(null);
     let children = await fetchChildren('');
     if (!children) {
-      // Instance switch can race with transport reconnect; retry once quickly.
-      await new Promise((resolve) => setTimeout(resolve, 200));
+      // Instance switch can race with transport reconnect; retry once after brief delay.
+      await new Promise((resolve) => setTimeout(resolve, INSTANCE_SWITCH_RETRY_DELAY_MS));
       children = await fetchChildren('');
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/features/file-browser/hooks/useFileTree.ts` around lines 130 - 135,
Extract the hard-coded 200ms sleep in useFileTree where fetchChildren('') is
retried into a named constant (e.g., CHILDREN_RETRY_DELAY_MS) or make it a
configurable parameter to the hook, and add a short comment explaining the
rationale (to handle instance-switch/transport-reconnect race) and why 200ms was
chosen; update the retry code that currently awaits setTimeout to use that
constant/parameter and ensure any tests or callers are adjusted if you make it
configurable.
src/features/sessions/SessionList.tsx (1)

147-171: Consider using a custom dialog for instance removal confirmation.

The window.confirm approach works but is inconsistent with the custom Dialog component used for session deletion (lines 397-435). For UI consistency, consider using a similar custom dialog for instance removal.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/features/sessions/SessionList.tsx` around lines 147 - 171, The
removeInstance handler currently uses window.confirm which breaks UI
consistency; replace it by wiring a custom Dialog component similar to the
session deletion dialog: add component state (e.g., removeDialogOpen: boolean
and pendingRemoveInstance: DiscoveredInstance | null), change removeInstance to
only set pendingRemoveInstance and open the dialog (do not perform deletion),
render the Dialog with the same confirm/cancel flow as the session delete
dialog, and move the existing deletion logic (fetch, setInstanceMutationError,
setInstanceMutatingId, onSelectInstance clearing when activeInstanceId matches,
await onRefreshInstances, error handling and finally clearing
setInstanceMutatingId) into the dialog confirm callback; ensure you update
useCallback dependencies (activeInstanceId, onRefreshInstances,
onSelectInstance) and clear pendingRemoveInstance when canceling or after
completion.
src/lib/apiRouting.ts (2)

60-64: Fragment (#hash) in URL is silently dropped when instanceId is appended.

If pathWithQuery contains a fragment (e.g., /api/sessions?foo=bar#section), the fragment is passed to URLSearchParams which ignores it. The reconstructed URL loses the fragment entirely.

This may be intentional for API paths (fragments are typically client-side only), but worth confirming the behavior is expected.

🔧 Proposed fix to preserve fragments
 function splitPathAndQuery(input: string): { pathname: string; suffix: string } {
-  const idx = input.search(/[?#]/);
+  const idx = input.indexOf('?');
   if (idx === -1) return { pathname: input, suffix: '' };
   return { pathname: input.slice(0, idx), suffix: input.slice(idx) };
 }

Or handle fragments explicitly if needed:

 export function routeApiPath(pathWithQuery: string, instanceId: string | null): string {
   if (!instanceId) return pathWithQuery;

   const { pathname, suffix } = splitPathAndQuery(pathWithQuery);
   // ... prefix checks ...

-  const query = suffix.startsWith('?') ? suffix.slice(1) : suffix;
+  const hashIdx = suffix.indexOf('#');
+  const queryPart = hashIdx === -1 ? suffix : suffix.slice(0, hashIdx);
+  const fragmentPart = hashIdx === -1 ? '' : suffix.slice(hashIdx);
+  const query = queryPart.startsWith('?') ? queryPart.slice(1) : queryPart;
   const params = new URLSearchParams(query);
   params.set(INSTANCE_ID_QUERY_PARAM, instanceId);
   const nextQuery = params.toString();
-  return nextQuery ? `${pathname}?${nextQuery}` : pathname;
+  return nextQuery ? `${pathname}?${nextQuery}${fragmentPart}` : `${pathname}${fragmentPart}`;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/apiRouting.ts` around lines 60 - 64, The current construction of the
URL drops any fragment because suffix is fed into URLSearchParams; update the
logic that builds the return URL (variables: suffix, query, params, nextQuery,
pathname and the constant INSTANCE_ID_QUERY_PARAM) to detect and strip a
fragment identifier from suffix before creating URLSearchParams, set the
instance id param on params, then reattach the original fragment (prefixed with
'#') to the final returned string (i.e., return
`${pathname}?${nextQuery}#${fragment}` when a fragment existed, otherwise keep
the existing return behavior). Ensure fragment extraction handles the case where
suffix starts with '?' as the existing code does.

98-103: Redundant MASTER_ONLY check after INSTANCE_ROUTED check.

The master-only check (lines 101-103) is unnecessary because master-only prefixes and instance-routed prefixes are disjoint sets. If a path doesn't match any instance-routed prefix (line 98), it will already return unchanged, making the master-only check unreachable for those paths.

However, this is defensive coding and doesn't affect correctness—just a minor clarity nit.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/apiRouting.ts` around lines 98 - 103, Remove the redundant
MASTER_ONLY_API_PREFIXES check that follows the INSTANCE_ROUTED_API_PREFIXES
check: since INSTANCE_ROUTED_API_PREFIXES and MASTER_ONLY_API_PREFIXES are
disjoint, the second if-block (MASTER_ONLY_API_PREFIXES.some((prefix) =>
matchesPrefix(pathname, prefix))) is unreachable after the early return and
should be deleted to improve clarity; keep the initial
INSTANCE_ROUTED_API_PREFIXES check (using matchesPrefix and pathname) and any
remaining logic that handles master-only routes in their appropriate place.
server/lib/docker-instances.ts (2)

966-970: Consider restricting file permissions for credential files.

The auth-profiles.json (line 968), .env (line 1029), and openclaw.json (line 1032) files contain sensitive credentials (API keys, tokens). These are created with default permissions (typically 0644, readable by all users on the system).

For better security hygiene, consider restricting permissions to owner-only.

🔒 Proposed fix to restrict permissions
+  // Restrict permissions for credential files
+  const credentialFileOpts = { encoding: 'utf8' as const, mode: 0o600 };
+
   if (filteredStore) {
     fs.writeFileSync(
       path.join(agentDir, 'auth-profiles.json'),
       `${JSON.stringify(filteredStore, null, 2)}\n`,
-      'utf8',
+      credentialFileOpts,
     );
   }
   // ...
   if (finalEnvEntries.size > 0) {
     const envContent = [...finalEnvEntries.entries()]
       .map(([key, value]) => `${key}=${value}`)
       .join('\n');
-    fs.writeFileSync(path.join(stateDir, '.env'), `${envContent}\n`, 'utf8');
+    fs.writeFileSync(path.join(stateDir, '.env'), `${envContent}\n`, credentialFileOpts);
   }

-  fs.writeFileSync(openclawConfigPath, `${JSON.stringify(cfg, null, 2)}\n`, 'utf8');
+  fs.writeFileSync(openclawConfigPath, `${JSON.stringify(cfg, null, 2)}\n`, credentialFileOpts);

Also applies to: 1029-1029, 1032-1032

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/lib/docker-instances.ts` around lines 966 - 970, The
auth-profiles.json, .env and openclaw.json files are written with default
permissive filesystem permissions; change the write calls that create these
files (the fs.writeFileSync calls that write auth-profiles.json, .env and
openclaw.json in docker-instances.ts) to set owner-only permissions (e.g. mode
0o600) by passing the mode option to fs.writeFileSync or by calling fs.chmodSync
immediately after write; ensure you apply the same fix to the three write sites
so credentials are only readable/writable by the file owner.

940-948: Minor modulo bias in token generation.

Using bytes[i] % alphabet.length with a 62-character alphabet introduces slight bias since 256 is not evenly divisible by 62. The first 8 characters are ~1.6% more likely than others.

For this use case (instance gateway tokens with 32 characters), the entropy is still sufficient, but this is worth noting.

♻️ Unbiased alternative using rejection sampling
 function generateInstanceGatewayToken(): string {
   const alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
-  const bytes = randomBytes(32);
   let token = '';
-  for (let i = 0; i < bytes.length; i += 1) {
-    token += alphabet[bytes[i] % alphabet.length];
+  while (token.length < 32) {
+    const bytes = randomBytes(32);
+    for (const byte of bytes) {
+      // Reject values >= 248 to avoid modulo bias (248 = 62 * 4)
+      if (byte < 248) {
+        token += alphabet[byte % 62];
+        if (token.length === 32) break;
+      }
+    }
   }
   return token;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/lib/docker-instances.ts` around lines 940 - 948, The current
generateInstanceGatewayToken function uses bytes[i] % alphabet.length which
introduces a small modulo bias; change it to rejection sampling: compute a
maxValid = Math.floor(256 / alphabet.length) * alphabet.length and only accept a
random byte when byte < maxValid, mapping it to an alphabet index via byte %
alphabet.length; continue drawing random bytes (e.g., by calling randomBytes in
chunks) until you have 32 accepted characters, then return the token. Ensure you
reference generateInstanceGatewayToken, alphabet, and randomBytes when updating
the implementation.
server/lib/instance-proxy.ts (1)

107-121: Redundant content-encoding check.

Line 116 checks lower !== 'content-encoding', but content-encoding is already excluded from PROXY_RESPONSE_HEADERS (it's intentionally omitted per the comment on lines 18-19), so line 112 would already skip it.

This is harmless but slightly redundant.

♻️ Simplified version
 function buildResponseHeaders(upstream: Headers): Headers {
   const out = new Headers();
   const upstreamHasContentEncoding = upstream.has('content-encoding');
   for (const [key, value] of upstream.entries()) {
     const lower = key.toLowerCase();
     if (!PROXY_RESPONSE_HEADERS.has(lower)) continue;
     // Node fetch may decode upstream payload while still exposing encoded length.
     // Do not forward stale content-length if upstream declared content-encoding.
     if (lower === 'content-length' && upstreamHasContentEncoding) continue;
-    if (lower !== 'content-encoding') {
-      out.set(key, value);
-    }
+    out.set(key, value);
   }
   return out;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/lib/instance-proxy.ts` around lines 107 - 121, The
buildResponseHeaders function contains a redundant check for 'content-encoding'
after PROXY_RESPONSE_HEADERS already excludes it; update buildResponseHeaders
(function name: buildResponseHeaders, variable: upstreamHasContentEncoding,
constant: PROXY_RESPONSE_HEADERS) to remove the unnecessary "if (lower !==
'content-encoding')" branch and simply set the header when the key passes the
PROXY_RESPONSE_HEADERS filter (keeping the existing special-case that skips
'content-length' when upstreamHasContentEncoding is true).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/API.md`:
- Around line 417-420: The documentation currently describes target selection
for the "Master-side proxy scaffold" and "Target resolution" as preferring
container 18789/tcp (the gateway WS port), which is incorrect for proxied
instance HTTP routes like /api/version and /api/files; update the text to state
that the proxy targets the instance's Nerve HTTP port (the instance API port)
instead of 18789/tcp and clarify that port selection prefers the Nerve HTTP port
and otherwise falls back to the first published TCP port so readers are routed
to the correct Nerve API endpoints.

In `@docs/research-aramaki-multiclaw.md`:
- Around line 80-89: Remove the already-implemented items 1 and 4 from the "Next
recommended hardening (not yet implemented)" list in
docs/research-aramaki-multiclaw.md: delete the bullet about generating/writing
"~/.openclaw/openclaw.json" and the bullet about "Update proxy target logic to
differentiate Nerve HTTP port vs Gateway WS port", since those are implemented
elsewhere in this PR; keep the remaining items 2 and 3 as-is and optionally add
a brief note pointing readers to the sections where the openclaw.json bootstrap
and proxy/port changes were implemented if clarification is helpful.

In `@PROGRESS.md`:
- Around line 5-8: Update the "Status at a glance" snapshot at the top of
PROGRESS.md so it reflects the current state (Milestones 1–5 completed and E2E
validation done) rather than only "Milestone 1 + Milestone 2 foundation
completed"; modify the "Overall status:" line under the "Status at a glance"
heading (and keep the "Branch: `multi-claw`" line intact) to a concise, accurate
summary like "Milestones 1–5 completed; E2E validation passed."

In `@scripts/start-multiclaw.sh`:
- Around line 20-23: The token generation currently uses command substitution
inside an export which masks failures (ShellCheck SC2155); instead, run the
generator command (tr/head) into a local variable (e.g., new_token), check its
exit status and length, then assign/export OPENCLAW_GATEWAY_TOKEN and
GATEWAY_TOKEN from that variable; update the block that sets
OPENCLAW_GATEWAY_TOKEN and GATEWAY_TOKEN so failures from the tr|head pipeline
are detected and handled before exporting.

In `@server/lib/docker-instances.test.ts`:
- Around line 106-127: The test's execFileImpl/dockerArgs assertion is too
permissive (uses expect.arrayContaining) and omits the '-e' flag before the
OPENAI_API_KEY value; update the assertion in the execFileImpl handler to assert
the exact docker run argument structure (either replace expect.arrayContaining
with a toEqual check of the full expected args array including '-e',
'OPENAI_API_KEY=openai-test-key', and other '-e' entries, or add an explicit
check that for each env var string in dockerArgs there is a preceding '-e'
token). Locate the check around the dockerArgs variable in execFileImpl and make
the assertion verify the pairing/order of '-e' flags and their values rather
than only presence.

In `@server/routes/health.ts`:
- Around line 29-39: The /healthcheck route handler currently always returns a
2xx JSON response via c.json even when probeGateway() reports gateway !== 'ok';
update the handler (the app.get('/healthcheck' route using probeGateway(),
gateway and ready variables) to return a non-2xx status when readiness is false
(e.g., use c.status(503).json(...) or equivalent) so orchestration systems see
an unhealthy pod; keep the existing JSON body (service, status, ready, uptime,
gateway) but send it with the non-OK HTTP status when ready is false.

In `@src/contexts/InstanceContext.tsx`:
- Around line 44-61: The refreshInstances useCallback is missing
setActiveInstanceId in its dependency array; update the dependency list for the
refreshInstances callback to include setActiveInstanceId (alongside any existing
deps) so the hook explicitly depends on the setter used inside (references:
refreshInstances, setActiveInstanceId, activeInstanceIdRef, setInstances,
setInstancesLoading).

---

Nitpick comments:
In `@Dockerfile.multiclaw`:
- Around line 21-24: The Dockerfile currently installs openclaw without a
version which leads to non-reproducible builds; update the RUN that calls npm
install -g openclaw to pin a specific version (e.g., npm install -g
openclaw@<version>) or add a build ARG like OPENCLAW_VERSION with a sensible
default and use npm install -g openclaw@$OPENCLAW_VERSION so builds are
deterministic; modify the RUN line containing npm install -g openclaw to
reference that pinned version/ARG and document/update any related build
instructions.

In `@PROGRESS.md`:
- Around line 64-74: Replace duplicate "## Validation notes" headers with unique
headings per section (for example "## Validation notes — Milestone 1", "##
Validation notes — Milestone 2", etc.) so each anchor is unique and MD024 is
avoided; update the three occurrences that currently use the same "## Validation
notes" heading (the block shown in the diff and the other two similar sections)
and keep the existing bullet content unchanged.

In `@server/lib/docker-instances.ts`:
- Around line 966-970: The auth-profiles.json, .env and openclaw.json files are
written with default permissive filesystem permissions; change the write calls
that create these files (the fs.writeFileSync calls that write
auth-profiles.json, .env and openclaw.json in docker-instances.ts) to set
owner-only permissions (e.g. mode 0o600) by passing the mode option to
fs.writeFileSync or by calling fs.chmodSync immediately after write; ensure you
apply the same fix to the three write sites so credentials are only
readable/writable by the file owner.
- Around line 940-948: The current generateInstanceGatewayToken function uses
bytes[i] % alphabet.length which introduces a small modulo bias; change it to
rejection sampling: compute a maxValid = Math.floor(256 / alphabet.length) *
alphabet.length and only accept a random byte when byte < maxValid, mapping it
to an alphabet index via byte % alphabet.length; continue drawing random bytes
(e.g., by calling randomBytes in chunks) until you have 32 accepted characters,
then return the token. Ensure you reference generateInstanceGatewayToken,
alphabet, and randomBytes when updating the implementation.

In `@server/lib/instance-proxy.ts`:
- Around line 107-121: The buildResponseHeaders function contains a redundant
check for 'content-encoding' after PROXY_RESPONSE_HEADERS already excludes it;
update buildResponseHeaders (function name: buildResponseHeaders, variable:
upstreamHasContentEncoding, constant: PROXY_RESPONSE_HEADERS) to remove the
unnecessary "if (lower !== 'content-encoding')" branch and simply set the header
when the key passes the PROXY_RESPONSE_HEADERS filter (keeping the existing
special-case that skips 'content-length' when upstreamHasContentEncoding is
true).

In `@server/lib/instance-routing.ts`:
- Around line 1-4: The INSTANCE_ID_RE currently allows 1–128 characters which
exceeds the Docker/container name limit used elsewhere (e.g.,
CreateInstanceDialog); update INSTANCE_ID_RE to enforce the same max length as
the UI by changing the quantifier so the total length is 1–63 characters (adjust
the `{0,127}` to `{0,62}`) or, if 128 is intentional, add a clear comment
explaining the deliberate mismatch and reference CreateInstanceDialog; ensure
the change or comment is made at the INSTANCE_ID_RE declaration so all
validations remain consistent.

In `@src/contexts/GatewayContext.tsx`:
- Around line 146-163: Replace the IIFE credential retrieval with a clearer
sequence: read lastConnectRef.current into a variable (e.g., const last =
lastConnectRef.current), if last exists use it as saved, otherwise call
loadConfig() into a variable (e.g., const loaded = loadConfig()) and derive
saved from loaded.url/loaded.token; keep the rest of the effect logic using
previousInstanceIdRef, connectionState, disconnect, and connect unchanged so
reconnect uses the same saved {url, token} structure.

In `@src/contexts/InstanceContext.tsx`:
- Around line 71-86: The current useEffect replaces globalThis.fetch on mount
(and restores it on unmount) which can conflict if other modules also override
fetch; add a concise comment above this useEffect explaining the coordination
constraint: that originalFetch is captured on mount, restoration happens on
unmount, and that consumers must ensure mount/unmount ordering (or avoid other
global fetch overrides) to prevent unexpected behavior; reference the useEffect
block, globalThis.fetch, and addInstanceHeaderToFetch so future maintainers know
why the override is safe here and what to watch for.

In `@src/features/file-browser/FileTreePanel.tsx`:
- Around line 576-578: The debug status line (the div rendering debug.lastPath,
debug.instanceId, debug.ok/status/error) should be hidden in production; wrap
that debug rendering in a conditional check (e.g., only render when a debug flag
is true or when NODE_ENV !== 'production' or when a prop like showDebug/isDraft
is set) so end users don't see the inst=... status; locate the div that
references debug.lastPath, debug.instanceId and debug.ok and gate its rendering
behind the chosen environment/feature flag.

In `@src/features/file-browser/hooks/useFileTree.ts`:
- Around line 130-135: Extract the hard-coded 200ms sleep in useFileTree where
fetchChildren('') is retried into a named constant (e.g.,
CHILDREN_RETRY_DELAY_MS) or make it a configurable parameter to the hook, and
add a short comment explaining the rationale (to handle
instance-switch/transport-reconnect race) and why 200ms was chosen; update the
retry code that currently awaits setTimeout to use that constant/parameter and
ensure any tests or callers are adjusted if you make it configurable.

In `@src/features/sessions/CreateInstanceDialog.tsx`:
- Around line 239-244: The validation message is shown immediately on open;
modify CreateInstanceDialog to only show validationMessage after the user has
interacted or tried to submit: add a local boolean state (e.g., touched or
attemptedSubmit) and set it to true on the name input's onBlur (and also set
attemptedSubmit true in the submit handler), then change the rendering condition
from "validationMessage && !error" to "validationMessage && !error && (touched
|| attemptedSubmit)". Update the name input component to call the onBlur handler
that flips touched true and ensure the submit path sets attemptedSubmit so
validation appears after submit.

In `@src/features/sessions/SessionList.test.tsx`:
- Around line 70-86: The test asserts the stop button is disabled but then
attempts a userEvent.click which only verifies browser behavior, so either
remove the click and associated fetchMock assertion or strengthen the test to
verify component logic: update the test in SessionList.test.tsx to keep the
renderList call and the expect(stopButton).toBeDisabled() assertion, then either
(A) delete the userEvent.click(stopButton) and the
expect(fetchMock).not.toHaveBeenCalled() lines, or (B) instead of simulating a
DOM click, directly call the component's stop handler (or trigger the same
action via the instance method used by the Stop button) and assert fetchMock was
not invoked; reference the stopButton variable, renderList helper, and fetchMock
when making the change.

In `@src/features/sessions/SessionList.tsx`:
- Around line 147-171: The removeInstance handler currently uses window.confirm
which breaks UI consistency; replace it by wiring a custom Dialog component
similar to the session deletion dialog: add component state (e.g.,
removeDialogOpen: boolean and pendingRemoveInstance: DiscoveredInstance | null),
change removeInstance to only set pendingRemoveInstance and open the dialog (do
not perform deletion), render the Dialog with the same confirm/cancel flow as
the session delete dialog, and move the existing deletion logic (fetch,
setInstanceMutationError, setInstanceMutatingId, onSelectInstance clearing when
activeInstanceId matches, await onRefreshInstances, error handling and finally
clearing setInstanceMutatingId) into the dialog confirm callback; ensure you
update useCallback dependencies (activeInstanceId, onRefreshInstances,
onSelectInstance) and clear pendingRemoveInstance when canceling or after
completion.

In `@src/hooks/useDashboardData.ts`:
- Around line 140-158: When activeInstanceId changes the effect restarts but
previous memories/tokenData remain until fetch completes, causing a stale flash;
update the start of the useEffect (the block that creates controller and calls
refreshMemories/refreshTokens) to clear the previous instance state: call
setMemories([]) (or appropriate empty shape) and reset tokenData (e.g.,
setTokenData(null) or []) and setTokensLoading(true) alongside
setMemoriesLoading(true) before invoking refreshMemories/refreshTokens so the UI
shows cleared loading state for the new instance; keep the existing abort
controller/interval logic unchanged and ensure you reference useEffect,
refreshMemories, refreshTokens, setMemoriesLoading, setMemories,
tokenData/setTokenData, and activeInstanceId when making the change.

In `@src/lib/apiRouting.ts`:
- Around line 60-64: The current construction of the URL drops any fragment
because suffix is fed into URLSearchParams; update the logic that builds the
return URL (variables: suffix, query, params, nextQuery, pathname and the
constant INSTANCE_ID_QUERY_PARAM) to detect and strip a fragment identifier from
suffix before creating URLSearchParams, set the instance id param on params,
then reattach the original fragment (prefixed with '#') to the final returned
string (i.e., return `${pathname}?${nextQuery}#${fragment}` when a fragment
existed, otherwise keep the existing return behavior). Ensure fragment
extraction handles the case where suffix starts with '?' as the existing code
does.
- Around line 98-103: Remove the redundant MASTER_ONLY_API_PREFIXES check that
follows the INSTANCE_ROUTED_API_PREFIXES check: since
INSTANCE_ROUTED_API_PREFIXES and MASTER_ONLY_API_PREFIXES are disjoint, the
second if-block (MASTER_ONLY_API_PREFIXES.some((prefix) =>
matchesPrefix(pathname, prefix))) is unreachable after the early return and
should be deleted to improve clarity; keep the initial
INSTANCE_ROUTED_API_PREFIXES check (using matchesPrefix and pathname) and any
remaining logic that handles master-only routes in their appropriate place.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fff8d92b-ca21-450b-bcb2-8143363c7276

📥 Commits

Reviewing files that changed from the base of the PR and between 6767339 and c2a2886.

📒 Files selected for processing (36)
  • .dockerignore
  • Dockerfile.multiclaw
  • GOALS.md
  • PROGRESS.md
  • README.md
  • docs/API.md
  • docs/research-aramaki-multiclaw.md
  • scripts/start-multiclaw.sh
  • server/app.ts
  • server/lib/docker-instances.test.ts
  • server/lib/docker-instances.ts
  • server/lib/instance-proxy.ts
  • server/lib/instance-routing.ts
  • server/lib/ws-proxy.test.ts
  • server/lib/ws-proxy.ts
  • server/middleware/instance-routing.test.ts
  • server/middleware/instance-routing.ts
  • server/routes/health.test.ts
  • server/routes/health.ts
  • server/routes/instances.test.ts
  • server/routes/instances.ts
  • src/App.tsx
  • src/contexts/GatewayContext.tsx
  • src/contexts/InstanceContext.tsx
  • src/features/auth/AuthGate.tsx
  • src/features/file-browser/FileTreePanel.tsx
  • src/features/file-browser/hooks/useFileTree.ts
  • src/features/sessions/CreateInstanceDialog.tsx
  • src/features/sessions/SessionList.test.tsx
  • src/features/sessions/SessionList.tsx
  • src/hooks/useDashboardData.ts
  • src/hooks/useServerEvents.ts
  • src/hooks/useWebSocket.test.ts
  • src/hooks/useWebSocket.ts
  • src/lib/apiRouting.test.ts
  • src/lib/apiRouting.ts

Comment on lines +417 to +420
Master-side proxy scaffold for forwarding API calls to a selected local Docker instance.

- Target resolution uses local Docker inspection and forwards to `http://127.0.0.1:<published-port>`.
- Port selection prefers container `18789/tcp`; if unavailable, falls back to the first published TCP port.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Document the proxy against the Nerve HTTP port, not the gateway port.

This endpoint proxies instance HTTP routes like /api/version and /api/files*. Saying it prefers 18789/tcp describes the gateway WS target instead of the instance’s Nerve API port, which will send readers to the wrong place.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/API.md` around lines 417 - 420, The documentation currently describes
target selection for the "Master-side proxy scaffold" and "Target resolution" as
preferring container 18789/tcp (the gateway WS port), which is incorrect for
proxied instance HTTP routes like /api/version and /api/files; update the text
to state that the proxy targets the instance's Nerve HTTP port (the instance API
port) instead of 18789/tcp and clarify that port selection prefers the Nerve
HTTP port and otherwise falls back to the first published TCP port so readers
are routed to the correct Nerve API endpoints.

Comment on lines +80 to +89
## Next recommended hardening (not yet implemented)
1. Optionally generate/write minimal `~/.openclaw/openclaw.json` in container (like Aramaki) so behavior is predictable across OpenClaw CLI versions.
2. Add explicit readiness checks in startup script:
- wait for gateway port to accept connections
- fail fast if not ready in timeout window
3. Add non-default secure profile example for production:
- `NERVE_AUTH=true`
- `NERVE_SESSION_SECRET`
- explicit auth password env.
4. Update proxy target logic to differentiate Nerve HTTP port vs Gateway WS port if needed by frontend routing model.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Remove already-landed items from the “not yet implemented” list.

Item 1 (~/.openclaw/openclaw.json bootstrap) and item 4 (separate HTTP/WS port handling) are described elsewhere in this PR as implemented. Leaving them here as future work makes the research note misleading.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/research-aramaki-multiclaw.md` around lines 80 - 89, Remove the
already-implemented items 1 and 4 from the "Next recommended hardening (not yet
implemented)" list in docs/research-aramaki-multiclaw.md: delete the bullet
about generating/writing "~/.openclaw/openclaw.json" and the bullet about
"Update proxy target logic to differentiate Nerve HTTP port vs Gateway WS port",
since those are implemented elsewhere in this PR; keep the remaining items 2 and
3 as-is and optionally add a brief note pointing readers to the sections where
the openclaw.json bootstrap and proxy/port changes were implemented if
clarification is helpful.

Comment on lines +5 to +8
## Status at a glance
- Branch: `multi-claw`
- Overall status: **Milestone 1 + Milestone 2 foundation completed**

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Refresh the top-level status snapshot.

It still says only Milestones 1–2 are complete, but the rest of the file documents Milestones 3–5 and E2E validation. That makes the summary stale on first read.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@PROGRESS.md` around lines 5 - 8, Update the "Status at a glance" snapshot at
the top of PROGRESS.md so it reflects the current state (Milestones 1–5
completed and E2E validation done) rather than only "Milestone 1 + Milestone 2
foundation completed"; modify the "Overall status:" line under the "Status at a
glance" heading (and keep the "Branch: `multi-claw`" line intact) to a concise,
accurate summary like "Milestones 1–5 completed; E2E validation passed."

Comment on lines +20 to +23
if [[ -z "${OPENCLAW_GATEWAY_TOKEN:-}" ]]; then
export OPENCLAW_GATEWAY_TOKEN="$(tr -dc 'a-zA-Z0-9' </dev/urandom | head -c 32)"
export GATEWAY_TOKEN="${OPENCLAW_GATEWAY_TOKEN}"
fi
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Declare and assign separately to avoid masking return values.

Shellcheck SC2155 flags line 21. If tr or head fails, the exit code is masked by the export assignment, and the script continues with an empty or partial token.

🛡️ Proposed fix
 if [[ -z "${OPENCLAW_GATEWAY_TOKEN:-}" ]]; then
-  export OPENCLAW_GATEWAY_TOKEN="$(tr -dc 'a-zA-Z0-9' </dev/urandom | head -c 32)"
+  OPENCLAW_GATEWAY_TOKEN="$(tr -dc 'a-zA-Z0-9' </dev/urandom | head -c 32)"
+  export OPENCLAW_GATEWAY_TOKEN
   export GATEWAY_TOKEN="${OPENCLAW_GATEWAY_TOKEN}"
 fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if [[ -z "${OPENCLAW_GATEWAY_TOKEN:-}" ]]; then
export OPENCLAW_GATEWAY_TOKEN="$(tr -dc 'a-zA-Z0-9' </dev/urandom | head -c 32)"
export GATEWAY_TOKEN="${OPENCLAW_GATEWAY_TOKEN}"
fi
if [[ -z "${OPENCLAW_GATEWAY_TOKEN:-}" ]]; then
OPENCLAW_GATEWAY_TOKEN="$(tr -dc 'a-zA-Z0-9' </dev/urandom | head -c 32)"
export OPENCLAW_GATEWAY_TOKEN
export GATEWAY_TOKEN="${OPENCLAW_GATEWAY_TOKEN}"
fi
🧰 Tools
🪛 Shellcheck (0.11.0)

[warning] 21-21: Declare and assign separately to avoid masking return values.

(SC2155)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/start-multiclaw.sh` around lines 20 - 23, The token generation
currently uses command substitution inside an export which masks failures
(ShellCheck SC2155); instead, run the generator command (tr/head) into a local
variable (e.g., new_token), check its exit status and length, then assign/export
OPENCLAW_GATEWAY_TOKEN and GATEWAY_TOKEN from that variable; update the block
that sets OPENCLAW_GATEWAY_TOKEN and GATEWAY_TOKEN so failures from the tr|head
pipeline are detected and handled before exporting.

Comment on lines +106 to +127
execFileImpl = (_bin: unknown, args: unknown, _opts: unknown, cb: unknown) => {
const dockerArgs = args as string[];
if (dockerArgs[0] === 'run') {
expect(dockerArgs).toEqual(
expect.arrayContaining([
'-d',
'--name',
'worker-a',
'-p',
'0:3080',
'-p',
'0:3181',
'-e',
'NERVE_ALLOW_INSECURE=true',
'OPENAI_API_KEY=openai-test-key',
'multiclaw:test',
]),
);
expect(dockerArgs).toContain('-v');
(cb as (err: null, stdout: string, stderr: string) => void)(null, 'cid-worker-a\n', '');
return;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the actual docker run argument structure in createDockerInstance
ast-grep --pattern 'execFile($_, [$$$, "run", $$$], $_)'

Repository: daggerhashimoto/openclaw-nerve

Length of output: 56


🏁 Script executed:

# First, let's read the test file around the mentioned lines
cat -n server/lib/docker-instances.test.ts | sed -n '90,140p'

Repository: daggerhashimoto/openclaw-nerve

Length of output: 2132


🏁 Script executed:

# Let's also find the implementation file being tested
fd "docker-instances" --type f | grep -v node_modules

Repository: daggerhashimoto/openclaw-nerve

Length of output: 140


🏁 Script executed:

# Search for where OPENAI_API_KEY is used in the docker arguments
rg "OPENAI_API_KEY" server/lib/ -A 2 -B 2

Repository: daggerhashimoto/openclaw-nerve

Length of output: 7861


🏁 Script executed:

# Read the docker-instances.ts implementation file to understand how args are built
cat -n server/lib/docker-instances.ts | head -100

Repository: daggerhashimoto/openclaw-nerve

Length of output: 3709


🏁 Script executed:

# Search for where the docker run command is built with environment variables
rg "run.*-e" server/lib/docker-instances.ts -A 5 -B 5

Repository: daggerhashimoto/openclaw-nerve

Length of output: 1184


🏁 Script executed:

# Search for the function that builds the docker args
rg "createDockerInstance\|execFile" server/lib/docker-instances.ts -A 10 -B 2 | head -100

Repository: daggerhashimoto/openclaw-nerve

Length of output: 56


🏁 Script executed:

# Let me verify the exact line where env vars are added to confirm the full context
cat -n server/lib/docker-instances.ts | sed -n '280,295p'

Repository: daggerhashimoto/openclaw-nerve

Length of output: 692


🏁 Script executed:

# Find the exact location where docker args are built with env variables
rg "args.push\('-e'" server/lib/docker-instances.ts -B 10 -A 5

Repository: daggerhashimoto/openclaw-nerve

Length of output: 1065


Test assertion for docker run args is incomplete and masks potential bugs.

The assertion uses expect.arrayContaining() which only checks if expected items are present in the array, without verifying their order or proper pairing structure. Line 120 is missing the preceding -e flag that the actual implementation generates. According to the code, each environment variable should be passed with a -e flag (e.g., '-e', 'OPENAI_API_KEY=openai-test-key'), but the test only expects 'OPENAI_API_KEY=openai-test-key' without it. This means the test could pass even if the implementation incorrectly omits the -e flags. Consider using toEqual() with exact expected arguments or verify the -e KEY=value pairing structure more explicitly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/lib/docker-instances.test.ts` around lines 106 - 127, The test's
execFileImpl/dockerArgs assertion is too permissive (uses
expect.arrayContaining) and omits the '-e' flag before the OPENAI_API_KEY value;
update the assertion in the execFileImpl handler to assert the exact docker run
argument structure (either replace expect.arrayContaining with a toEqual check
of the full expected args array including '-e',
'OPENAI_API_KEY=openai-test-key', and other '-e' entries, or add an explicit
check that for each env var string in dockerArgs there is a preceding '-e'
token). Locate the check around the dockerArgs variable in execFileImpl and make
the assertion verify the pairing/order of '-e' flags and their values rather
than only presence.

Comment on lines +29 to +39
app.get('/healthcheck', async (c) => {
const gateway = await probeGateway();
const ready = gateway === 'ok';
return c.json({
service: 'nerve',
status: ready ? 'ready' : 'initializing',
ready,
uptime: process.uptime(),
gateway,
});
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Return a non-2xx response when readiness is false.

Orchestrators usually key off the HTTP status code, not the JSON body. As written, /healthcheck still looks healthy when gateway === 'unreachable', so readiness probes will keep routing traffic to an unready instance.

Suggested fix
 app.get('/healthcheck', async (c) => {
   const gateway = await probeGateway();
   const ready = gateway === 'ok';
-  return c.json({
+  return c.json({
     service: 'nerve',
     status: ready ? 'ready' : 'initializing',
     ready,
     uptime: process.uptime(),
     gateway,
-  });
+  }, ready ? 200 : 503);
 });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
app.get('/healthcheck', async (c) => {
const gateway = await probeGateway();
const ready = gateway === 'ok';
return c.json({
service: 'nerve',
status: ready ? 'ready' : 'initializing',
ready,
uptime: process.uptime(),
gateway,
});
});
app.get('/healthcheck', async (c) => {
const gateway = await probeGateway();
const ready = gateway === 'ok';
return c.json({
service: 'nerve',
status: ready ? 'ready' : 'initializing',
ready,
uptime: process.uptime(),
gateway,
}, ready ? 200 : 503);
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/routes/health.ts` around lines 29 - 39, The /healthcheck route handler
currently always returns a 2xx JSON response via c.json even when probeGateway()
reports gateway !== 'ok'; update the handler (the app.get('/healthcheck' route
using probeGateway(), gateway and ready variables) to return a non-2xx status
when readiness is false (e.g., use c.status(503).json(...) or equivalent) so
orchestration systems see an unhealthy pod; keep the existing JSON body
(service, status, ready, uptime, gateway) but send it with the non-OK HTTP
status when ready is false.

Comment on lines +44 to +61
const refreshInstances = useCallback(async () => {
try {
const resp = await fetch('/api/instances');
if (!resp.ok) return;
const payload = await resp.json() as { instances?: DiscoveredInstance[] };
const next = Array.isArray(payload.instances) ? payload.instances : [];
setInstances(next);

// Fall back to master if selected instance disappeared.
if (activeInstanceIdRef.current && !next.some((instance) => instance.id === activeInstanceIdRef.current)) {
setActiveInstanceId(null);
}
} catch {
// Keep stale values; do not fail UI flows on transient discovery issues.
} finally {
setInstancesLoading(false);
}
}, []);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Missing dependency in refreshInstances callback.

The refreshInstances callback uses setActiveInstanceId on line 54, but it's not included in the dependency array. While setActiveInstanceId has a stable identity (created with useCallback and empty deps), including it satisfies the exhaustive-deps rule and makes the dependency explicit.

🔧 Proposed fix
       // Keep stale values; do not fail UI flows on transient discovery issues.
     } finally {
       setInstancesLoading(false);
     }
-  }, []);
+  }, [setActiveInstanceId]);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/contexts/InstanceContext.tsx` around lines 44 - 61, The refreshInstances
useCallback is missing setActiveInstanceId in its dependency array; update the
dependency list for the refreshInstances callback to include setActiveInstanceId
(alongside any existing deps) so the hook explicitly depends on the setter used
inside (references: refreshInstances, setActiveInstanceId, activeInstanceIdRef,
setInstances, setInstancesLoading).

Codex added 2 commits March 14, 2026 19:59
# Conflicts:
#	server/lib/ws-proxy.ts
#	src/App.tsx
#	src/features/file-browser/FileTreePanel.tsx
#	src/features/file-browser/hooks/useFileTree.ts
#	src/features/sessions/SessionList.tsx
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
docs/API.md (1)

429-430: ⚠️ Potential issue | 🟡 Minor

Proxy target port is still documented incorrectly for HTTP proxying.

At Line 430, describing 18789/tcp as the preferred proxy target points to the gateway port, not the instance HTTP API target used for proxied /api/* routes. This remains misleading for integrators.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/API.md` around lines 429 - 430, The docs incorrectly state that proxying
should prefer container port 18789/tcp (which is the gateway), causing confusion
for `/api/*` HTTP proxy routes; update the text to indicate that the proxy
target should be the instance HTTP API port (the container port serving the
instance API), and change the port-selection guidance to prefer the instance API
port (e.g., the container port used by the instance HTTP API) and only fall back
to other published TCP ports if that API port is unavailable; update the
sentence referencing `18789/tcp` and the `/api/*` proxy behavior accordingly so
integrators see the correct target for HTTP proxying.
🧹 Nitpick comments (2)
src/features/file-browser/FileTreePanel.tsx (1)

585-587: Debug status bar provides useful instance visibility.

The abbreviated instance ID and request status help diagnose routing issues. Consider making this debug bar conditional (e.g., dev mode only) before shipping to production, or adding a user-toggle.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/features/file-browser/FileTreePanel.tsx` around lines 585 - 587, The
debug status bar (the div rendering debug.lastPath/debug.instanceId/debug.ok in
FileTreePanel.tsx) should be rendered conditionally so it doesn’t appear in
production; update the component to only show this debug panel when a dev-mode
flag or a user-toggle prop is enabled (e.g., check process.env.NODE_ENV ===
'development' or a new prop like showDebug) and wire a simple toggle state or
prop to control visibility; keep the existing content (debug.lastPath,
debug.instanceId slice, debug.ok/status/error) but wrap it in a conditional
render that respects the environment/prop so it can be enabled during
development or by an explicit user toggle.
src/features/sessions/SessionList.tsx (1)

277-279: Consider also enabling stop for 'restarting' containers.

The current regex only matches running. Per typical Docker semantics, containers in restarting state can also be stopped. Consider:

-const running = /running/i.test(runtimeStatus);
+const running = /running|restarting/i.test(runtimeStatus);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/features/sessions/SessionList.tsx` around lines 277 - 279, The code
determines stoppable containers using the regex /running/i (runtimeStatus ->
running), but that misses containers in the "restarting" state; update the logic
in SessionList.tsx where runtimeStatus, status, and running are computed (the
variables runtimeStatus, status, running) so that running becomes true for both
"running" and "restarting" (e.g., replace /running/i with a regex matching both
states like /(running|restarting)/i or equivalent), ensuring the UI/actions that
use running will allow stopping restarting containers as well.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/API.md`:
- Around line 413-416: Docs and implementation disagree on the error code for
non-allowlisted keys: the API.md table uses "invalid_credential_key" while the
instances route returns "invalid_configuration_key"; update the API.md entry
(the table row for 400 errors) to use "invalid_configuration_key" and ensure any
examples/tests referencing "invalid_credential_key" are updated to
"invalid_configuration_key" so the documented contract matches the instances
route behavior.

In `@server/lib/ws-proxy.ts`:
- Around line 152-155: The client WebSocket close for missing instance token in
the getInstanceToken branch should use the same close code/semantics as other
instance resolution failures: update the clientWs.close call in the
tokenResult?.token check (the getInstanceToken handling) to use close code 1013
and the same reason text used elsewhere for "instance unavailable" so that
client retry/behavior matches the other failure paths.

In `@src/contexts/GatewayContext.tsx`:
- Around line 150-168: The effect in GatewayContext (using
previousInstanceIdRef, lastConnectRef, loadConfig, connectionState, disconnect,
connect) causes dual reconnections and double runs because connectionState is in
the dependency array and useWebSocket also reacts to activeInstanceId; remove
connectionState from the dependency array and instead read the current
connection state via a ref (e.g., connectionStateRef) inside the effect or let
useWebSocket own reconnection logic by removing the manual disconnect/connect
sequence here; update the effect to only respond to activeInstanceId changes,
consult lastConnectRef/loadConfig for credentials, and call either
disconnect+connect OR rely on useWebSocket’s reconnection path (not both) to
avoid racing reconnects.

In `@src/features/file-browser/hooks/useFileTree.ts`:
- Around line 98-102: fetchChildren is calling the hardcoded endpoint
`/api/files/tree` so the request always hits master; update fetchChildren to
build the routed path and headers using the routing helpers: call
routeApiPath('/api/files/tree', activeInstanceId) to get the correct URL and use
addInstanceHeaderToFetch(fetch, activeInstanceId) (or pass the returned headers)
when invoking fetch so requests obey INSTANCE_ROUTED_API_PREFIXES and follow the
currently selected instance; keep existing params (`?path=...&depth=1`) and
replace the direct requestPath and fetch call with these helpers inside the same
fetchChildren function.

---

Duplicate comments:
In `@docs/API.md`:
- Around line 429-430: The docs incorrectly state that proxying should prefer
container port 18789/tcp (which is the gateway), causing confusion for `/api/*`
HTTP proxy routes; update the text to indicate that the proxy target should be
the instance HTTP API port (the container port serving the instance API), and
change the port-selection guidance to prefer the instance API port (e.g., the
container port used by the instance HTTP API) and only fall back to other
published TCP ports if that API port is unavailable; update the sentence
referencing `18789/tcp` and the `/api/*` proxy behavior accordingly so
integrators see the correct target for HTTP proxying.

---

Nitpick comments:
In `@src/features/file-browser/FileTreePanel.tsx`:
- Around line 585-587: The debug status bar (the div rendering
debug.lastPath/debug.instanceId/debug.ok in FileTreePanel.tsx) should be
rendered conditionally so it doesn’t appear in production; update the component
to only show this debug panel when a dev-mode flag or a user-toggle prop is
enabled (e.g., check process.env.NODE_ENV === 'development' or a new prop like
showDebug) and wire a simple toggle state or prop to control visibility; keep
the existing content (debug.lastPath, debug.instanceId slice,
debug.ok/status/error) but wrap it in a conditional render that respects the
environment/prop so it can be enabled during development or by an explicit user
toggle.

In `@src/features/sessions/SessionList.tsx`:
- Around line 277-279: The code determines stoppable containers using the regex
/running/i (runtimeStatus -> running), but that misses containers in the
"restarting" state; update the logic in SessionList.tsx where runtimeStatus,
status, and running are computed (the variables runtimeStatus, status, running)
so that running becomes true for both "running" and "restarting" (e.g., replace
/running/i with a regex matching both states like /(running|restarting)/i or
equivalent), ensuring the UI/actions that use running will allow stopping
restarting containers as well.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 90d95b44-9fcc-4f0b-b3b2-103c82ef815b

📥 Commits

Reviewing files that changed from the base of the PR and between c2a2886 and 6226b88.

📒 Files selected for processing (11)
  • README.md
  • docs/API.md
  • server/app.ts
  • server/lib/ws-proxy.test.ts
  • server/lib/ws-proxy.ts
  • src/App.tsx
  • src/contexts/GatewayContext.tsx
  • src/features/file-browser/FileTreePanel.tsx
  • src/features/file-browser/hooks/useFileTree.ts
  • src/features/sessions/SessionList.tsx
  • src/hooks/useWebSocket.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • server/lib/ws-proxy.test.ts
  • README.md

Comment on lines +413 to +416
| 400 | `{ "error": { "code": "invalid_json", ... } }` | Body is not valid JSON |
| 400 | `{ "error": { "code": "invalid_request", ... } }` | Validation failed (name/type/keys) |
| 400 | `{ "error": { "code": "invalid_credential_key", ... } }` | Credential key not in allowlist |
| 409 | `{ "error": { "code": "instance_name_in_use", ... } }` | Container name already exists |
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Documented error code for invalid create keys is inconsistent with the route.

At Line 415, docs say invalid_credential_key, but the implementation returns invalid_configuration_key in server/routes/instances.ts for non-allowlisted keys. Please align the contract to avoid client-side error handling bugs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/API.md` around lines 413 - 416, Docs and implementation disagree on the
error code for non-allowlisted keys: the API.md table uses
"invalid_credential_key" while the instances route returns
"invalid_configuration_key"; update the API.md entry (the table row for 400
errors) to use "invalid_configuration_key" and ensure any examples/tests
referencing "invalid_credential_key" are updated to "invalid_configuration_key"
so the documented contract matches the instances route behavior.

Comment on lines +152 to +155
const tokenResult = await getInstanceToken(requestedInstanceId);
if (!tokenResult?.token) {
clientWs.close(1011, 'Instance token missing');
return;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use consistent “instance unavailable” close semantics for missing token.

Line 154 returns 1011 while other instance resolution failures use 1013 (Lines 143 and 148). Unifying this makes client behavior/retry handling consistent.

Suggested patch
-        if (!tokenResult?.token) {
-          clientWs.close(1011, 'Instance token missing');
+        if (!tokenResult?.token) {
+          clientWs.close(1013, 'Instance token unavailable');
           return;
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const tokenResult = await getInstanceToken(requestedInstanceId);
if (!tokenResult?.token) {
clientWs.close(1011, 'Instance token missing');
return;
const tokenResult = await getInstanceToken(requestedInstanceId);
if (!tokenResult?.token) {
clientWs.close(1013, 'Instance token unavailable');
return;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/lib/ws-proxy.ts` around lines 152 - 155, The client WebSocket close
for missing instance token in the getInstanceToken branch should use the same
close code/semantics as other instance resolution failures: update the
clientWs.close call in the tokenResult?.token check (the getInstanceToken
handling) to use close code 1013 and the same reason text used elsewhere for
"instance unavailable" so that client retry/behavior matches the other failure
paths.

Comment on lines +150 to +168
// Keep the gateway stream aligned to the selected instance.
useEffect(() => {
const prev = previousInstanceIdRef.current;
if (prev === activeInstanceId) return;
previousInstanceIdRef.current = activeInstanceId;

const saved = lastConnectRef.current || (() => {
const loaded = loadConfig();
if (loaded.url && loaded.token) return { url: loaded.url as string, token: loaded.token as string };
return null;
})();
if (!saved) return;

if (connectionState === 'connected' || connectionState === 'connecting' || connectionState === 'reconnecting') {
disconnect();
// Use public connect wrapper so reconnect credentials/flags are reset consistently.
void connect(saved.url, saved.token).catch(() => {});
}
}, [activeInstanceId, connectionState, disconnect, connect]);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Potential dual reconnection and effect re-triggering on instance switch.

Two concerns with this effect:

  1. Dual reconnection sequences: When activeInstanceId changes, this effect disconnects and reconnects. However, useWebSocket (line 51) also receives activeInstanceId and rebuilds its proxy URL independently (see useWebSocket.ts lines 142-146). This creates two independent reconnection paths that may race or conflict.

  2. connectionState in dependency array: After disconnect() on line 164, connectionState transitions to 'disconnected'. Since connectionState is a dependency, this triggers the effect again. The prev === activeInstanceId guard (line 153) prevents an infinite loop, but the effect still runs twice per instance switch — once for the activeInstanceId change and once for the connectionState change.

Consider removing connectionState from the dependency array and capturing the current state in a ref, or coordinating reconnection solely through useWebSocket to avoid dual paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/contexts/GatewayContext.tsx` around lines 150 - 168, The effect in
GatewayContext (using previousInstanceIdRef, lastConnectRef, loadConfig,
connectionState, disconnect, connect) causes dual reconnections and double runs
because connectionState is in the dependency array and useWebSocket also reacts
to activeInstanceId; remove connectionState from the dependency array and
instead read the current connection state via a ref (e.g., connectionStateRef)
inside the effect or let useWebSocket own reconnection logic by removing the
manual disconnect/connect sequence here; update the effect to only respond to
activeInstanceId changes, consult lastConnectRef/loadConfig for credentials, and
call either disconnect+connect OR rely on useWebSocket’s reconnection path (not
both) to avoid racing reconnects.

Comment on lines 98 to +102
const fetchChildren = useCallback(async (dirPath: string): Promise<TreeEntry[] | null> => {
try {
const params = dirPath ? `?path=${encodeURIComponent(dirPath)}&depth=1` : '?depth=1';
const res = await fetch(`/api/files/tree${params}`);
const requestPath = `/api/files/tree${params}`;
const res = await fetch(requestPath);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fetch calls are missing instance routing — file tree always queries master.

The fetchChildren function calls /api/files/tree directly without using routeApiPath or addInstanceHeaderToFetch from src/lib/apiRouting.ts. Since /api/files is in INSTANCE_ROUTED_API_PREFIXES, these calls should include the instanceId to route to the correct instance.

Currently, even when activeInstanceId changes, the file tree will always show the master's workspace.

🔧 Proposed fix using routeApiPath
+import { routeApiPath } from '@/lib/apiRouting';

 const fetchChildren = useCallback(async (dirPath: string): Promise<TreeEntry[] | null> => {
   try {
     const params = dirPath ? `?path=${encodeURIComponent(dirPath)}&depth=1` : '?depth=1';
-    const requestPath = `/api/files/tree${params}`;
+    const requestPath = routeApiPath(`/api/files/tree${params}`, activeInstanceId);
     const res = await fetch(requestPath);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const fetchChildren = useCallback(async (dirPath: string): Promise<TreeEntry[] | null> => {
try {
const params = dirPath ? `?path=${encodeURIComponent(dirPath)}&depth=1` : '?depth=1';
const res = await fetch(`/api/files/tree${params}`);
const requestPath = `/api/files/tree${params}`;
const res = await fetch(requestPath);
const fetchChildren = useCallback(async (dirPath: string): Promise<TreeEntry[] | null> => {
try {
const params = dirPath ? `?path=${encodeURIComponent(dirPath)}&depth=1` : '?depth=1';
const requestPath = routeApiPath(`/api/files/tree${params}`, activeInstanceId);
const res = await fetch(requestPath);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/features/file-browser/hooks/useFileTree.ts` around lines 98 - 102,
fetchChildren is calling the hardcoded endpoint `/api/files/tree` so the request
always hits master; update fetchChildren to build the routed path and headers
using the routing helpers: call routeApiPath('/api/files/tree',
activeInstanceId) to get the correct URL and use addInstanceHeaderToFetch(fetch,
activeInstanceId) (or pass the returned headers) when invoking fetch so requests
obey INSTANCE_ROUTED_API_PREFIXES and follow the currently selected instance;
keep existing params (`?path=...&depth=1`) and replace the direct requestPath
and fetch call with these helpers inside the same fetchChildren function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant