feat: add multi-claw instance creation flow and health updates#98
feat: add multi-claw instance creation flow and health updates#98meka-ichi wants to merge 25 commits intodaggerhashimoto:masterfrom
Conversation
📝 WalkthroughWalkthroughAdds Multi-Claw local Docker instance support: build/runtime Docker assets and startup script, comprehensive server-side instance management, proxying and routing, extensive API/routes/tests, frontend instance context/UI integration, per-instance fetch/WS routing, and documentation and progress artifacts. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant UI
participant Server
participant Docker
participant OpenClaw
User->>UI: Open CreateInstanceDialog, submit name+configs
UI->>Server: POST /api/instances (create)
Server->>Docker: docker run (selected multiclaw image, mounts/env)
Docker->>OpenClaw: Start gateway + nerve inside container
Server->>Server: Prepare instance workspace, copy creds, write openclaw.json
Server->>OpenClaw: Probe health /token (via published ports)
OpenClaw-->>Server: Ready / token
Server-->>UI: 201 Created (instanceId)
UI->>UI: refresh instances, set activeInstanceId
sequenceDiagram
participant Client
participant Middleware
participant InstanceProxy
participant LocalInstance
Client->>Middleware: GET /api/files/read?instanceId=inst-1
Middleware->>Middleware: validate instanceId & path eligibility
Middleware->>InstanceProxy: proxyToInstance(instanceId, path, headers, body)
InstanceProxy->>LocalInstance: fetch http://127.0.0.1:<published>/api/files/read (forwarded headers)
LocalInstance-->>InstanceProxy: response (filtered headers)
InstanceProxy-->>Middleware: proxied response
Middleware-->>Client: return proxied response
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip You can disable the changed files summary in the walkthrough.Disable the |
SummaryReviewed PR #98 against Major Issues
Minor Issues
Suggested Fixes
|
Follow-up SummarySecond pass over Blockers
Test/CI Gaps
Checklist Readiness
RecommendationRequest changes before merge: implement WS token fail-fast + regression test, document/test |
Greptile SummaryThis PR introduces a full multi-claw instance management stack, evolving the application from single-instance to a multi-container model. It adds Docker lifecycle management (create/stop/delete), centralized instance-aware HTTP and WebSocket routing, a credential copy mechanism for bootstrapping new instances, and a frontend context ( Key concerns identified:
Confidence Score: 2/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant UI as Browser UI
participant IC as InstanceContext
participant Nerve as Nerve Server
participant Docker as Docker Daemon
participant Inst as Instance Container
Note over UI,IC: Instance Discovery (every 5s)
UI->>Nerve: GET /api/instances
Nerve->>Docker: docker ps -a + docker inspect
Docker-->>Nerve: Container metadata
Nerve-->>UI: LocalInstanceSummary[]
Note over UI,Nerve: Instance Creation
UI->>Nerve: POST /api/instances
Nerve->>Nerve: prepareInstanceWorkspace()<br/>writes .openclaw state dir,<br/>openclaw.json with generated gateway credential
Nerve->>Docker: docker run -d -v stateDir:/home/node/.openclaw
Docker-->>Nerve: containerId
Nerve-->>UI: CreateDockerInstanceResult
Note over UI,IC: Instance-Routed API Call
UI->>IC: fetch('/api/sessions')
IC->>IC: addInstanceHeaderToFetch()<br/>adds X-Instance-Id header
IC->>Nerve: GET /api/sessions (X-Instance-Id: id)
Nerve->>Nerve: instanceRoutingMiddleware<br/>resolveRequestedInstanceId()
Nerve->>Docker: docker inspect id
Docker-->>Nerve: published ports
Nerve->>Inst: GET http://127.0.0.1:nervePort/api/sessions
Inst-->>Nerve: upstream response
Nerve-->>UI: filtered proxied response
Note over UI,Inst: Instance-Routed WebSocket
UI->>Nerve: WS /ws?instanceId=id
Nerve->>Docker: inspect + resolve gateway port + get credential
Docker-->>Nerve: gatewayPort + gateway credential
Nerve->>Inst: WS ws://127.0.0.1:gatewayPort/ws
Inst-->>Nerve: connect.challenge nonce
Nerve->>Inst: connect with instance credential (no device identity)
Inst-->>UI: relay messages bidirectionally
Last reviewed commit: c2a2886 |
| function generateInstanceGatewayToken(): string { | ||
| const alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'; | ||
| const bytes = randomBytes(32); | ||
| let token = ''; | ||
| for (let i = 0; i < bytes.length; i += 1) { | ||
| token += alphabet[bytes[i] % alphabet.length]; | ||
| } | ||
| return token; | ||
| } |
There was a problem hiding this comment.
Biased modulo reduction in gateway token generation
bytes[i] % alphabet.length introduces a statistical bias because 256 is not evenly divisible by 62 (256 = 62 × 4 + 8). The first 8 characters in the alphabet (A–H) are selected with probability 5/256 ≈ 1.95% instead of the uniform 1/62 ≈ 1.61%. While the bias is small (~0.4%), this is a security-sensitive token used for gateway authentication, and best practice is to avoid any non-uniform distribution.
Switching to a 64-character alphabet (e.g., base64url), or using rejection sampling, eliminates the bias:
function generateInstanceGatewayToken(): string {
// base64url alphabet has 64 chars — 256 / 64 = 4 exactly, no bias
const alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_';
const bytes = randomBytes(32);
let token = '';
for (let i = 0; i < bytes.length; i += 1) {
token += alphabet[bytes[i] % alphabet.length];
}
return token;
}
| async function resolveRuntimeGatewayTokenViaExec(containerId: string): Promise<string | null> { | ||
| try { | ||
| const script = "fetch('http://127.0.0.1:3080/api/connect-defaults').then(r=>r.json()).then(j=>process.stdout.write((j&&typeof j.token==='string')?j.token:''))"; | ||
| const stdout = await runDocker(['exec', containerId, 'node', '-e', script]); | ||
| const token = stdout.trim(); | ||
| return token || null; | ||
| } catch { | ||
| return null; | ||
| } | ||
| } |
There was a problem hiding this comment.
GET /api/instances/:id/token silently triggers docker exec as a side effect
resolveRuntimeGatewayTokenViaExec is called from resolveInstanceToken, which is called from getInstanceToken, which backs the GET /api/instances/:id/token endpoint. This means issuing a simple read-only token lookup can cause the server to execute a Node.js script inside a running container via docker exec.
This is a surprising side effect for a GET endpoint and poses two concerns:
- Unexpected privilege escalation scope: Any authenticated caller of
GET /api/instances/:id/tokenimplicitly gains the ability to trigger code execution in a matched container without the API signature suggesting this. - Race condition on container ID: The
containerIdis obtained frominspect.Idafter a precedinginspectLocalOpenClawInstancecall. Between the inspect and thedocker exec, the container could be stopped/replaced. Docker will reject the exec in that case, but the attempt still occurs.
Consider separating the runtime fallback into an explicit probe operation (e.g., a POST endpoint or a dedicated ?probe=true query parameter) so that callers can opt into the exec-based lookup intentionally.
| useEffect(() => { | ||
| const originalFetch = globalThis.fetch.bind(globalThis); | ||
| globalThis.fetch = ((input: RequestInfo | URL, init?: RequestInit) => { | ||
| const routed = addInstanceHeaderToFetch( | ||
| input, | ||
| init, | ||
| activeInstanceIdRef.current, | ||
| window.location.origin, | ||
| ); | ||
| return originalFetch(routed.input, routed.init); | ||
| }) as typeof fetch; | ||
|
|
||
| return () => { | ||
| globalThis.fetch = originalFetch; | ||
| }; | ||
| }, []); |
There was a problem hiding this comment.
Global fetch monkey-patch can double-wrap in React Strict Mode
The useEffect with [] dependencies captures originalFetch = globalThis.fetch.bind(globalThis) at mount time. In React 18 Strict Mode (development), effects are intentionally run twice: mounted → cleanup → remounted. On the second mount, globalThis.fetch is already the patched wrapper from the first mount (cleanup restored it, but only if the return cleanup ran correctly). However, if there is any async gap between cleanup and re-run, or if InstanceProvider is used higher than Strict Mode boundaries, the original reference could already be a wrapper.
More practically: if InstanceProvider is ever unmounted and remounted (e.g., session expiry, hot-reload in development), originalFetch in the second lifecycle captures the patched version from the first lifecycle, creating a permanently nested wrapper. After the second unmount's cleanup, globalThis.fetch is restored to the first lifecycle's patched version rather than the browser native.
A safer pattern uses a stable outer ref:
const nativeFetchRef = useRef<typeof fetch | null>(null);
useEffect(() => {
// Only capture the native fetch once, on first mount
if (!nativeFetchRef.current) {
nativeFetchRef.current = globalThis.fetch.bind(globalThis);
}
const originalFetch = nativeFetchRef.current;
globalThis.fetch = ((input, init) => {
const routed = addInstanceHeaderToFetch(input, init, activeInstanceIdRef.current, window.location.origin);
return originalFetch(routed.input, routed.init);
}) as typeof fetch;
return () => {
globalThis.fetch = originalFetch;
};
}, []);| const copiedCredentialKeys = [...selectedConfigurations.keys(), ...selectedProfileKeys]; | ||
| const preparedStateDir = prepareInstanceWorkspace(input.name, effectiveCredentialKeys); | ||
|
|
||
| const args = ['run', '-d', '--name', input.name, '-p', '0:3080', '-p', '0:3181', '-v', `${preparedStateDir}:/home/node/.openclaw`, '-e', 'NERVE_ALLOW_INSECURE=true']; |
There was a problem hiding this comment.
NERVE_ALLOW_INSECURE=true is hardcoded for every created instance
Every instance is started with -e NERVE_ALLOW_INSECURE=true regardless of context. This disables certain security checks in the Nerve gateway and becomes permanently baked into all managed containers. If a deployment ever moves these containers from localhost to a networked environment, the insecure flag remains silently active.
Consider making this conditional on a server-side config flag or at least documenting in a comment why it's unconditionally set. It could also be exposed as a CreateDockerInstanceInput option so operators can opt in explicitly.
| export const MASTER_ONLY_API_PREFIXES = [ | ||
| '/api/instances', | ||
| '/api/auth', | ||
| ] as const; | ||
|
|
||
| export const INSTANCE_PROXYABLE_API_PREFIXES = [ | ||
| '/api/agentlog', | ||
| '/api/channels', | ||
| '/api/claude-code-limits', | ||
| '/api/codex-limits', | ||
| '/api/connect-defaults', | ||
| '/api/crons', | ||
| '/api/events', | ||
| '/api/files', | ||
| '/api/gateway', | ||
| '/api/keys', | ||
| '/api/language', | ||
| '/api/memories', | ||
| '/api/server-info', | ||
| '/api/sessions', | ||
| '/api/skills', | ||
| '/api/tokens', | ||
| '/api/transcribe', | ||
| '/api/tts', | ||
| '/api/version', | ||
| '/api/version/check', | ||
| '/api/voice-phrases', | ||
| '/api/workspace', | ||
| ] as const; |
There was a problem hiding this comment.
Duplicated prefix lists between server and client with no shared source-of-truth
MASTER_ONLY_API_PREFIXES and INSTANCE_PROXYABLE_API_PREFIXES are defined here in server/lib/instance-routing.ts (lines 6–34) and duplicated verbatim in src/lib/apiRouting.ts (lines 1–29). The two lists are currently identical, but there is no compile-time or test-time enforcement that they stay in sync.
If a new route is added to the server's INSTANCE_PROXYABLE_API_PREFIXES without updating the client list (or vice versa), the frontend will silently fail to attach the X-Instance-Id header for that route, resulting in requests unexpectedly hitting the master rather than the selected instance — a subtle routing regression that could be hard to debug.
Consider extracting the shared constants into a single generated artifact (e.g., a JSON file read by both sides during build, or a server/lib/routing-constants.ts that the client imports via a workspace package) so that additions in one place automatically propagate to the other.
| if (finalEnvEntries.size > 0) { | ||
| const envContent = [...finalEnvEntries.entries()] | ||
| .map(([key, value]) => `${key}=${value}`) | ||
| .join('\n'); | ||
| fs.writeFileSync(path.join(stateDir, '.env'), `${envContent}\n`, 'utf8'); |
There was a problem hiding this comment.
Unescaped values can corrupt the written dotenv file
The dotenv file is written with no quoting or escaping of the value portion of each entry. If any credential value contains a newline character, the resulting file will have a raw newline mid-value. When the file is later read by parseDotEnvShallow, content after the embedded newline is treated as a separate line — either silently dropped or mis-parsed as a different entry, causing the credential to be silently truncated or corrupted.
The serialization block should wrap each value in double-quotes and escape any embedded double-quotes, backslashes, and newlines before writing, consistent with the quoting that parseDotEnvShallow already handles on the read side.
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (16)
PROGRESS.md (1)
64-74: Give each validation section a unique heading.The repeated
## Validation notesheaders create duplicate anchors and already trigger MD024. Renaming them per milestone keeps the document navigable.Also applies to: 102-109, 140-147
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@PROGRESS.md` around lines 64 - 74, Replace duplicate "## Validation notes" headers with unique headings per section (for example "## Validation notes — Milestone 1", "## Validation notes — Milestone 2", etc.) so each anchor is unique and MD024 is avoided; update the three occurrences that currently use the same "## Validation notes" heading (the block shown in the diff and the other two similar sections) and keep the existing bullet content unchanged.server/lib/instance-routing.ts (1)
1-4: Instance ID regex allows up to 128 characters.The regex
{0,127}after the first character allows 1-128 total characters. This is more permissive than Docker container name limits (typically 63 chars as used in CreateInstanceDialog). Consider aligning these limits for consistency, or document why they differ.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/lib/instance-routing.ts` around lines 1 - 4, The INSTANCE_ID_RE currently allows 1–128 characters which exceeds the Docker/container name limit used elsewhere (e.g., CreateInstanceDialog); update INSTANCE_ID_RE to enforce the same max length as the UI by changing the quantifier so the total length is 1–63 characters (adjust the `{0,127}` to `{0,62}`) or, if 128 is intentional, add a clear comment explaining the deliberate mismatch and reference CreateInstanceDialog; ensure the change or comment is made at the INSTANCE_ID_RE declaration so all validations remain consistent.src/features/sessions/CreateInstanceDialog.tsx (1)
239-244: Validation message shown even when form is untouched.Lines 239-241 display
validationMessage(e.g., "Name is required.") immediately when the dialog opens with an empty name. This might feel intrusive for users who haven't interacted with the form yet. Consider showing validation only after the first blur or submit attempt.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/features/sessions/CreateInstanceDialog.tsx` around lines 239 - 244, The validation message is shown immediately on open; modify CreateInstanceDialog to only show validationMessage after the user has interacted or tried to submit: add a local boolean state (e.g., touched or attemptedSubmit) and set it to true on the name input's onBlur (and also set attemptedSubmit true in the submit handler), then change the rendering condition from "validationMessage && !error" to "validationMessage && !error && (touched || attemptedSubmit)". Update the name input component to call the onBlur handler that flips touched true and ensure the submit path sets attemptedSubmit so validation appears after submit.Dockerfile.multiclaw (1)
21-24: Pinopenclawto a specific version for reproducibility.The global
npm install -g openclawinstalls the latest version at build time, which could lead to inconsistent behavior across builds. Pin it explicitly to ensure consistency.♻️ Proposed fix
RUN apt-get update \ && apt-get install -y --no-install-recommends ca-certificates bash git \ && rm -rf /var/lib/apt/lists/* \ - && npm install -g openclaw + && npm install -g openclaw@2026.3.2🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Dockerfile.multiclaw` around lines 21 - 24, The Dockerfile currently installs openclaw without a version which leads to non-reproducible builds; update the RUN that calls npm install -g openclaw to pin a specific version (e.g., npm install -g openclaw@<version>) or add a build ARG like OPENCLAW_VERSION with a sensible default and use npm install -g openclaw@$OPENCLAW_VERSION so builds are deterministic; modify the RUN line containing npm install -g openclaw to reference that pinned version/ARG and document/update any related build instructions.src/features/sessions/SessionList.test.tsx (1)
70-86: Test coverage is good, but consider strengthening the disabled button assertion.The test correctly verifies the button is disabled when
state: 'exited'. However, clicking a disabled button in the DOM typically doesn't fire click handlers, so line 84-85 may not be testing the actual guard logic—it's testing browser behavior.Consider adding a test that verifies the component's internal logic doesn't call fetch even if somehow triggered, or simply remove the click attempt since
expect(stopButton).toBeDisabled()already validates the intended behavior.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/features/sessions/SessionList.test.tsx` around lines 70 - 86, The test asserts the stop button is disabled but then attempts a userEvent.click which only verifies browser behavior, so either remove the click and associated fetchMock assertion or strengthen the test to verify component logic: update the test in SessionList.test.tsx to keep the renderList call and the expect(stopButton).toBeDisabled() assertion, then either (A) delete the userEvent.click(stopButton) and the expect(fetchMock).not.toHaveBeenCalled() lines, or (B) instead of simulating a DOM click, directly call the component's stop handler (or trigger the same action via the instance method used by the Stop button) and assert fetchMock was not invoked; reference the stopButton variable, renderList helper, and fetchMock when making the change.src/features/file-browser/FileTreePanel.tsx (1)
576-578: Consider conditionalizing the debug status line for production.The debug line showing
inst=...and status is useful for development but may clutter the UI for end users. Consider hiding it behind an environment flag or removing it before the PR leaves draft status.♻️ Optional: Conditionalize debug UI
- <div className="mt-1 text-[9px] font-mono text-muted-foreground truncate" title={`${debug.lastPath} ${debug.error || ''}`}> - inst={debug.instanceId ? debug.instanceId.slice(0, 8) : 'master'} · {debug.ok ? `ok:${debug.status ?? '-'}` : `err:${debug.error || debug.status || 'unknown'}`} - </div> + {import.meta.env.DEV && ( + <div className="mt-1 text-[9px] font-mono text-muted-foreground truncate" title={`${debug.lastPath} ${debug.error || ''}`}> + inst={debug.instanceId ? debug.instanceId.slice(0, 8) : 'master'} · {debug.ok ? `ok:${debug.status ?? '-'}` : `err:${debug.error || debug.status || 'unknown'}`} + </div> + )}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/features/file-browser/FileTreePanel.tsx` around lines 576 - 578, The debug status line (the div rendering debug.lastPath, debug.instanceId, debug.ok/status/error) should be hidden in production; wrap that debug rendering in a conditional check (e.g., only render when a debug flag is true or when NODE_ENV !== 'production' or when a prop like showDebug/isDraft is set) so end users don't see the inst=... status; locate the div that references debug.lastPath, debug.instanceId and debug.ok and gate its rendering behind the chosen environment/feature flag.src/hooks/useDashboardData.ts (1)
140-158: Consider clearing state on instance switch to avoid stale data flash.When
activeInstanceIdchanges, the effect restarts and setsmemoriesLoadingto true, butmemoriesandtokenDataretain their previous values until the fetch completes. Users may briefly see data from the previous instance.♻️ Optional: Clear state on instance change
// Initial fetch and polling (restarts when active instance changes) useEffect(() => { const controller = new AbortController(); // Fetch immediately for current instance context. setMemoriesLoading(true); + setMemories([]); + setTokenData(null); refreshMemories(controller.signal); refreshTokens(controller.signal);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/hooks/useDashboardData.ts` around lines 140 - 158, When activeInstanceId changes the effect restarts but previous memories/tokenData remain until fetch completes, causing a stale flash; update the start of the useEffect (the block that creates controller and calls refreshMemories/refreshTokens) to clear the previous instance state: call setMemories([]) (or appropriate empty shape) and reset tokenData (e.g., setTokenData(null) or []) and setTokensLoading(true) alongside setMemoriesLoading(true) before invoking refreshMemories/refreshTokens so the UI shows cleared loading state for the new instance; keep the existing abort controller/interval logic unchanged and ensure you reference useEffect, refreshMemories, refreshTokens, setMemoriesLoading, setMemories, tokenData/setTokenData, and activeInstanceId when making the change.src/contexts/GatewayContext.tsx (1)
146-163: Logic is correct; consider simplifying the credential retrieval for readability.The reconnection logic correctly handles instance switches by disconnecting and reconnecting with saved credentials. The early return on line 148 prevents re-entry issues.
The IIFE on lines 151-155 works but is slightly harder to parse at a glance.
♻️ Optional: Simplify credential retrieval
- const saved = lastConnectRef.current || (() => { - const loaded = loadConfig(); - if (loaded.url && loaded.token) return { url: loaded.url as string, token: loaded.token as string }; - return null; - })(); + let saved = lastConnectRef.current; + if (!saved) { + const loaded = loadConfig(); + if (loaded.url && loaded.token) { + saved = { url: loaded.url as string, token: loaded.token as string }; + } + } if (!saved) return;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/contexts/GatewayContext.tsx` around lines 146 - 163, Replace the IIFE credential retrieval with a clearer sequence: read lastConnectRef.current into a variable (e.g., const last = lastConnectRef.current), if last exists use it as saved, otherwise call loadConfig() into a variable (e.g., const loaded = loadConfig()) and derive saved from loaded.url/loaded.token; keep the rest of the effect logic using previousInstanceIdRef, connectionState, disconnect, and connect unchanged so reconnect uses the same saved {url, token} structure.src/contexts/InstanceContext.tsx (1)
71-86: Global fetch override pattern requires careful coordination.The global
fetchoverride is captured once on mount with an empty dependency array. This works correctly for this use case, but be aware that if other code also overridesglobalThis.fetch, the order of mount/unmount could cause unexpected behavior. Consider documenting this constraint in a comment.📝 Suggested documentation
useEffect(() => { + // Note: We override globalThis.fetch once on mount to inject instance routing headers. + // This assumes no other code overrides fetch after this provider mounts. + // If multiple fetch overrides are needed, consider a different pattern (e.g., custom fetch wrapper). const originalFetch = globalThis.fetch.bind(globalThis); globalThis.fetch = ((input: RequestInfo | URL, init?: RequestInit) => {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/contexts/InstanceContext.tsx` around lines 71 - 86, The current useEffect replaces globalThis.fetch on mount (and restores it on unmount) which can conflict if other modules also override fetch; add a concise comment above this useEffect explaining the coordination constraint: that originalFetch is captured on mount, restoration happens on unmount, and that consumers must ensure mount/unmount ordering (or avoid other global fetch overrides) to prevent unexpected behavior; reference the useEffect block, globalThis.fetch, and addInstanceHeaderToFetch so future maintainers know why the override is safe here and what to watch for.src/features/file-browser/hooks/useFileTree.ts (1)
130-135: Consider making the retry delay configurable or documenting the rationale.The 200ms retry delay for handling instance switch race conditions is reasonable, but consider adding a brief comment explaining why this specific delay was chosen, or extracting it as a named constant for clarity.
💡 Suggested improvement
+ const INSTANCE_SWITCH_RETRY_DELAY_MS = 200; + // Initial load const loadRoot = useCallback(async () => { setLoading(true); setError(null); let children = await fetchChildren(''); if (!children) { - // Instance switch can race with transport reconnect; retry once quickly. - await new Promise((resolve) => setTimeout(resolve, 200)); + // Instance switch can race with transport reconnect; retry once after brief delay. + await new Promise((resolve) => setTimeout(resolve, INSTANCE_SWITCH_RETRY_DELAY_MS)); children = await fetchChildren(''); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/features/file-browser/hooks/useFileTree.ts` around lines 130 - 135, Extract the hard-coded 200ms sleep in useFileTree where fetchChildren('') is retried into a named constant (e.g., CHILDREN_RETRY_DELAY_MS) or make it a configurable parameter to the hook, and add a short comment explaining the rationale (to handle instance-switch/transport-reconnect race) and why 200ms was chosen; update the retry code that currently awaits setTimeout to use that constant/parameter and ensure any tests or callers are adjusted if you make it configurable.src/features/sessions/SessionList.tsx (1)
147-171: Consider using a custom dialog for instance removal confirmation.The
window.confirmapproach works but is inconsistent with the customDialogcomponent used for session deletion (lines 397-435). For UI consistency, consider using a similar custom dialog for instance removal.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/features/sessions/SessionList.tsx` around lines 147 - 171, The removeInstance handler currently uses window.confirm which breaks UI consistency; replace it by wiring a custom Dialog component similar to the session deletion dialog: add component state (e.g., removeDialogOpen: boolean and pendingRemoveInstance: DiscoveredInstance | null), change removeInstance to only set pendingRemoveInstance and open the dialog (do not perform deletion), render the Dialog with the same confirm/cancel flow as the session delete dialog, and move the existing deletion logic (fetch, setInstanceMutationError, setInstanceMutatingId, onSelectInstance clearing when activeInstanceId matches, await onRefreshInstances, error handling and finally clearing setInstanceMutatingId) into the dialog confirm callback; ensure you update useCallback dependencies (activeInstanceId, onRefreshInstances, onSelectInstance) and clear pendingRemoveInstance when canceling or after completion.src/lib/apiRouting.ts (2)
60-64: Fragment (#hash) in URL is silently dropped when instanceId is appended.If
pathWithQuerycontains a fragment (e.g.,/api/sessions?foo=bar#section), the fragment is passed toURLSearchParamswhich ignores it. The reconstructed URL loses the fragment entirely.This may be intentional for API paths (fragments are typically client-side only), but worth confirming the behavior is expected.
🔧 Proposed fix to preserve fragments
function splitPathAndQuery(input: string): { pathname: string; suffix: string } { - const idx = input.search(/[?#]/); + const idx = input.indexOf('?'); if (idx === -1) return { pathname: input, suffix: '' }; return { pathname: input.slice(0, idx), suffix: input.slice(idx) }; }Or handle fragments explicitly if needed:
export function routeApiPath(pathWithQuery: string, instanceId: string | null): string { if (!instanceId) return pathWithQuery; const { pathname, suffix } = splitPathAndQuery(pathWithQuery); // ... prefix checks ... - const query = suffix.startsWith('?') ? suffix.slice(1) : suffix; + const hashIdx = suffix.indexOf('#'); + const queryPart = hashIdx === -1 ? suffix : suffix.slice(0, hashIdx); + const fragmentPart = hashIdx === -1 ? '' : suffix.slice(hashIdx); + const query = queryPart.startsWith('?') ? queryPart.slice(1) : queryPart; const params = new URLSearchParams(query); params.set(INSTANCE_ID_QUERY_PARAM, instanceId); const nextQuery = params.toString(); - return nextQuery ? `${pathname}?${nextQuery}` : pathname; + return nextQuery ? `${pathname}?${nextQuery}${fragmentPart}` : `${pathname}${fragmentPart}`; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/apiRouting.ts` around lines 60 - 64, The current construction of the URL drops any fragment because suffix is fed into URLSearchParams; update the logic that builds the return URL (variables: suffix, query, params, nextQuery, pathname and the constant INSTANCE_ID_QUERY_PARAM) to detect and strip a fragment identifier from suffix before creating URLSearchParams, set the instance id param on params, then reattach the original fragment (prefixed with '#') to the final returned string (i.e., return `${pathname}?${nextQuery}#${fragment}` when a fragment existed, otherwise keep the existing return behavior). Ensure fragment extraction handles the case where suffix starts with '?' as the existing code does.
98-103: Redundant MASTER_ONLY check after INSTANCE_ROUTED check.The master-only check (lines 101-103) is unnecessary because master-only prefixes and instance-routed prefixes are disjoint sets. If a path doesn't match any instance-routed prefix (line 98), it will already return unchanged, making the master-only check unreachable for those paths.
However, this is defensive coding and doesn't affect correctness—just a minor clarity nit.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/apiRouting.ts` around lines 98 - 103, Remove the redundant MASTER_ONLY_API_PREFIXES check that follows the INSTANCE_ROUTED_API_PREFIXES check: since INSTANCE_ROUTED_API_PREFIXES and MASTER_ONLY_API_PREFIXES are disjoint, the second if-block (MASTER_ONLY_API_PREFIXES.some((prefix) => matchesPrefix(pathname, prefix))) is unreachable after the early return and should be deleted to improve clarity; keep the initial INSTANCE_ROUTED_API_PREFIXES check (using matchesPrefix and pathname) and any remaining logic that handles master-only routes in their appropriate place.server/lib/docker-instances.ts (2)
966-970: Consider restricting file permissions for credential files.The auth-profiles.json (line 968), .env (line 1029), and openclaw.json (line 1032) files contain sensitive credentials (API keys, tokens). These are created with default permissions (typically 0644, readable by all users on the system).
For better security hygiene, consider restricting permissions to owner-only.
🔒 Proposed fix to restrict permissions
+ // Restrict permissions for credential files + const credentialFileOpts = { encoding: 'utf8' as const, mode: 0o600 }; + if (filteredStore) { fs.writeFileSync( path.join(agentDir, 'auth-profiles.json'), `${JSON.stringify(filteredStore, null, 2)}\n`, - 'utf8', + credentialFileOpts, ); } // ... if (finalEnvEntries.size > 0) { const envContent = [...finalEnvEntries.entries()] .map(([key, value]) => `${key}=${value}`) .join('\n'); - fs.writeFileSync(path.join(stateDir, '.env'), `${envContent}\n`, 'utf8'); + fs.writeFileSync(path.join(stateDir, '.env'), `${envContent}\n`, credentialFileOpts); } - fs.writeFileSync(openclawConfigPath, `${JSON.stringify(cfg, null, 2)}\n`, 'utf8'); + fs.writeFileSync(openclawConfigPath, `${JSON.stringify(cfg, null, 2)}\n`, credentialFileOpts);Also applies to: 1029-1029, 1032-1032
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/lib/docker-instances.ts` around lines 966 - 970, The auth-profiles.json, .env and openclaw.json files are written with default permissive filesystem permissions; change the write calls that create these files (the fs.writeFileSync calls that write auth-profiles.json, .env and openclaw.json in docker-instances.ts) to set owner-only permissions (e.g. mode 0o600) by passing the mode option to fs.writeFileSync or by calling fs.chmodSync immediately after write; ensure you apply the same fix to the three write sites so credentials are only readable/writable by the file owner.
940-948: Minor modulo bias in token generation.Using
bytes[i] % alphabet.lengthwith a 62-character alphabet introduces slight bias since 256 is not evenly divisible by 62. The first 8 characters are ~1.6% more likely than others.For this use case (instance gateway tokens with 32 characters), the entropy is still sufficient, but this is worth noting.
♻️ Unbiased alternative using rejection sampling
function generateInstanceGatewayToken(): string { const alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'; - const bytes = randomBytes(32); let token = ''; - for (let i = 0; i < bytes.length; i += 1) { - token += alphabet[bytes[i] % alphabet.length]; + while (token.length < 32) { + const bytes = randomBytes(32); + for (const byte of bytes) { + // Reject values >= 248 to avoid modulo bias (248 = 62 * 4) + if (byte < 248) { + token += alphabet[byte % 62]; + if (token.length === 32) break; + } + } } return token; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/lib/docker-instances.ts` around lines 940 - 948, The current generateInstanceGatewayToken function uses bytes[i] % alphabet.length which introduces a small modulo bias; change it to rejection sampling: compute a maxValid = Math.floor(256 / alphabet.length) * alphabet.length and only accept a random byte when byte < maxValid, mapping it to an alphabet index via byte % alphabet.length; continue drawing random bytes (e.g., by calling randomBytes in chunks) until you have 32 accepted characters, then return the token. Ensure you reference generateInstanceGatewayToken, alphabet, and randomBytes when updating the implementation.server/lib/instance-proxy.ts (1)
107-121: Redundant content-encoding check.Line 116 checks
lower !== 'content-encoding', butcontent-encodingis already excluded fromPROXY_RESPONSE_HEADERS(it's intentionally omitted per the comment on lines 18-19), so line 112 would already skip it.This is harmless but slightly redundant.
♻️ Simplified version
function buildResponseHeaders(upstream: Headers): Headers { const out = new Headers(); const upstreamHasContentEncoding = upstream.has('content-encoding'); for (const [key, value] of upstream.entries()) { const lower = key.toLowerCase(); if (!PROXY_RESPONSE_HEADERS.has(lower)) continue; // Node fetch may decode upstream payload while still exposing encoded length. // Do not forward stale content-length if upstream declared content-encoding. if (lower === 'content-length' && upstreamHasContentEncoding) continue; - if (lower !== 'content-encoding') { - out.set(key, value); - } + out.set(key, value); } return out; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/lib/instance-proxy.ts` around lines 107 - 121, The buildResponseHeaders function contains a redundant check for 'content-encoding' after PROXY_RESPONSE_HEADERS already excludes it; update buildResponseHeaders (function name: buildResponseHeaders, variable: upstreamHasContentEncoding, constant: PROXY_RESPONSE_HEADERS) to remove the unnecessary "if (lower !== 'content-encoding')" branch and simply set the header when the key passes the PROXY_RESPONSE_HEADERS filter (keeping the existing special-case that skips 'content-length' when upstreamHasContentEncoding is true).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/API.md`:
- Around line 417-420: The documentation currently describes target selection
for the "Master-side proxy scaffold" and "Target resolution" as preferring
container 18789/tcp (the gateway WS port), which is incorrect for proxied
instance HTTP routes like /api/version and /api/files; update the text to state
that the proxy targets the instance's Nerve HTTP port (the instance API port)
instead of 18789/tcp and clarify that port selection prefers the Nerve HTTP port
and otherwise falls back to the first published TCP port so readers are routed
to the correct Nerve API endpoints.
In `@docs/research-aramaki-multiclaw.md`:
- Around line 80-89: Remove the already-implemented items 1 and 4 from the "Next
recommended hardening (not yet implemented)" list in
docs/research-aramaki-multiclaw.md: delete the bullet about generating/writing
"~/.openclaw/openclaw.json" and the bullet about "Update proxy target logic to
differentiate Nerve HTTP port vs Gateway WS port", since those are implemented
elsewhere in this PR; keep the remaining items 2 and 3 as-is and optionally add
a brief note pointing readers to the sections where the openclaw.json bootstrap
and proxy/port changes were implemented if clarification is helpful.
In `@PROGRESS.md`:
- Around line 5-8: Update the "Status at a glance" snapshot at the top of
PROGRESS.md so it reflects the current state (Milestones 1–5 completed and E2E
validation done) rather than only "Milestone 1 + Milestone 2 foundation
completed"; modify the "Overall status:" line under the "Status at a glance"
heading (and keep the "Branch: `multi-claw`" line intact) to a concise, accurate
summary like "Milestones 1–5 completed; E2E validation passed."
In `@scripts/start-multiclaw.sh`:
- Around line 20-23: The token generation currently uses command substitution
inside an export which masks failures (ShellCheck SC2155); instead, run the
generator command (tr/head) into a local variable (e.g., new_token), check its
exit status and length, then assign/export OPENCLAW_GATEWAY_TOKEN and
GATEWAY_TOKEN from that variable; update the block that sets
OPENCLAW_GATEWAY_TOKEN and GATEWAY_TOKEN so failures from the tr|head pipeline
are detected and handled before exporting.
In `@server/lib/docker-instances.test.ts`:
- Around line 106-127: The test's execFileImpl/dockerArgs assertion is too
permissive (uses expect.arrayContaining) and omits the '-e' flag before the
OPENAI_API_KEY value; update the assertion in the execFileImpl handler to assert
the exact docker run argument structure (either replace expect.arrayContaining
with a toEqual check of the full expected args array including '-e',
'OPENAI_API_KEY=openai-test-key', and other '-e' entries, or add an explicit
check that for each env var string in dockerArgs there is a preceding '-e'
token). Locate the check around the dockerArgs variable in execFileImpl and make
the assertion verify the pairing/order of '-e' flags and their values rather
than only presence.
In `@server/routes/health.ts`:
- Around line 29-39: The /healthcheck route handler currently always returns a
2xx JSON response via c.json even when probeGateway() reports gateway !== 'ok';
update the handler (the app.get('/healthcheck' route using probeGateway(),
gateway and ready variables) to return a non-2xx status when readiness is false
(e.g., use c.status(503).json(...) or equivalent) so orchestration systems see
an unhealthy pod; keep the existing JSON body (service, status, ready, uptime,
gateway) but send it with the non-OK HTTP status when ready is false.
In `@src/contexts/InstanceContext.tsx`:
- Around line 44-61: The refreshInstances useCallback is missing
setActiveInstanceId in its dependency array; update the dependency list for the
refreshInstances callback to include setActiveInstanceId (alongside any existing
deps) so the hook explicitly depends on the setter used inside (references:
refreshInstances, setActiveInstanceId, activeInstanceIdRef, setInstances,
setInstancesLoading).
---
Nitpick comments:
In `@Dockerfile.multiclaw`:
- Around line 21-24: The Dockerfile currently installs openclaw without a
version which leads to non-reproducible builds; update the RUN that calls npm
install -g openclaw to pin a specific version (e.g., npm install -g
openclaw@<version>) or add a build ARG like OPENCLAW_VERSION with a sensible
default and use npm install -g openclaw@$OPENCLAW_VERSION so builds are
deterministic; modify the RUN line containing npm install -g openclaw to
reference that pinned version/ARG and document/update any related build
instructions.
In `@PROGRESS.md`:
- Around line 64-74: Replace duplicate "## Validation notes" headers with unique
headings per section (for example "## Validation notes — Milestone 1", "##
Validation notes — Milestone 2", etc.) so each anchor is unique and MD024 is
avoided; update the three occurrences that currently use the same "## Validation
notes" heading (the block shown in the diff and the other two similar sections)
and keep the existing bullet content unchanged.
In `@server/lib/docker-instances.ts`:
- Around line 966-970: The auth-profiles.json, .env and openclaw.json files are
written with default permissive filesystem permissions; change the write calls
that create these files (the fs.writeFileSync calls that write
auth-profiles.json, .env and openclaw.json in docker-instances.ts) to set
owner-only permissions (e.g. mode 0o600) by passing the mode option to
fs.writeFileSync or by calling fs.chmodSync immediately after write; ensure you
apply the same fix to the three write sites so credentials are only
readable/writable by the file owner.
- Around line 940-948: The current generateInstanceGatewayToken function uses
bytes[i] % alphabet.length which introduces a small modulo bias; change it to
rejection sampling: compute a maxValid = Math.floor(256 / alphabet.length) *
alphabet.length and only accept a random byte when byte < maxValid, mapping it
to an alphabet index via byte % alphabet.length; continue drawing random bytes
(e.g., by calling randomBytes in chunks) until you have 32 accepted characters,
then return the token. Ensure you reference generateInstanceGatewayToken,
alphabet, and randomBytes when updating the implementation.
In `@server/lib/instance-proxy.ts`:
- Around line 107-121: The buildResponseHeaders function contains a redundant
check for 'content-encoding' after PROXY_RESPONSE_HEADERS already excludes it;
update buildResponseHeaders (function name: buildResponseHeaders, variable:
upstreamHasContentEncoding, constant: PROXY_RESPONSE_HEADERS) to remove the
unnecessary "if (lower !== 'content-encoding')" branch and simply set the header
when the key passes the PROXY_RESPONSE_HEADERS filter (keeping the existing
special-case that skips 'content-length' when upstreamHasContentEncoding is
true).
In `@server/lib/instance-routing.ts`:
- Around line 1-4: The INSTANCE_ID_RE currently allows 1–128 characters which
exceeds the Docker/container name limit used elsewhere (e.g.,
CreateInstanceDialog); update INSTANCE_ID_RE to enforce the same max length as
the UI by changing the quantifier so the total length is 1–63 characters (adjust
the `{0,127}` to `{0,62}`) or, if 128 is intentional, add a clear comment
explaining the deliberate mismatch and reference CreateInstanceDialog; ensure
the change or comment is made at the INSTANCE_ID_RE declaration so all
validations remain consistent.
In `@src/contexts/GatewayContext.tsx`:
- Around line 146-163: Replace the IIFE credential retrieval with a clearer
sequence: read lastConnectRef.current into a variable (e.g., const last =
lastConnectRef.current), if last exists use it as saved, otherwise call
loadConfig() into a variable (e.g., const loaded = loadConfig()) and derive
saved from loaded.url/loaded.token; keep the rest of the effect logic using
previousInstanceIdRef, connectionState, disconnect, and connect unchanged so
reconnect uses the same saved {url, token} structure.
In `@src/contexts/InstanceContext.tsx`:
- Around line 71-86: The current useEffect replaces globalThis.fetch on mount
(and restores it on unmount) which can conflict if other modules also override
fetch; add a concise comment above this useEffect explaining the coordination
constraint: that originalFetch is captured on mount, restoration happens on
unmount, and that consumers must ensure mount/unmount ordering (or avoid other
global fetch overrides) to prevent unexpected behavior; reference the useEffect
block, globalThis.fetch, and addInstanceHeaderToFetch so future maintainers know
why the override is safe here and what to watch for.
In `@src/features/file-browser/FileTreePanel.tsx`:
- Around line 576-578: The debug status line (the div rendering debug.lastPath,
debug.instanceId, debug.ok/status/error) should be hidden in production; wrap
that debug rendering in a conditional check (e.g., only render when a debug flag
is true or when NODE_ENV !== 'production' or when a prop like showDebug/isDraft
is set) so end users don't see the inst=... status; locate the div that
references debug.lastPath, debug.instanceId and debug.ok and gate its rendering
behind the chosen environment/feature flag.
In `@src/features/file-browser/hooks/useFileTree.ts`:
- Around line 130-135: Extract the hard-coded 200ms sleep in useFileTree where
fetchChildren('') is retried into a named constant (e.g.,
CHILDREN_RETRY_DELAY_MS) or make it a configurable parameter to the hook, and
add a short comment explaining the rationale (to handle
instance-switch/transport-reconnect race) and why 200ms was chosen; update the
retry code that currently awaits setTimeout to use that constant/parameter and
ensure any tests or callers are adjusted if you make it configurable.
In `@src/features/sessions/CreateInstanceDialog.tsx`:
- Around line 239-244: The validation message is shown immediately on open;
modify CreateInstanceDialog to only show validationMessage after the user has
interacted or tried to submit: add a local boolean state (e.g., touched or
attemptedSubmit) and set it to true on the name input's onBlur (and also set
attemptedSubmit true in the submit handler), then change the rendering condition
from "validationMessage && !error" to "validationMessage && !error && (touched
|| attemptedSubmit)". Update the name input component to call the onBlur handler
that flips touched true and ensure the submit path sets attemptedSubmit so
validation appears after submit.
In `@src/features/sessions/SessionList.test.tsx`:
- Around line 70-86: The test asserts the stop button is disabled but then
attempts a userEvent.click which only verifies browser behavior, so either
remove the click and associated fetchMock assertion or strengthen the test to
verify component logic: update the test in SessionList.test.tsx to keep the
renderList call and the expect(stopButton).toBeDisabled() assertion, then either
(A) delete the userEvent.click(stopButton) and the
expect(fetchMock).not.toHaveBeenCalled() lines, or (B) instead of simulating a
DOM click, directly call the component's stop handler (or trigger the same
action via the instance method used by the Stop button) and assert fetchMock was
not invoked; reference the stopButton variable, renderList helper, and fetchMock
when making the change.
In `@src/features/sessions/SessionList.tsx`:
- Around line 147-171: The removeInstance handler currently uses window.confirm
which breaks UI consistency; replace it by wiring a custom Dialog component
similar to the session deletion dialog: add component state (e.g.,
removeDialogOpen: boolean and pendingRemoveInstance: DiscoveredInstance | null),
change removeInstance to only set pendingRemoveInstance and open the dialog (do
not perform deletion), render the Dialog with the same confirm/cancel flow as
the session delete dialog, and move the existing deletion logic (fetch,
setInstanceMutationError, setInstanceMutatingId, onSelectInstance clearing when
activeInstanceId matches, await onRefreshInstances, error handling and finally
clearing setInstanceMutatingId) into the dialog confirm callback; ensure you
update useCallback dependencies (activeInstanceId, onRefreshInstances,
onSelectInstance) and clear pendingRemoveInstance when canceling or after
completion.
In `@src/hooks/useDashboardData.ts`:
- Around line 140-158: When activeInstanceId changes the effect restarts but
previous memories/tokenData remain until fetch completes, causing a stale flash;
update the start of the useEffect (the block that creates controller and calls
refreshMemories/refreshTokens) to clear the previous instance state: call
setMemories([]) (or appropriate empty shape) and reset tokenData (e.g.,
setTokenData(null) or []) and setTokensLoading(true) alongside
setMemoriesLoading(true) before invoking refreshMemories/refreshTokens so the UI
shows cleared loading state for the new instance; keep the existing abort
controller/interval logic unchanged and ensure you reference useEffect,
refreshMemories, refreshTokens, setMemoriesLoading, setMemories,
tokenData/setTokenData, and activeInstanceId when making the change.
In `@src/lib/apiRouting.ts`:
- Around line 60-64: The current construction of the URL drops any fragment
because suffix is fed into URLSearchParams; update the logic that builds the
return URL (variables: suffix, query, params, nextQuery, pathname and the
constant INSTANCE_ID_QUERY_PARAM) to detect and strip a fragment identifier from
suffix before creating URLSearchParams, set the instance id param on params,
then reattach the original fragment (prefixed with '#') to the final returned
string (i.e., return `${pathname}?${nextQuery}#${fragment}` when a fragment
existed, otherwise keep the existing return behavior). Ensure fragment
extraction handles the case where suffix starts with '?' as the existing code
does.
- Around line 98-103: Remove the redundant MASTER_ONLY_API_PREFIXES check that
follows the INSTANCE_ROUTED_API_PREFIXES check: since
INSTANCE_ROUTED_API_PREFIXES and MASTER_ONLY_API_PREFIXES are disjoint, the
second if-block (MASTER_ONLY_API_PREFIXES.some((prefix) =>
matchesPrefix(pathname, prefix))) is unreachable after the early return and
should be deleted to improve clarity; keep the initial
INSTANCE_ROUTED_API_PREFIXES check (using matchesPrefix and pathname) and any
remaining logic that handles master-only routes in their appropriate place.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: fff8d92b-ca21-450b-bcb2-8143363c7276
📒 Files selected for processing (36)
.dockerignoreDockerfile.multiclawGOALS.mdPROGRESS.mdREADME.mddocs/API.mddocs/research-aramaki-multiclaw.mdscripts/start-multiclaw.shserver/app.tsserver/lib/docker-instances.test.tsserver/lib/docker-instances.tsserver/lib/instance-proxy.tsserver/lib/instance-routing.tsserver/lib/ws-proxy.test.tsserver/lib/ws-proxy.tsserver/middleware/instance-routing.test.tsserver/middleware/instance-routing.tsserver/routes/health.test.tsserver/routes/health.tsserver/routes/instances.test.tsserver/routes/instances.tssrc/App.tsxsrc/contexts/GatewayContext.tsxsrc/contexts/InstanceContext.tsxsrc/features/auth/AuthGate.tsxsrc/features/file-browser/FileTreePanel.tsxsrc/features/file-browser/hooks/useFileTree.tssrc/features/sessions/CreateInstanceDialog.tsxsrc/features/sessions/SessionList.test.tsxsrc/features/sessions/SessionList.tsxsrc/hooks/useDashboardData.tssrc/hooks/useServerEvents.tssrc/hooks/useWebSocket.test.tssrc/hooks/useWebSocket.tssrc/lib/apiRouting.test.tssrc/lib/apiRouting.ts
| Master-side proxy scaffold for forwarding API calls to a selected local Docker instance. | ||
|
|
||
| - Target resolution uses local Docker inspection and forwards to `http://127.0.0.1:<published-port>`. | ||
| - Port selection prefers container `18789/tcp`; if unavailable, falls back to the first published TCP port. |
There was a problem hiding this comment.
Document the proxy against the Nerve HTTP port, not the gateway port.
This endpoint proxies instance HTTP routes like /api/version and /api/files*. Saying it prefers 18789/tcp describes the gateway WS target instead of the instance’s Nerve API port, which will send readers to the wrong place.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/API.md` around lines 417 - 420, The documentation currently describes
target selection for the "Master-side proxy scaffold" and "Target resolution" as
preferring container 18789/tcp (the gateway WS port), which is incorrect for
proxied instance HTTP routes like /api/version and /api/files; update the text
to state that the proxy targets the instance's Nerve HTTP port (the instance API
port) instead of 18789/tcp and clarify that port selection prefers the Nerve
HTTP port and otherwise falls back to the first published TCP port so readers
are routed to the correct Nerve API endpoints.
| ## Next recommended hardening (not yet implemented) | ||
| 1. Optionally generate/write minimal `~/.openclaw/openclaw.json` in container (like Aramaki) so behavior is predictable across OpenClaw CLI versions. | ||
| 2. Add explicit readiness checks in startup script: | ||
| - wait for gateway port to accept connections | ||
| - fail fast if not ready in timeout window | ||
| 3. Add non-default secure profile example for production: | ||
| - `NERVE_AUTH=true` | ||
| - `NERVE_SESSION_SECRET` | ||
| - explicit auth password env. | ||
| 4. Update proxy target logic to differentiate Nerve HTTP port vs Gateway WS port if needed by frontend routing model. |
There was a problem hiding this comment.
Remove already-landed items from the “not yet implemented” list.
Item 1 (~/.openclaw/openclaw.json bootstrap) and item 4 (separate HTTP/WS port handling) are described elsewhere in this PR as implemented. Leaving them here as future work makes the research note misleading.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/research-aramaki-multiclaw.md` around lines 80 - 89, Remove the
already-implemented items 1 and 4 from the "Next recommended hardening (not yet
implemented)" list in docs/research-aramaki-multiclaw.md: delete the bullet
about generating/writing "~/.openclaw/openclaw.json" and the bullet about
"Update proxy target logic to differentiate Nerve HTTP port vs Gateway WS port",
since those are implemented elsewhere in this PR; keep the remaining items 2 and
3 as-is and optionally add a brief note pointing readers to the sections where
the openclaw.json bootstrap and proxy/port changes were implemented if
clarification is helpful.
| ## Status at a glance | ||
| - Branch: `multi-claw` | ||
| - Overall status: **Milestone 1 + Milestone 2 foundation completed** | ||
|
|
There was a problem hiding this comment.
Refresh the top-level status snapshot.
It still says only Milestones 1–2 are complete, but the rest of the file documents Milestones 3–5 and E2E validation. That makes the summary stale on first read.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@PROGRESS.md` around lines 5 - 8, Update the "Status at a glance" snapshot at
the top of PROGRESS.md so it reflects the current state (Milestones 1–5
completed and E2E validation done) rather than only "Milestone 1 + Milestone 2
foundation completed"; modify the "Overall status:" line under the "Status at a
glance" heading (and keep the "Branch: `multi-claw`" line intact) to a concise,
accurate summary like "Milestones 1–5 completed; E2E validation passed."
| if [[ -z "${OPENCLAW_GATEWAY_TOKEN:-}" ]]; then | ||
| export OPENCLAW_GATEWAY_TOKEN="$(tr -dc 'a-zA-Z0-9' </dev/urandom | head -c 32)" | ||
| export GATEWAY_TOKEN="${OPENCLAW_GATEWAY_TOKEN}" | ||
| fi |
There was a problem hiding this comment.
Declare and assign separately to avoid masking return values.
Shellcheck SC2155 flags line 21. If tr or head fails, the exit code is masked by the export assignment, and the script continues with an empty or partial token.
🛡️ Proposed fix
if [[ -z "${OPENCLAW_GATEWAY_TOKEN:-}" ]]; then
- export OPENCLAW_GATEWAY_TOKEN="$(tr -dc 'a-zA-Z0-9' </dev/urandom | head -c 32)"
+ OPENCLAW_GATEWAY_TOKEN="$(tr -dc 'a-zA-Z0-9' </dev/urandom | head -c 32)"
+ export OPENCLAW_GATEWAY_TOKEN
export GATEWAY_TOKEN="${OPENCLAW_GATEWAY_TOKEN}"
fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if [[ -z "${OPENCLAW_GATEWAY_TOKEN:-}" ]]; then | |
| export OPENCLAW_GATEWAY_TOKEN="$(tr -dc 'a-zA-Z0-9' </dev/urandom | head -c 32)" | |
| export GATEWAY_TOKEN="${OPENCLAW_GATEWAY_TOKEN}" | |
| fi | |
| if [[ -z "${OPENCLAW_GATEWAY_TOKEN:-}" ]]; then | |
| OPENCLAW_GATEWAY_TOKEN="$(tr -dc 'a-zA-Z0-9' </dev/urandom | head -c 32)" | |
| export OPENCLAW_GATEWAY_TOKEN | |
| export GATEWAY_TOKEN="${OPENCLAW_GATEWAY_TOKEN}" | |
| fi |
🧰 Tools
🪛 Shellcheck (0.11.0)
[warning] 21-21: Declare and assign separately to avoid masking return values.
(SC2155)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/start-multiclaw.sh` around lines 20 - 23, The token generation
currently uses command substitution inside an export which masks failures
(ShellCheck SC2155); instead, run the generator command (tr/head) into a local
variable (e.g., new_token), check its exit status and length, then assign/export
OPENCLAW_GATEWAY_TOKEN and GATEWAY_TOKEN from that variable; update the block
that sets OPENCLAW_GATEWAY_TOKEN and GATEWAY_TOKEN so failures from the tr|head
pipeline are detected and handled before exporting.
| execFileImpl = (_bin: unknown, args: unknown, _opts: unknown, cb: unknown) => { | ||
| const dockerArgs = args as string[]; | ||
| if (dockerArgs[0] === 'run') { | ||
| expect(dockerArgs).toEqual( | ||
| expect.arrayContaining([ | ||
| '-d', | ||
| '--name', | ||
| 'worker-a', | ||
| '-p', | ||
| '0:3080', | ||
| '-p', | ||
| '0:3181', | ||
| '-e', | ||
| 'NERVE_ALLOW_INSECURE=true', | ||
| 'OPENAI_API_KEY=openai-test-key', | ||
| 'multiclaw:test', | ||
| ]), | ||
| ); | ||
| expect(dockerArgs).toContain('-v'); | ||
| (cb as (err: null, stdout: string, stderr: string) => void)(null, 'cid-worker-a\n', ''); | ||
| return; | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify the actual docker run argument structure in createDockerInstance
ast-grep --pattern 'execFile($_, [$$$, "run", $$$], $_)'Repository: daggerhashimoto/openclaw-nerve
Length of output: 56
🏁 Script executed:
# First, let's read the test file around the mentioned lines
cat -n server/lib/docker-instances.test.ts | sed -n '90,140p'Repository: daggerhashimoto/openclaw-nerve
Length of output: 2132
🏁 Script executed:
# Let's also find the implementation file being tested
fd "docker-instances" --type f | grep -v node_modulesRepository: daggerhashimoto/openclaw-nerve
Length of output: 140
🏁 Script executed:
# Search for where OPENAI_API_KEY is used in the docker arguments
rg "OPENAI_API_KEY" server/lib/ -A 2 -B 2Repository: daggerhashimoto/openclaw-nerve
Length of output: 7861
🏁 Script executed:
# Read the docker-instances.ts implementation file to understand how args are built
cat -n server/lib/docker-instances.ts | head -100Repository: daggerhashimoto/openclaw-nerve
Length of output: 3709
🏁 Script executed:
# Search for where the docker run command is built with environment variables
rg "run.*-e" server/lib/docker-instances.ts -A 5 -B 5Repository: daggerhashimoto/openclaw-nerve
Length of output: 1184
🏁 Script executed:
# Search for the function that builds the docker args
rg "createDockerInstance\|execFile" server/lib/docker-instances.ts -A 10 -B 2 | head -100Repository: daggerhashimoto/openclaw-nerve
Length of output: 56
🏁 Script executed:
# Let me verify the exact line where env vars are added to confirm the full context
cat -n server/lib/docker-instances.ts | sed -n '280,295p'Repository: daggerhashimoto/openclaw-nerve
Length of output: 692
🏁 Script executed:
# Find the exact location where docker args are built with env variables
rg "args.push\('-e'" server/lib/docker-instances.ts -B 10 -A 5Repository: daggerhashimoto/openclaw-nerve
Length of output: 1065
Test assertion for docker run args is incomplete and masks potential bugs.
The assertion uses expect.arrayContaining() which only checks if expected items are present in the array, without verifying their order or proper pairing structure. Line 120 is missing the preceding -e flag that the actual implementation generates. According to the code, each environment variable should be passed with a -e flag (e.g., '-e', 'OPENAI_API_KEY=openai-test-key'), but the test only expects 'OPENAI_API_KEY=openai-test-key' without it. This means the test could pass even if the implementation incorrectly omits the -e flags. Consider using toEqual() with exact expected arguments or verify the -e KEY=value pairing structure more explicitly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@server/lib/docker-instances.test.ts` around lines 106 - 127, The test's
execFileImpl/dockerArgs assertion is too permissive (uses
expect.arrayContaining) and omits the '-e' flag before the OPENAI_API_KEY value;
update the assertion in the execFileImpl handler to assert the exact docker run
argument structure (either replace expect.arrayContaining with a toEqual check
of the full expected args array including '-e',
'OPENAI_API_KEY=openai-test-key', and other '-e' entries, or add an explicit
check that for each env var string in dockerArgs there is a preceding '-e'
token). Locate the check around the dockerArgs variable in execFileImpl and make
the assertion verify the pairing/order of '-e' flags and their values rather
than only presence.
| app.get('/healthcheck', async (c) => { | ||
| const gateway = await probeGateway(); | ||
| const ready = gateway === 'ok'; | ||
| return c.json({ | ||
| service: 'nerve', | ||
| status: ready ? 'ready' : 'initializing', | ||
| ready, | ||
| uptime: process.uptime(), | ||
| gateway, | ||
| }); | ||
| }); |
There was a problem hiding this comment.
Return a non-2xx response when readiness is false.
Orchestrators usually key off the HTTP status code, not the JSON body. As written, /healthcheck still looks healthy when gateway === 'unreachable', so readiness probes will keep routing traffic to an unready instance.
Suggested fix
app.get('/healthcheck', async (c) => {
const gateway = await probeGateway();
const ready = gateway === 'ok';
- return c.json({
+ return c.json({
service: 'nerve',
status: ready ? 'ready' : 'initializing',
ready,
uptime: process.uptime(),
gateway,
- });
+ }, ready ? 200 : 503);
});📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| app.get('/healthcheck', async (c) => { | |
| const gateway = await probeGateway(); | |
| const ready = gateway === 'ok'; | |
| return c.json({ | |
| service: 'nerve', | |
| status: ready ? 'ready' : 'initializing', | |
| ready, | |
| uptime: process.uptime(), | |
| gateway, | |
| }); | |
| }); | |
| app.get('/healthcheck', async (c) => { | |
| const gateway = await probeGateway(); | |
| const ready = gateway === 'ok'; | |
| return c.json({ | |
| service: 'nerve', | |
| status: ready ? 'ready' : 'initializing', | |
| ready, | |
| uptime: process.uptime(), | |
| gateway, | |
| }, ready ? 200 : 503); | |
| }); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@server/routes/health.ts` around lines 29 - 39, The /healthcheck route handler
currently always returns a 2xx JSON response via c.json even when probeGateway()
reports gateway !== 'ok'; update the handler (the app.get('/healthcheck' route
using probeGateway(), gateway and ready variables) to return a non-2xx status
when readiness is false (e.g., use c.status(503).json(...) or equivalent) so
orchestration systems see an unhealthy pod; keep the existing JSON body
(service, status, ready, uptime, gateway) but send it with the non-OK HTTP
status when ready is false.
| const refreshInstances = useCallback(async () => { | ||
| try { | ||
| const resp = await fetch('/api/instances'); | ||
| if (!resp.ok) return; | ||
| const payload = await resp.json() as { instances?: DiscoveredInstance[] }; | ||
| const next = Array.isArray(payload.instances) ? payload.instances : []; | ||
| setInstances(next); | ||
|
|
||
| // Fall back to master if selected instance disappeared. | ||
| if (activeInstanceIdRef.current && !next.some((instance) => instance.id === activeInstanceIdRef.current)) { | ||
| setActiveInstanceId(null); | ||
| } | ||
| } catch { | ||
| // Keep stale values; do not fail UI flows on transient discovery issues. | ||
| } finally { | ||
| setInstancesLoading(false); | ||
| } | ||
| }, []); |
There was a problem hiding this comment.
Missing dependency in refreshInstances callback.
The refreshInstances callback uses setActiveInstanceId on line 54, but it's not included in the dependency array. While setActiveInstanceId has a stable identity (created with useCallback and empty deps), including it satisfies the exhaustive-deps rule and makes the dependency explicit.
🔧 Proposed fix
// Keep stale values; do not fail UI flows on transient discovery issues.
} finally {
setInstancesLoading(false);
}
- }, []);
+ }, [setActiveInstanceId]);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/contexts/InstanceContext.tsx` around lines 44 - 61, The refreshInstances
useCallback is missing setActiveInstanceId in its dependency array; update the
dependency list for the refreshInstances callback to include setActiveInstanceId
(alongside any existing deps) so the hook explicitly depends on the setter used
inside (references: refreshInstances, setActiveInstanceId, activeInstanceIdRef,
setInstances, setInstancesLoading).
# Conflicts: # server/lib/ws-proxy.ts # src/App.tsx # src/features/file-browser/FileTreePanel.tsx # src/features/file-browser/hooks/useFileTree.ts # src/features/sessions/SessionList.tsx
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (1)
docs/API.md (1)
429-430:⚠️ Potential issue | 🟡 MinorProxy target port is still documented incorrectly for HTTP proxying.
At Line 430, describing
18789/tcpas the preferred proxy target points to the gateway port, not the instance HTTP API target used for proxied/api/*routes. This remains misleading for integrators.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/API.md` around lines 429 - 430, The docs incorrectly state that proxying should prefer container port 18789/tcp (which is the gateway), causing confusion for `/api/*` HTTP proxy routes; update the text to indicate that the proxy target should be the instance HTTP API port (the container port serving the instance API), and change the port-selection guidance to prefer the instance API port (e.g., the container port used by the instance HTTP API) and only fall back to other published TCP ports if that API port is unavailable; update the sentence referencing `18789/tcp` and the `/api/*` proxy behavior accordingly so integrators see the correct target for HTTP proxying.
🧹 Nitpick comments (2)
src/features/file-browser/FileTreePanel.tsx (1)
585-587: Debug status bar provides useful instance visibility.The abbreviated instance ID and request status help diagnose routing issues. Consider making this debug bar conditional (e.g., dev mode only) before shipping to production, or adding a user-toggle.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/features/file-browser/FileTreePanel.tsx` around lines 585 - 587, The debug status bar (the div rendering debug.lastPath/debug.instanceId/debug.ok in FileTreePanel.tsx) should be rendered conditionally so it doesn’t appear in production; update the component to only show this debug panel when a dev-mode flag or a user-toggle prop is enabled (e.g., check process.env.NODE_ENV === 'development' or a new prop like showDebug) and wire a simple toggle state or prop to control visibility; keep the existing content (debug.lastPath, debug.instanceId slice, debug.ok/status/error) but wrap it in a conditional render that respects the environment/prop so it can be enabled during development or by an explicit user toggle.src/features/sessions/SessionList.tsx (1)
277-279: Consider also enabling stop for 'restarting' containers.The current regex only matches
running. Per typical Docker semantics, containers inrestartingstate can also be stopped. Consider:-const running = /running/i.test(runtimeStatus); +const running = /running|restarting/i.test(runtimeStatus);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/features/sessions/SessionList.tsx` around lines 277 - 279, The code determines stoppable containers using the regex /running/i (runtimeStatus -> running), but that misses containers in the "restarting" state; update the logic in SessionList.tsx where runtimeStatus, status, and running are computed (the variables runtimeStatus, status, running) so that running becomes true for both "running" and "restarting" (e.g., replace /running/i with a regex matching both states like /(running|restarting)/i or equivalent), ensuring the UI/actions that use running will allow stopping restarting containers as well.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/API.md`:
- Around line 413-416: Docs and implementation disagree on the error code for
non-allowlisted keys: the API.md table uses "invalid_credential_key" while the
instances route returns "invalid_configuration_key"; update the API.md entry
(the table row for 400 errors) to use "invalid_configuration_key" and ensure any
examples/tests referencing "invalid_credential_key" are updated to
"invalid_configuration_key" so the documented contract matches the instances
route behavior.
In `@server/lib/ws-proxy.ts`:
- Around line 152-155: The client WebSocket close for missing instance token in
the getInstanceToken branch should use the same close code/semantics as other
instance resolution failures: update the clientWs.close call in the
tokenResult?.token check (the getInstanceToken handling) to use close code 1013
and the same reason text used elsewhere for "instance unavailable" so that
client retry/behavior matches the other failure paths.
In `@src/contexts/GatewayContext.tsx`:
- Around line 150-168: The effect in GatewayContext (using
previousInstanceIdRef, lastConnectRef, loadConfig, connectionState, disconnect,
connect) causes dual reconnections and double runs because connectionState is in
the dependency array and useWebSocket also reacts to activeInstanceId; remove
connectionState from the dependency array and instead read the current
connection state via a ref (e.g., connectionStateRef) inside the effect or let
useWebSocket own reconnection logic by removing the manual disconnect/connect
sequence here; update the effect to only respond to activeInstanceId changes,
consult lastConnectRef/loadConfig for credentials, and call either
disconnect+connect OR rely on useWebSocket’s reconnection path (not both) to
avoid racing reconnects.
In `@src/features/file-browser/hooks/useFileTree.ts`:
- Around line 98-102: fetchChildren is calling the hardcoded endpoint
`/api/files/tree` so the request always hits master; update fetchChildren to
build the routed path and headers using the routing helpers: call
routeApiPath('/api/files/tree', activeInstanceId) to get the correct URL and use
addInstanceHeaderToFetch(fetch, activeInstanceId) (or pass the returned headers)
when invoking fetch so requests obey INSTANCE_ROUTED_API_PREFIXES and follow the
currently selected instance; keep existing params (`?path=...&depth=1`) and
replace the direct requestPath and fetch call with these helpers inside the same
fetchChildren function.
---
Duplicate comments:
In `@docs/API.md`:
- Around line 429-430: The docs incorrectly state that proxying should prefer
container port 18789/tcp (which is the gateway), causing confusion for `/api/*`
HTTP proxy routes; update the text to indicate that the proxy target should be
the instance HTTP API port (the container port serving the instance API), and
change the port-selection guidance to prefer the instance API port (e.g., the
container port used by the instance HTTP API) and only fall back to other
published TCP ports if that API port is unavailable; update the sentence
referencing `18789/tcp` and the `/api/*` proxy behavior accordingly so
integrators see the correct target for HTTP proxying.
---
Nitpick comments:
In `@src/features/file-browser/FileTreePanel.tsx`:
- Around line 585-587: The debug status bar (the div rendering
debug.lastPath/debug.instanceId/debug.ok in FileTreePanel.tsx) should be
rendered conditionally so it doesn’t appear in production; update the component
to only show this debug panel when a dev-mode flag or a user-toggle prop is
enabled (e.g., check process.env.NODE_ENV === 'development' or a new prop like
showDebug) and wire a simple toggle state or prop to control visibility; keep
the existing content (debug.lastPath, debug.instanceId slice,
debug.ok/status/error) but wrap it in a conditional render that respects the
environment/prop so it can be enabled during development or by an explicit user
toggle.
In `@src/features/sessions/SessionList.tsx`:
- Around line 277-279: The code determines stoppable containers using the regex
/running/i (runtimeStatus -> running), but that misses containers in the
"restarting" state; update the logic in SessionList.tsx where runtimeStatus,
status, and running are computed (the variables runtimeStatus, status, running)
so that running becomes true for both "running" and "restarting" (e.g., replace
/running/i with a regex matching both states like /(running|restarting)/i or
equivalent), ensuring the UI/actions that use running will allow stopping
restarting containers as well.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 90d95b44-9fcc-4f0b-b3b2-103c82ef815b
📒 Files selected for processing (11)
README.mddocs/API.mdserver/app.tsserver/lib/ws-proxy.test.tsserver/lib/ws-proxy.tssrc/App.tsxsrc/contexts/GatewayContext.tsxsrc/features/file-browser/FileTreePanel.tsxsrc/features/file-browser/hooks/useFileTree.tssrc/features/sessions/SessionList.tsxsrc/hooks/useWebSocket.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- server/lib/ws-proxy.test.ts
- README.md
| | 400 | `{ "error": { "code": "invalid_json", ... } }` | Body is not valid JSON | | ||
| | 400 | `{ "error": { "code": "invalid_request", ... } }` | Validation failed (name/type/keys) | | ||
| | 400 | `{ "error": { "code": "invalid_credential_key", ... } }` | Credential key not in allowlist | | ||
| | 409 | `{ "error": { "code": "instance_name_in_use", ... } }` | Container name already exists | |
There was a problem hiding this comment.
Documented error code for invalid create keys is inconsistent with the route.
At Line 415, docs say invalid_credential_key, but the implementation returns invalid_configuration_key in server/routes/instances.ts for non-allowlisted keys. Please align the contract to avoid client-side error handling bugs.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/API.md` around lines 413 - 416, Docs and implementation disagree on the
error code for non-allowlisted keys: the API.md table uses
"invalid_credential_key" while the instances route returns
"invalid_configuration_key"; update the API.md entry (the table row for 400
errors) to use "invalid_configuration_key" and ensure any examples/tests
referencing "invalid_credential_key" are updated to "invalid_configuration_key"
so the documented contract matches the instances route behavior.
| const tokenResult = await getInstanceToken(requestedInstanceId); | ||
| if (!tokenResult?.token) { | ||
| clientWs.close(1011, 'Instance token missing'); | ||
| return; |
There was a problem hiding this comment.
Use consistent “instance unavailable” close semantics for missing token.
Line 154 returns 1011 while other instance resolution failures use 1013 (Lines 143 and 148). Unifying this makes client behavior/retry handling consistent.
Suggested patch
- if (!tokenResult?.token) {
- clientWs.close(1011, 'Instance token missing');
+ if (!tokenResult?.token) {
+ clientWs.close(1013, 'Instance token unavailable');
return;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const tokenResult = await getInstanceToken(requestedInstanceId); | |
| if (!tokenResult?.token) { | |
| clientWs.close(1011, 'Instance token missing'); | |
| return; | |
| const tokenResult = await getInstanceToken(requestedInstanceId); | |
| if (!tokenResult?.token) { | |
| clientWs.close(1013, 'Instance token unavailable'); | |
| return; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@server/lib/ws-proxy.ts` around lines 152 - 155, The client WebSocket close
for missing instance token in the getInstanceToken branch should use the same
close code/semantics as other instance resolution failures: update the
clientWs.close call in the tokenResult?.token check (the getInstanceToken
handling) to use close code 1013 and the same reason text used elsewhere for
"instance unavailable" so that client retry/behavior matches the other failure
paths.
| // Keep the gateway stream aligned to the selected instance. | ||
| useEffect(() => { | ||
| const prev = previousInstanceIdRef.current; | ||
| if (prev === activeInstanceId) return; | ||
| previousInstanceIdRef.current = activeInstanceId; | ||
|
|
||
| const saved = lastConnectRef.current || (() => { | ||
| const loaded = loadConfig(); | ||
| if (loaded.url && loaded.token) return { url: loaded.url as string, token: loaded.token as string }; | ||
| return null; | ||
| })(); | ||
| if (!saved) return; | ||
|
|
||
| if (connectionState === 'connected' || connectionState === 'connecting' || connectionState === 'reconnecting') { | ||
| disconnect(); | ||
| // Use public connect wrapper so reconnect credentials/flags are reset consistently. | ||
| void connect(saved.url, saved.token).catch(() => {}); | ||
| } | ||
| }, [activeInstanceId, connectionState, disconnect, connect]); |
There was a problem hiding this comment.
Potential dual reconnection and effect re-triggering on instance switch.
Two concerns with this effect:
-
Dual reconnection sequences: When
activeInstanceIdchanges, this effect disconnects and reconnects. However,useWebSocket(line 51) also receivesactiveInstanceIdand rebuilds its proxy URL independently (seeuseWebSocket.tslines 142-146). This creates two independent reconnection paths that may race or conflict. -
connectionStatein dependency array: Afterdisconnect()on line 164,connectionStatetransitions to'disconnected'. SinceconnectionStateis a dependency, this triggers the effect again. Theprev === activeInstanceIdguard (line 153) prevents an infinite loop, but the effect still runs twice per instance switch — once for theactiveInstanceIdchange and once for theconnectionStatechange.
Consider removing connectionState from the dependency array and capturing the current state in a ref, or coordinating reconnection solely through useWebSocket to avoid dual paths.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/contexts/GatewayContext.tsx` around lines 150 - 168, The effect in
GatewayContext (using previousInstanceIdRef, lastConnectRef, loadConfig,
connectionState, disconnect, connect) causes dual reconnections and double runs
because connectionState is in the dependency array and useWebSocket also reacts
to activeInstanceId; remove connectionState from the dependency array and
instead read the current connection state via a ref (e.g., connectionStateRef)
inside the effect or let useWebSocket own reconnection logic by removing the
manual disconnect/connect sequence here; update the effect to only respond to
activeInstanceId changes, consult lastConnectRef/loadConfig for credentials, and
call either disconnect+connect OR rely on useWebSocket’s reconnection path (not
both) to avoid racing reconnects.
| const fetchChildren = useCallback(async (dirPath: string): Promise<TreeEntry[] | null> => { | ||
| try { | ||
| const params = dirPath ? `?path=${encodeURIComponent(dirPath)}&depth=1` : '?depth=1'; | ||
| const res = await fetch(`/api/files/tree${params}`); | ||
| const requestPath = `/api/files/tree${params}`; | ||
| const res = await fetch(requestPath); |
There was a problem hiding this comment.
Fetch calls are missing instance routing — file tree always queries master.
The fetchChildren function calls /api/files/tree directly without using routeApiPath or addInstanceHeaderToFetch from src/lib/apiRouting.ts. Since /api/files is in INSTANCE_ROUTED_API_PREFIXES, these calls should include the instanceId to route to the correct instance.
Currently, even when activeInstanceId changes, the file tree will always show the master's workspace.
🔧 Proposed fix using routeApiPath
+import { routeApiPath } from '@/lib/apiRouting';
const fetchChildren = useCallback(async (dirPath: string): Promise<TreeEntry[] | null> => {
try {
const params = dirPath ? `?path=${encodeURIComponent(dirPath)}&depth=1` : '?depth=1';
- const requestPath = `/api/files/tree${params}`;
+ const requestPath = routeApiPath(`/api/files/tree${params}`, activeInstanceId);
const res = await fetch(requestPath);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const fetchChildren = useCallback(async (dirPath: string): Promise<TreeEntry[] | null> => { | |
| try { | |
| const params = dirPath ? `?path=${encodeURIComponent(dirPath)}&depth=1` : '?depth=1'; | |
| const res = await fetch(`/api/files/tree${params}`); | |
| const requestPath = `/api/files/tree${params}`; | |
| const res = await fetch(requestPath); | |
| const fetchChildren = useCallback(async (dirPath: string): Promise<TreeEntry[] | null> => { | |
| try { | |
| const params = dirPath ? `?path=${encodeURIComponent(dirPath)}&depth=1` : '?depth=1'; | |
| const requestPath = routeApiPath(`/api/files/tree${params}`, activeInstanceId); | |
| const res = await fetch(requestPath); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/features/file-browser/hooks/useFileTree.ts` around lines 98 - 102,
fetchChildren is calling the hardcoded endpoint `/api/files/tree` so the request
always hits master; update fetchChildren to build the routed path and headers
using the routing helpers: call routeApiPath('/api/files/tree',
activeInstanceId) to get the correct URL and use addInstanceHeaderToFetch(fetch,
activeInstanceId) (or pass the returned headers) when invoking fetch so requests
obey INSTANCE_ROUTED_API_PREFIXES and follow the currently selected instance;
keep existing params (`?path=...&depth=1`) and replace the direct requestPath
and fetch call with these helpers inside the same fetchChildren function.
This draft PR is opened on behalf of @tinco.
Scope analyzed (branch point → now)
12cc93a(merge-base withorigin/master)+4886 / -86What this PR delivers
1) Multi-claw runtime image + local orchestration groundwork
Dockerfile.multiclawto run OpenClaw gateway + Nerve in a containerized multi-instance model.scripts/start-multiclaw.sh) and docs/research artifacts for the deployment model.2) Docker-backed instance lifecycle management (server)
Introduces a full local instance management stack:
server/lib/docker-instances.tsready|initializing|unreachable, gateway reachability).server/routes/instances.tsGET /api/instancesGET /api/instances/credentialsPOST /api/instancesPOST /api/instances/:id/stopDELETE /api/instances/:idGET /api/instances/:id/tokenALL /api/instances/:id/proxy(+ wildcard)3) Centralized instance-aware API routing + proxy hardening
server/lib/instance-routing.tsand middleware inserver/middleware/instance-routing.ts.server/lib/instance-proxy.tsadds guardrails:content-encodingforwarding to avoid client decode failures.4) WebSocket instance routing correctness
server/lib/ws-proxy.tsnow supports instance-specific WS targets by:5) Frontend multi-instance UX + request routing
src/contexts/InstanceContext.tsxto own discovery, selection, and global fetch wrapping.src/lib/apiRouting.ts:instanceIdquery param for routed API endpoints.X-Instance-Idheader only where applicable.cache: no-storefor instance-routed fetches to prevent stale cross-instance data.CreateInstanceDialogand expands session UI (SessionList) for create/select/manage flows.6) Health/readiness contract improvements
server/routes/health.tsnow exposes gateway probe state in:/health/healthcheckwith explicit readiness shape (ready, status, gateway).7) Documentation and project tracking
docs/API.md) and README.GOALS.md,PROGRESS.md) and research notes.Test coverage added/updated
server/lib/docker-instances.test.tsserver/routes/instances.test.tsserver/middleware/instance-routing.test.tssrc/lib/apiRouting.test.tsserver/lib/ws-proxy.test.tssrc/hooks/useWebSocket.test.tsBehavioral outcome
This PR evolves multi-claw from basic scaffolding into an end-to-end instance model where users can:
Risk notes / review focus
Summary by CodeRabbit
New Features
Documentation
Chores
Tests