fix(cli): restore logs streaming and reboot recovery ux#1187
📝 Walkthrough
Async registry-recovery seeded from onboard session and optional OpenShell probe; OpenShell semver parsing and logs compatibility gate;
Changes
Sequence Diagram(s)
sequenceDiagram
participant CLI as "nemoclaw CLI"
participant Reg as "Local Registry\n(sandboxes.json)"
participant Session as "onboard-session.json"
participant GW as "OpenShell (openshell)"
CLI->>Reg: Load registry
alt registry incomplete
CLI->>Session: Read last onboard session
Session-->>CLI: Return last onboarded sandbox info
CLI->>Reg: Seed missing sandbox metadata
CLI->>GW: Optional: `openshell sandbox list`
GW-->>CLI: Return live sandbox list
CLI->>Reg: Upsert gateway entries & update default
Reg-->>CLI: Persist recovered registry
end
CLI->>CLI: Parse `logs <name> --follow`
CLI->>CLI: Check OpenShell version (semver)
alt supports --tail
CLI->>GW: Run `openshell logs <name> --tail` (stream / spawn handling)
GW-->>CLI: Stream output / exit
else incompatible
CLI-->>User: Print compatibility guidance and exit(1)
end
CLI->>CLI: Pre-dispatch recovery for `connect <name>` if missing
CLI->>GW: `openshell sandbox connect <name>`
GW-->>CLI: Connection result
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/cli.test.js (1)
107-148: ⚠️ Potential issue | 🟡 Minor
Also assert that `--follow` never reaches OpenShell.
This still passes for `openshell logs alpha --follow --tail`, which is the exact compatibility regression this change is trying to prevent.
🧪 Suggested assertion
   const r = runWithEnv("alpha logs --follow", {
     HOME: home,
     PATH: `${localBin}:${process.env.PATH || ""}`,
   });
   expect(r.code).toBe(0);
-  expect(fs.readFileSync(markerFile, "utf8")).toContain("logs alpha --tail");
+  const recordedArgs = fs.readFileSync(markerFile, "utf8");
+  expect(recordedArgs).toContain("logs alpha --tail");
+  expect(recordedArgs).not.toContain("--follow");
 });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/cli.test.js` around lines 107 - 148, Update the test "maps --follow to openshell --tail" to also assert that the spawned openshell never receives the original --follow flag: after reading markerFile (written by the fake openshell script) add an assertion such as expect(fs.readFileSync(markerFile, "utf8")).not.toContain("--follow") (or assert equality to the exact expected args string like "logs alpha --tail") so runWithEnv/openshell invocation does not include "--follow".
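The flag translation this comment wants to pin down can be sketched as a small pure function. This is only an illustration: `buildOpenshellLogsArgs` is a hypothetical name, not a function in `bin/nemoclaw.js`, and the only OpenShell flag it relies on is the `--tail` flag named in this PR.

```javascript
// Hypothetical helper (not from the PR): translate NemoClaw's user-facing
// flags into the argument list handed to `openshell logs`.
function buildOpenshellLogsArgs(sandboxName, userArgs = []) {
  const follow = userArgs.includes("--follow");
  // Drop every occurrence of --follow so it never reaches OpenShell.
  const passthrough = userArgs.filter((arg) => arg !== "--follow");
  const args = ["logs", sandboxName, ...passthrough];
  // Map the user-facing flag onto the OpenShell one, without duplicating it.
  if (follow && !args.includes("--tail")) {
    args.push("--tail");
  }
  return args;
}
```

Keeping the mapping separate from the spawn call would let a unit test assert on the argument list directly, in addition to the marker-file check suggested above.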
🧹 Nitpick comments (1)
test/cli.test.js (1)
354-528: Add a partial-registry recovery case.
These tests only start from an empty `sandboxes.json`. The current recovery bug shows up when one local sandbox already exists and `list` is supposed to merge the session sandbox or extra live names, so this suite will not catch that regression yet.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/cli.test.js` around lines 354 - 528, Add a partial-registry recovery case by pre-populating .nemoclaw/sandboxes.json with an existing sandbox entry and asserting the list recovery merges new session/live sandboxes instead of overwriting; modify the two tests ("recovers a missing registry entry from the last onboard session during list" and "imports additional live sandboxes into the registry during list recovery") to create a sandboxes.json before running runWithEnv("list") (e.g., write JSON with sandboxes: { "gamma": { ... } } and defaultSandbox: "gamma"), then assert after the run that both the original "gamma" and the recovered "alpha" (and "beta" in the second test) exist and defaultSandbox remains correct or is updated as expected.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bin/nemoclaw.js`:
- Around line 117-125: The recovered sandbox entries drop persisted policy
presets because buildRecoveredSandboxEntry only copies metadata.policies and
ignores metadata.policyPresets; update buildRecoveredSandboxEntry to populate
policies from metadata.policies if present, otherwise fall back to
metadata.policyPresets (and default to []), e.g. set policies =
Array.isArray(metadata.policies) ? metadata.policies :
(Array.isArray(metadata.policyPresets) ? metadata.policyPresets : []); make the
same change in the analogous recovered-entry builder used for agents/recovered
sessions so both sources of presets are preserved.
- Around line 138-146: The current shouldRecoverRegistryEntries incorrectly
prevents recovery when the registry contains any entries even if those entries
are stale or a session/requested sandbox should be re-seeded; update the logic
in shouldRecoverRegistryEntries so that hasRecoverySeed is true when there is a
session.sandboxName or a requestedSandboxName (not just when
current.sandboxes.length > 0), and compute shouldRecover as true whenever the
registry is empty OR the requested sandbox is missing OR a session/requested
sandbox is present that isn’t represented in current.sandboxes; change the
definitions around missingRequestedSandbox, hasRecoverySeed and shouldRecover in
the shouldRecoverRegistryEntries function to reflect this (use
current.sandboxes.some(...) to detect presence and trigger recovery when that
check fails even if sandboxes.length > 0).
- Around line 854-880: The current spawnSync result handling in the logs command
block treats a SIGINT interrupt as a failure; modify the error/exit path to call
the existing exitWithSpawnResult(result) helper (used elsewhere) instead of
directly printing "Command failed" and process.exit when result.status is null
or result.signal is set, so signals are converted to proper exit codes; keep the
existing compatibility check
(printOldLogsCompatibilityGuidance(installedVersion) when the combined output
matches the old-logs patterns or version is too old) but invoke
exitWithSpawnResult(result) for non-zero/null status handling to mirror the
other call sites.
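For the SIGINT item above, a minimal sketch of turning a `spawnSync` result into an exit code follows. It is a stand-in for illustration, not the PR's `exitWithSpawnResult`: the name `exitCodeFromSpawnResult` is hypothetical, and the `128 + signal number` mapping is the common shell convention, assumed here rather than confirmed by the thread.

```javascript
const os = require("node:os");

// Hypothetical stand-in for exitWithSpawnResult: it returns the code instead
// of calling process.exit() so the mapping can be checked in isolation.
function exitCodeFromSpawnResult(result) {
  // Normal termination: pass the child's own exit status through.
  if (typeof result.status === "number") {
    return result.status;
  }
  // Killed by a signal: spawnSync reports status === null and a signal name.
  if (result.signal) {
    const signalNumber = os.constants.signals[result.signal];
    // Assumed convention: 128 + signal number (SIGINT is 2, so 130).
    return typeof signalNumber === "number" ? 128 + signalNumber : 1;
  }
  // Neither status nor signal (for example, the spawn itself failed).
  return 1;
}
```

With this shape, a Ctrl-C during `logs --follow` would surface as exit code 130 rather than a generic "Command failed" path.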
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bbf043a1-fc52-4733-96ff-8202a0be56a7
📒 Files selected for processing (2)
bin/nemoclaw.js
test/cli.test.js
cwd: ROOT,
env: process.env,
encoding: "utf-8",
stdio: follow ? ["ignore", "inherit", "pipe"] : ["ignore", "pipe", "pipe"],
Nice catch on the --follow → --tail translation — this clearly fixes the regression.
One thing I wanted to think through with you on the stdio configuration: in follow mode, stderr is piped (["ignore", "inherit", "pipe"]), which means any errors openshell writes during a long-running streaming session won't be visible to the user until after they Ctrl-C. Could that mask a mid-stream error in a way that would be confusing? Would it be worth inheriting stderr too and handling the version-compatibility detection a different way (e.g., checking the version upfront only, which you already do)?
Good question. I kept stderr piped in follow mode for this PR because the compatibility path still needs to inspect parser failures from older OpenShell builds, and those show up on stderr before a stream is established. Since we now check the installed version up front as well, I agree the remaining tradeoff is mid-stream stderr visibility versus preserving that fallback detection. I chose to keep the existing detection path stable here and would treat inheriting stderr in follow mode as a follow-up cleanup if we want to simplify around the version gate alone.
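The up-front version gate mentioned in this thread could look roughly like the sketch below. All names are hypothetical, and `MIN_TAIL_VERSION` is a placeholder: the thread does not state which OpenShell release first supports `--tail`, nor the exact format of the version string.

```javascript
// Extract the first "major.minor.patch" triple from arbitrary version output.
function parseSemver(text) {
  const match = /(\d+)\.(\d+)\.(\d+)/.exec(String(text));
  return match ? match.slice(1).map(Number) : null;
}

// Compare two [major, minor, patch] triples, most significant part first.
function isAtLeast(version, minimum) {
  for (let i = 0; i < 3; i += 1) {
    if (version[i] !== minimum[i]) {
      return version[i] > minimum[i];
    }
  }
  return true; // equal versions satisfy the minimum
}

// Placeholder threshold (assumption); the real minimum is not given here.
const MIN_TAIL_VERSION = [0, 1, 0];

function supportsTail(versionOutput) {
  const parsed = parseSemver(versionOutput);
  // Treat unparseable output as incompatible so the guidance path is taken.
  return parsed !== null && isAtLeast(parsed, MIN_TAIL_VERSION);
}
```

Comparing numeric parts avoids the string-comparison trap where "0.10.0" sorts below "0.9.5". If this gate became the only compatibility check, stderr would no longer need to be piped in follow mode, which is the follow-up discussed above.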
};
}

function upsertRecoveredSandbox(name, metadata = {}) {
I notice resolveReconnectSandboxName runs validateName() on sandbox names that come from user input — nice. Looking at upsertRecoveredSandbox, though, names sourced from onboard-session.json or the gateway sandbox list output flow straight to registry.registerSandbox() without the same validation. Is there a reason to skip validation here, or would it be worth being consistent across both paths?
That is a fair consistency question. I originally skipped validation there because those names come from NemoClaw/OpenShell-managed sources rather than raw user input, and the gateway path is already constrained by OpenShell sandbox naming rules. I do agree that validating before registry writes would be a reasonable hardening step; I just did not want to expand this PR again after the UX direction changed to remove reconnect and move recovery onto connect.

function listSandboxes() {
const { sandboxes, defaultSandbox } = registry.listSandboxes();
async function listSandboxes() {
This is a thoughtful approach to the reboot problem — recovering the registry transparently during list so the user doesn't have to know about it.
One behavioral shift worth calling out: list was previously a pure read; now it can write to sandboxes.json as a side effect. Two questions come to mind:
- If someone runs `nemoclaw list` while `nemoclaw onboard` is in progress on another terminal, could both be writing to `sandboxes.json` concurrently? How does the registry lock handle that?
- Would it be useful to mention the write-on-read behavior in the help text or a doc note, so users who script around `list` aren't surprised?
I checked the registry implementation before answering this. The write path is serialized through the existing advisory lock in bin/lib/registry.js (mkdir lock dir + owner PID + stale-lock handling) and the file write itself is tmp-file-plus-rename, so list and onboard should serialize rather than clobber each other. I also updated the PR body to call out explicitly that list now repairs local inventory on read, since that side effect is worth documenting.
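The two mechanisms described here (a directory-based advisory lock and a tmp-file-plus-rename write) can be sketched as follows. This is a simplified illustration, not the code in `bin/lib/registry.js`: the function names are hypothetical, and retry and stale-lock handling are deliberately left out.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical simplified lock: mkdir is atomic, so only one process can
// create the lock directory; a second caller gets an EEXIST error.
function withDirLock(lockDir, fn) {
  fs.mkdirSync(lockDir); // throws (code "EEXIST") if the lock is already held
  try {
    // Record the owner PID, which a fuller version could use to detect stale locks.
    fs.writeFileSync(path.join(lockDir, "owner"), String(process.pid));
    return fn();
  } finally {
    fs.rmSync(lockDir, { recursive: true, force: true });
  }
}

// Write to a temporary file first, then rename it over the target, so a
// reader never observes a half-written JSON file.
function writeJsonAtomic(filePath, data) {
  const tmpPath = `${filePath}.${process.pid}.tmp`;
  fs.writeFileSync(tmpPath, JSON.stringify(data, null, 2), { mode: 0o600 });
  fs.renameSync(tmpPath, filePath);
}
```

Because `mkdirSync` sits outside the `try`, a caller that fails to acquire the lock never removes a lock directory it does not own. A production version would retry on `EEXIST` and clear locks whose owner PID is no longer running.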
bin/nemoclaw.js
Outdated
exitWithSpawnResult(result);
}

async function reconnect(args = []) {
The reconnect flow is clean — I like that it reuses sandboxConnect rather than inventing a new connection path.
Thinking through the call chain: reconnect calls recoverRegistryEntries() which may probe the gateway (via recoverRegistryFromLiveGateway → recoverNamedGatewayRuntime), then calls sandboxConnect() → ensureLiveSandboxOrExit() which can call recoverNamedGatewayRuntime() again. In the worst case, is that a double gateway probe? If so, is the cost low enough not to worry about, or would it be worth short-circuiting the second one if recovery already succeeded?
I agreed with the UX concern behind this thread and changed the shape after that review pass: the top-level reconnect command is gone now, and recovery moved onto nemoclaw <name> connect. So this specific double-probe path is no longer part of the PR. The updated flow is: dispatch sees <name> connect, attempts recovery for that requested sandbox, and if recovery succeeds it drops straight into the existing sandboxConnect() path.
|
Updated this PR to follow the product direction from the review discussion:
I also refreshed the PR body to match the new shape and replaced the old validation notes with the current
For the cross-platform environment checks, I validated the recovery path directly: |
Actionable comments posted: 1
♻️ Duplicate comments (1)
bin/nemoclaw.js (1)
132-139: ⚠️ Potential issue | 🟠 Major
Validate recovered sandbox names before persisting them.
Line 132 upserts names from recovery sources without the same validation used for user input paths. This can persist invalid names into `sandboxes.json` and create follow-on failures in command flows that assume validated names.
🛠️ Suggested hardening
+function isValidRecoveredSandboxName(name) {
+  return typeof name === "string" && /^[a-z0-9][a-z0-9-]*[a-z0-9]$/.test(name);
+}
+
 function upsertRecoveredSandbox(name, metadata = {}) {
+  if (!isValidRecoveredSandboxName(name)) {
+    return false;
+  }
   const entry = buildRecoveredSandboxEntry(name, metadata);
   if (registry.getSandbox(name)) {
     registry.updateSandbox(name, entry);
     return false;
   }
   registry.registerSandbox(entry);
   return true;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/nemoclaw.js` around lines 132 - 139, upsertRecoveredSandbox currently persists recovered names without validation; before calling buildRecoveredSandboxEntry or registry.registerSandbox/updateSandbox, run the recovered name through the same sandbox name validation/sanitization used for user-entered paths (e.g. the project's validate/sanitize function for sandbox names) and skip or normalize any invalid names; specifically, validate the name variable at the top of upsertRecoveredSandbox, return false or log and skip when invalid, and only call buildRecoveredSandboxEntry, registry.updateSandbox or registry.registerSandbox for names that pass validation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bin/nemoclaw.js`:
- Around line 117-129: buildRecoveredSandboxEntry correctly includes
nimContainer in the recovered entry, but registerSandbox in bin/lib/registry.js
does not persist nimContainer, so recovered entries lose their container
linkage; update the registerSandbox(sandbox) implementation to accept and store
sandbox.nimContainer (alongside name, model, provider, gpuEnabled, policies)
when creating/persisting the sandbox record and ensure any normalization/DB
write paths (e.g., the code paths that previously handled
model/provider/policies) include nimContainer so the value from
buildRecoveredSandboxEntry is retained after registry.registerSandbox(entry) is
called.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6e4c2dbb-46da-4781-bbcf-703787e06543
📒 Files selected for processing (2)
bin/nemoclaw.js
test/cli.test.js
function buildRecoveredSandboxEntry(name, metadata = {}) {
  return {
    name,
    model: metadata.model || null,
    provider: metadata.provider || null,
    gpuEnabled: metadata.gpuEnabled === true,
    policies: Array.isArray(metadata.policies)
      ? metadata.policies
      : Array.isArray(metadata.policyPresets)
        ? metadata.policyPresets
        : [],
    nimContainer: metadata.nimContainer || null,
  };
Recovered nimContainer metadata is still lost on first insert.
Line 128 includes nimContainer, but Line 138 inserts through registry.registerSandbox(entry), and bin/lib/registry.js currently does not persist nimContainer in registerSandbox(). Result: newly recovered entries lose container linkage immediately.
🧩 Cross-file fix (`bin/lib/registry.js`)
 function registerSandbox(entry) {
   return withLock(() => {
     const data = load();
     const normalized = normalizeEntry(entry);
     if (!normalized) {
       throw new Error(`Invalid sandbox entry: ${entry.name}`);
     }
     data.sandboxes[normalized.name] = {
       name: normalized.name,
       model: normalized.model,
       provider: normalized.provider,
       gpuEnabled: normalized.gpuEnabled,
       policies: normalized.policies || [],
+      nimContainer: normalized.nimContainer,
     };
     if (!data.defaultSandbox) {
       data.defaultSandbox = entry.name;
     }
     save(data);
   });
 }
Also applies to: 132-139
ericksoa
left a comment
Looks good — all prior comments addressed. The recovery pipeline is well-factored, test coverage is solid (10 new tests including partial-registry cases), and the connect-based recovery is a cleaner UX than a separate reconnect command.
Two minor follow-ups for a future PR, neither blocking:
- Inheriting stderr in follow mode once the version gate is the sole compat check
- Validating recovered sandbox names before registry writes for consistency
Approving.
Thanks, Aaron. I'll merge once all CI goes green.
Actionable comments posted: 2
♻️ Duplicate comments (1)
bin/nemoclaw.js (1)
149-163: ⚠️ Potential issue | 🟠 Major
Partial registries can still skip the live-gateway rebuild.
Line 160 only enables recovery when the registry is empty, the requested sandbox is missing, or the session sandbox is missing. If `sandboxes.json` already contains the session sandbox but is still missing other live sandboxes, Line 718 never reaches `recoverRegistryFromLiveGateway()`, so `nemoclaw list` stays incomplete after reboot.
🛠️ Suggested direction
-function shouldRecoverRegistryEntries(current, session, requestedSandboxName) {
+function shouldRecoverRegistryEntries(current, session, requestedSandboxName, { probeLiveGateway = false } = {}) {
   const hasSessionSandbox = Boolean(session?.sandboxName);
   const missingSessionSandbox =
     hasSessionSandbox && !current.sandboxes.some((sandbox) => sandbox.name === session.sandboxName);
   const missingRequestedSandbox =
     Boolean(requestedSandboxName) &&
     !current.sandboxes.some((sandbox) => sandbox.name === requestedSandboxName);
   const hasRecoverySeed =
     current.sandboxes.length > 0 || hasSessionSandbox || Boolean(requestedSandboxName);
   return {
     missingRequestedSandbox,
     shouldRecover:
       hasRecoverySeed &&
-      (current.sandboxes.length === 0 || missingRequestedSandbox || missingSessionSandbox),
+      (current.sandboxes.length === 0 || missingRequestedSandbox || missingSessionSandbox || probeLiveGateway),
   };
 }

-async function recoverRegistryEntries({ requestedSandboxName = null } = {}) {
+async function recoverRegistryEntries({ requestedSandboxName = null, probeLiveGateway = false } = {}) {
   const current = registry.listSandboxes();
   const session = onboardSession.loadSession();
-  const recoveryCheck = shouldRecoverRegistryEntries(current, session, requestedSandboxName);
+  const recoveryCheck = shouldRecoverRegistryEntries(current, session, requestedSandboxName, { probeLiveGateway });
   if (!recoveryCheck.shouldRecover) {
     return { ...current, recoveredFromSession: false, recoveredFromGateway: 0 };
   }
   const seeded = seedRecoveryMetadata(current, session, requestedSandboxName);
-  const shouldProbeLiveGateway = current.sandboxes.length > 0 || Boolean(session?.sandboxName);
+  const shouldProbeLiveGateway = probeLiveGateway || current.sandboxes.length > 0 || Boolean(session?.sandboxName);

-  const recovery = await recoverRegistryEntries();
+  const recovery = await recoverRegistryEntries({ probeLiveGateway: true });
Also applies to: 226-238, 717-719
🤖 Prompt for AI Agents
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/nemoclaw.js` around lines 149 - 163, The current shouldRecoverRegistryEntries logic misses the case of a partial registry (registry contains some sandboxes but is missing other live sandboxes), so change the function signature shouldRecoverRegistryEntries(current, session, requestedSandboxName, hasMissingLiveSandboxes=false) and include hasMissingLiveSandboxes in the shouldRecover condition (i.e., set shouldRecover to true when hasMissingLiveSandboxes is true OR when current.sandboxes.length === 0 OR missingRequestedSandbox OR missingSessionSandbox). Update all call sites (the code around the uses at the blocks referenced near the original diffs and where recoverRegistryFromLiveGateway() is invoked) to compute/pass hasMissingLiveSandboxes (by comparing current.sandboxes to the live-gateway list before calling this helper) so recoverRegistryFromLiveGateway() runs for partial registries as well.
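The comparison this prompt describes (registry contents versus the live gateway list) reduces to a set difference. The sketch below assumes the registry exposes entries with a `name` field, as in the snippets in this thread, and that the live list is already parsed into plain names; `findMissingLiveSandboxes` is a hypothetical helper, not code from the PR.

```javascript
// Hypothetical helper: return the live sandbox names that the local
// registry does not know about yet.
function findMissingLiveSandboxes(registrySandboxes, liveNames) {
  const known = new Set(registrySandboxes.map((sandbox) => sandbox.name));
  return liveNames.filter((name) => !known.has(name));
}

// A caller could derive the proposed flag from the result.
function hasMissingLiveSandboxes(registrySandboxes, liveNames) {
  return findMissingLiveSandboxes(registrySandboxes, liveNames).length > 0;
}
```

One cost to weigh: computing this requires probing the gateway before deciding whether to recover, so `list` would pay for an `openshell sandbox list` call on every run.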
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bin/nemoclaw.js`:
- Around line 217-223: The recovery logic in applyRecoveredDefault currently
prefers requestedSandboxName even when a currentDefaultSandbox exists, causing
silent global default changes; change it so the default is only set when there
is no currentDefaultSandbox (i.e., only use requestedSandboxName or
session.sandboxName when currentDefaultSandbox is falsy), then call
registry.setDefault(preferredDefault) only if preferredDefault is truthy and
exists in recovered.sandboxes; update the same pattern at the other occurrence
(around lines 1052-1057) to avoid silently retargeting the global default during
connect recovery.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: efe6fa8b-b153-4fb7-ab88-c4aac028fe75
📒 Files selected for processing (2)
bin/nemoclaw.js
test/cli.test.js
metadataByName.set(
  session.sandboxName,
  buildRecoveredSandboxEntry(session.sandboxName, {
    model: session.model || null,
    provider: session.provider || null,
    nimContainer: session.nimContainer || null,
    policyPresets: session.policyPresets || null,
  })
);
const sessionSandboxMissing = !current.sandboxes.some((sandbox) => sandbox.name === session.sandboxName);
const shouldRecoverSessionSandbox =
  current.sandboxes.length === 0 || sessionSandboxMissing || requestedSandboxName === session.sandboxName;
if (shouldRecoverSessionSandbox) {
  recoveredFromSession = upsertRecoveredSandbox(session.sandboxName, metadataByName.get(session.sandboxName));
}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
REGISTRY_FILE="$(fd '^registry\.js$' bin | head -n1)"
echo "Inspecting: $REGISTRY_FILE"
rg -n -C6 'function normalizeEntry|function updateSandbox|gpuEnabled|policies|nimContainer' "$REGISTRY_FILE"
Repository: NVIDIA/NemoClaw
Length of output: 929
🏁 Script executed:
cat -n bin/lib/registry.js | sed -n '165,200p'
Repository: NVIDIA/NemoClaw
Length of output: 1211
🏁 Script executed:
cat -n bin/nemoclaw.js | sed -n '174,215p'
Repository: NVIDIA/NemoClaw
Length of output: 1878
🏁 Script executed:
rg -n -A15 'function buildRecoveredSandboxEntry|function upsertRecoveredSandbox' bin/nemoclaw.js
Repository: NVIDIA/NemoClaw
Length of output: 999
Session recovery can reset critical sandbox fields when updating existing sandboxes.
Line 142 in bin/lib/registry.js calls updateSandbox() with an entry created by buildRecoveredSandboxEntry(). Although updateSandbox() uses Object.assign() (a merge operation at line 172), the recovered entry is built from session snapshots that don't include all original fields. Specifically:
- `buildRecoveredSandboxEntry()` defaults `gpuEnabled` to `false` (line 122) if missing from metadata
- It defaults `policies` to `[]` (line 123–127) if missing from metadata

When seedRecoveryMetadata() processes a session at lines 174–182 and later passes it through upsertRecoveredSandbox(), these incomplete snapshots will overwrite existing sandbox fields with defaults, silently resetting gpuEnabled, policies, or nimContainer on already-registered sandboxes.
Preserve existing fields by not overwriting entries in metadataByName when they already exist, or ensure session snapshots only update fields that were actually saved in the session.
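One way to read the second option is sketched below: merge the raw session snapshot into the existing entry and skip any field the snapshot did not actually carry. `mergeRecoveredFields` is a hypothetical helper, and it assumes the merge runs on the raw snapshot rather than on the output of `buildRecoveredSandboxEntry()`, since that output has already replaced missing values with `false` and `[]`.

```javascript
// Hypothetical helper: copy only the fields the snapshot really provides,
// leaving everything else on the existing registry entry untouched.
function mergeRecoveredFields(existing, snapshot) {
  const merged = { ...existing };
  for (const [key, value] of Object.entries(snapshot)) {
    // null/undefined means "not recorded in the session", so keep the old value.
    if (value === null || value === undefined) {
      continue;
    }
    merged[key] = value;
  }
  return merged;
}
```

For example, merging `{ model: "m2", provider: null }` into an entry with `gpuEnabled: true` and `policies: ["pypi"]` updates only `model`. An explicit `false` or `[]` in the snapshot still overwrites, so this only helps if the snapshot omits unknown fields instead of defaulting them.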
function applyRecoveredDefault(currentDefaultSandbox, requestedSandboxName, session) {
  const recovered = registry.listSandboxes();
  const preferredDefault = requestedSandboxName || (!currentDefaultSandbox ? session?.sandboxName || null : null);
  if (preferredDefault && recovered.sandboxes.some((sandbox) => sandbox.name === preferredDefault)) {
    registry.setDefault(preferredDefault);
  }
  return registry.listSandboxes();
connect recovery shouldn't silently retarget the default sandbox.
Line 219 prefers requestedSandboxName even when currentDefaultSandbox is already set. That means beta connect can repair the registry and also change later global commands to operate on beta, which is a surprising side effect for a recovery path.
🛠️ Suggested fix
 function applyRecoveredDefault(currentDefaultSandbox, requestedSandboxName, session) {
   const recovered = registry.listSandboxes();
-  const preferredDefault = requestedSandboxName || (!currentDefaultSandbox ? session?.sandboxName || null : null);
+  const preferredDefault = currentDefaultSandbox
+    ? null
+    : requestedSandboxName || session?.sandboxName || null;
   if (preferredDefault && recovered.sandboxes.some((sandbox) => sandbox.name === preferredDefault)) {
     registry.setDefault(preferredDefault);
   }
   return registry.listSandboxes();
 }
Also applies to: 1052-1057
🧹 Nitpick comments (1)
test/cli.test.js (1)
349-845: Consider extracting shared test fixtures to reduce duplication.
The repeated onboard-session payloads and openshell shim scaffolding make this section expensive to maintain; a few helpers (e.g., `writeRegistry`, `writeSession`, `writeOpenshellShim`) would reduce drift risk.
♻️ Suggested refactor sketch
+function writeRegistry(nemoclawDir, sandboxes, defaultSandbox) {
+  fs.writeFileSync(
+    path.join(nemoclawDir, "sandboxes.json"),
+    JSON.stringify({ sandboxes, defaultSandbox }),
+    { mode: 0o600 }
+  );
+}
+
+function writeSession(nemoclawDir, overrides = {}) {
+  const base = {
+    version: 1,
+    sessionId: "session-1",
+    resumable: true,
+    status: "complete",
+    mode: "interactive",
+    startedAt: "2026-03-31T00:00:00.000Z",
+    updatedAt: "2026-03-31T00:00:00.000Z",
+    sandboxName: "alpha",
+    provider: "nvidia-prod",
+    model: "nvidia/nemotron-3-super-120b-a12b",
+    policyPresets: ["pypi"],
+    steps: {},
+  };
+  fs.writeFileSync(
+    path.join(nemoclawDir, "onboard-session.json"),
+    JSON.stringify({ ...base, ...overrides }, null, 2),
+    { mode: 0o600 }
+  );
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/cli.test.js` around lines 349 - 845, Extract the repeated test setup into helpers to reduce duplication: create functions like writeRegistry(nemoclawDir, data) to write sandboxes.json, writeSession(nemoclawDir, sessionPayload) to write onboard-session.json, and writeOpenshellShim(localBin, scriptVariants) to create the openshell shim; update tests that call runWithEnv to invoke these helpers (replace the inline fs.writeFileSync blocks and repeated JSON payloads) so each test only specifies the varying parts (e.g., sandbox names, sandbox list output lines, markerFile) while the helper handles file creation, modes, and common fields.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4bf4b20a-3871-4463-a533-c2adbc67ee8f
📒 Files selected for processing (1)
test/cli.test.js
Summary
- `nemoclaw <name> logs --follow` by mapping the user-facing `--follow` flag to the OpenShell `--tail` flag
- `nemoclaw <sandbox-name> connect` recover missing local registry state directly instead of introducing a separate top-level `reconnect` command

This absorbs the useful parts of #1148 and the recovery ideas from #960, but keeps the product surface on the command users already expect to reach for after a reboot: `nemoclaw <name> connect`.

Issues
Fixes #1146
Fixes #1165
Addresses #990
Addresses #1154
Why this shape
- #1148 is the right fix for the logs regression, but too narrow by itself
- `nemoclaw <name> connect` first, so recovery belongs on that path rather than behind a new top-level command
- `list` now repairs stale or missing local sandbox inventory so the user sees recoverable state instead of an empty registry after reboot

Changes
Logs
- `nemoclaw logs --follow` to `openshell logs --tail`
- `--follow` as the NemoClaw user-facing flag
- `logs --follow`

Reboot recovery
- `~/.nemoclaw/onboard-session.json` when local registry is missing or stale
- `bin/lib/registry.js`

Connect UX
- `nemoclaw reconnect` command
- `nemoclaw <sandbox-name> connect` attempt recovery before falling into the unknown-command path
- `sandboxConnect()` flow after recovery instead of introducing a second connection path

Cross-platform validation
mac
- `fix/logs-follow-reconnect-reboot` / `5191d0d`
- `/Users/kejones/Git/nvidia/NemoClaw-2/node_modules/.bin/vitest run test/cli.test.js -t "connect"`
- `6` connect-focused tests passed, `26` skipped)

brev-cpu (`kj-nemoclaw-cpu-test`)
- `origin/fix/logs-follow-reconnect-reboot` / `5191d0d`
- `/home/ubuntu/.nvm/versions/node/v22.22.2/bin/node`
- `openshell` shim and last-session recovery seed
- `nemoclaw alpha connect` recovered from missing local registry state
- `openshell sandbox list`
- `openshell sandbox get alpha`
- `openshell sandbox connect alpha`

brev-gpu (`kj-nemoclaw-l40s-test`)
- `origin/fix/logs-follow-reconnect-reboot` / `5191d0d`
- `/home/ubuntu/.nvm/versions/node/v22.22.1/bin/node`
- `openshell` shim and last-session recovery seed
- `nemoclaw alpha connect` recovered from missing local registry state
- `openshell sandbox list`
- `openshell sandbox get alpha`
- `openshell sandbox connect alpha`

spark (`spark-d8c8`)
- `origin/fix/logs-follow-reconnect-reboot` / `5191d0d`
- `v22.22.1`
- `openshell` shim and last-session recovery seed
- `nemoclaw alpha connect` recovered from missing local registry state
- `openshell sandbox list`
- `openshell sandbox get alpha`
- `openshell sandbox connect alpha`

Credits
This branch pulls forward ideas from earlier good-faith work, but reworks them onto the current codebase and recovery model:
- #1148 by senthilr-nv
  - `--follow` -> `--tail` logs regression
- #960 by WuKongAI-CMU (Peter)
  - `connect`

Notes
- #1148 directly; absorbed its fix and test intent here
- `reconnect` command from #960; the final UX puts recovery on `nemoclaw <name> connect`

Signed-off-by: Kevin Jones kejones@nvidia.com
Summary by CodeRabbit
New Features
Bug Fixes
Tests