18 changes: 18 additions & 0 deletions bin/lib/onboard.js
@@ -1616,6 +1616,24 @@ async function startGatewayWithOptions(_gpu, { exitOnFailure = true } = {}) {
runOpenshell(["gateway", "destroy", "-g", GATEWAY_NAME], { ignoreError: true });
}

// Clear stale SSH host keys from previous gateway (fixes #768)
try {
  const { execFileSync } = require("child_process");
  execFileSync("ssh-keygen", ["-R", `openshell-${GATEWAY_NAME}`], { stdio: "ignore" });
} catch {}
// Also purge any known_hosts entries matching the gateway hostname pattern
const knownHostsPath = path.join(os.homedir(), ".ssh", "known_hosts");
if (fs.existsSync(knownHostsPath)) {
  try {
    const kh = fs.readFileSync(knownHostsPath, "utf8");
    const cleaned = kh.split("\n").filter(l => {
      const host = l.split(/\s/)[0] || "";
      return !host.includes("openshell-");
    }).join("\n");
    if (cleaned !== kh) fs.writeFileSync(knownHostsPath, cleaned);
  } catch {}
}

const gwArgs = ["--name", GATEWAY_NAME];
// Do NOT pass --gpu here. On DGX Spark (and most GPU hosts), inference is
// routed through a host-side provider (Ollama, vLLM, or cloud API) — the
4 changes: 3 additions & 1 deletion scripts/debug.sh
@@ -284,7 +284,9 @@ if command -v openshell &>/dev/null \
SANDBOX_SSH_CONFIG=$(mktemp "${TMPDIR_BASE}/nemoclaw-ssh-XXXXXX")
if openshell sandbox ssh-config "$SANDBOX_NAME" >"$SANDBOX_SSH_CONFIG" 2>/dev/null; then
SANDBOX_SSH_HOST="openshell-${SANDBOX_NAME}"
SANDBOX_SSH_OPTS=(-F "$SANDBOX_SSH_CONFIG" -o StrictHostKeyChecking=no -o ConnectTimeout=10)
SANDBOX_SSH_KNOWN=$(mktemp "${TMPDIR_BASE}/nemoclaw-ssh-known-XXXXXX")
trap 'rm -f "$SANDBOX_SSH_KNOWN"' EXIT
SANDBOX_SSH_OPTS=(-F "$SANDBOX_SSH_CONFIG" -o StrictHostKeyChecking=accept-new -o "UserKnownHostsFile=$SANDBOX_SSH_KNOWN" -o ConnectTimeout=10)
Comment on lines +287 to +289

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '\n== trap declarations in scripts/debug.sh ==\n'
nl -ba scripts/debug.sh | sed -n '84,95p;284,290p'

printf '\n== bash trap behavior repro ==\n'
bash -lc '
cleanup(){ echo cleanup; }
trap cleanup EXIT
trap '\''echo sandbox_known'\'' EXIT
'

Repository: NVIDIA/NemoClaw

Length of output: 163


🏁 Script executed:

#!/bin/bash

# First, check file size to ensure we can read it safely
wc -l scripts/debug.sh

# Read the cleanup function and first trap declaration
echo "=== Lines 84-95 (cleanup function area) ==="
sed -n '84,95p' scripts/debug.sh

# Read the SANDBOX_SSH_KNOWN and second trap declaration
echo ""
echo "=== Lines 284-290 (SANDBOX_SSH_KNOWN area) ==="
sed -n '284,290p' scripts/debug.sh

# Test bash trap behavior with available tools
echo ""
echo "=== Bash trap behavior test ==="
bash << 'BASHEOF'
cleanup(){ echo "cleanup called"; }
trap cleanup EXIT
trap 'echo sandbox_known' EXIT
echo "Exiting..."
BASHEOF

Repository: NVIDIA/NemoClaw

Length of output: 955


Don't replace the existing EXIT trap here.

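Line 288 replaces the trap cleanup EXIT handler installed at line 94: bash keeps a single handler per signal, so a later trap call for EXIT discards the earlier one. As a result cleanup() never runs, and COLLECT_DIR and SANDBOX_SSH_CONFIG are left behind in ${TMPDIR_BASE} on every run that reaches this branch.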
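
The replacement behaviour can be checked in isolation. The snippet below is a minimal sketch, not part of scripts/debug.sh: it runs a child bash that installs two EXIT traps in a row and captures what the child prints on exit. The variable name trap_demo_output and the echoed strings are made up for illustration.

```bash
#!/usr/bin/env bash
# Run a child bash that sets two EXIT traps in sequence and capture its output.
trap_demo_output=$(bash -c '
  cleanup() { echo "cleanup called"; }
  trap cleanup EXIT
  trap "echo second trap called" EXIT   # replaces the handler set one line above
')
# Only the handler installed last runs when the child shell exits.
echo "$trap_demo_output"
```

Only the second handler's message appears; cleanup is never invoked, which is the same situation the new trap at line 288 creates.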

Fold the cleanup of SANDBOX_SSH_KNOWN into the cleanup() function instead of creating a separate trap:

Fix
 TMPDIR_BASE="${TMPDIR:-/tmp}"
 COLLECT_DIR=$(mktemp -d "${TMPDIR_BASE}/nemoclaw-debug-XXXXXX")
 SANDBOX_SSH_CONFIG=""
+SANDBOX_SSH_KNOWN=""
 cleanup() {
   rm -rf "$COLLECT_DIR"
   if [ -n "$SANDBOX_SSH_CONFIG" ]; then
     rm -f "$SANDBOX_SSH_CONFIG"
   fi
+  if [ -n "$SANDBOX_SSH_KNOWN" ]; then
+    rm -f "$SANDBOX_SSH_KNOWN"
+  fi
 }
 trap cleanup EXIT

Then at lines 287-288, remove the second trap:

     SANDBOX_SSH_KNOWN=$(mktemp "${TMPDIR_BASE}/nemoclaw-ssh-known-XXXXXX")
-    trap 'rm -f "$SANDBOX_SSH_KNOWN"' EXIT
     SANDBOX_SSH_OPTS=(-F "$SANDBOX_SSH_CONFIG" -o StrictHostKeyChecking=accept-new -o "UserKnownHostsFile=$SANDBOX_SSH_KNOWN" -o ConnectTimeout=10)
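
The single-handler pattern proposed above can be exercised end to end with a throwaway script. This is a hedged sketch under stated assumptions rather than the real scripts/debug.sh: the trap-demo-* file name templates and the demo_paths and leftover variables are hypothetical, and only the cleanup() shape mirrors the suggested fix. A child bash creates a temp directory and, later, a temp file, prints both paths, and exits; the parent then counts how many of those paths still exist.

```bash
#!/usr/bin/env bash
# Hypothetical demo, not the actual scripts/debug.sh: one cleanup() owns every temp path.
demo_paths=$(bash -c '
  TMPDIR_BASE="${TMPDIR:-/tmp}"
  COLLECT_DIR=$(mktemp -d "${TMPDIR_BASE}/trap-demo-XXXXXX")
  SANDBOX_SSH_KNOWN=""
  cleanup() {
    rm -rf "$COLLECT_DIR"
    if [ -n "$SANDBOX_SSH_KNOWN" ]; then
      rm -f "$SANDBOX_SSH_KNOWN"
    fi
  }
  trap cleanup EXIT
  # Created after the trap is installed, as in the branch that builds the SSH options.
  SANDBOX_SSH_KNOWN=$(mktemp "${TMPDIR_BASE}/trap-demo-known-XXXXXX")
  printf "%s\n%s\n" "$COLLECT_DIR" "$SANDBOX_SSH_KNOWN"
')

# Count how many of the reported paths survived the child shell exiting.
leftover=0
while IFS= read -r p; do
  if [ -e "$p" ]; then
    leftover=$((leftover + 1))
  fi
done <<EOF
$demo_paths
EOF
echo "leftover paths: $leftover"
```

Because cleanup() reads SANDBOX_SSH_KNOWN at exit time rather than when the trap is installed, initialising the variable to an empty string up front and assigning it later is enough; no second trap is needed.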
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/debug.sh` around lines 287 - 289, The new trap on EXIT overwrites the
existing trap and prevents cleanup() from running; remove the separate trap
command and instead add logic to the cleanup() function to remove the temporary
SANDBOX_SSH_KNOWN file (created as SANDBOX_SSH_KNOWN=$(mktemp ...)) and any
SANDBOX_SSH_CONFIG/COLLECT_DIR artifacts under TMPDIR_BASE so the single trap
'cleanup EXIT' (from earlier) handles all teardown; update cleanup() to test for
and rm -f "$SANDBOX_SSH_KNOWN" and ensure SANDBOX_SSH_KNOWN is set in scope when
cleanup() runs.


collect "sandbox-ps" ssh "${SANDBOX_SSH_OPTS[@]}" "$SANDBOX_SSH_HOST" ps -ef
collect "sandbox-free" ssh "${SANDBOX_SSH_OPTS[@]}" "$SANDBOX_SSH_HOST" free -m