fix(security): use argv arrays for docker and curl to prevent shell injection#171
fix(security): use argv arrays for docker and curl to prevent shell injection#171dumko2001 wants to merge 1 commit intoNVIDIA:mainfrom
Conversation
Comprehensive fix for shell injection vulnerabilities where user input (instance names, sandbox names, model names, API keys) was interpolated unsanitized into shell commands via run()/runInteractive()/execSync(). Changes: - Add shellQuote() and validateName() to runner.js as shared utilities - Replace all execSync() with execFileSync() in deploy (no shell) - Apply shellQuote() to every user-controlled variable in shell commands across nemoclaw.js, onboard.js, nim.js, policies.js - Add RFC 1123 name validation at CLI dispatch for sandbox/instance names - Fix path traversal in policies.js loadPreset() - Replace predictable temp file with mkdtempSync() - Remove duplicate shellQuote() definitions (now single source in runner.js) - 9 new test cases for shellQuote, validateName, and path traversal Supersedes #55, #110, #475, #540, #48, #171.
|
Hey @dumko2001, thanks for tackling shell injection prevention here — that's an important hardening measure. This repo has been moving pretty fast lately; we've added CI checks and a bunch of new features since this PR was opened. Could you rebase onto the latest main so we can take a fresh look? Appreciate it! |
* security: fix command injection across all CLI entry points Comprehensive fix for shell injection vulnerabilities where user input (instance names, sandbox names, model names, API keys) was interpolated unsanitized into shell commands via run()/runInteractive()/execSync(). Changes: - Add shellQuote() and validateName() to runner.js as shared utilities - Replace all execSync() with execFileSync() in deploy (no shell) - Apply shellQuote() to every user-controlled variable in shell commands across nemoclaw.js, onboard.js, nim.js, policies.js - Add RFC 1123 name validation at CLI dispatch for sandbox/instance names - Fix path traversal in policies.js loadPreset() - Replace predictable temp file with mkdtempSync() - Remove duplicate shellQuote() definitions (now single source in runner.js) - 9 new test cases for shellQuote, validateName, and path traversal Supersedes #55, #110, #475, #540, #48, #171. * fix: deduplicate shellQuote in local-inference.js Import shellQuote from runner.js instead of defining a local copy. Single source of truth for shell quoting across the codebase. * security: fix telegram bridge injection + add regression guards Telegram bridge: - Replace execSync with execFileSync for ssh-config retrieval - shellQuote message, API key, and session ID in remote command - Validate SANDBOX_NAME at startup - Use mkdtempSync for temp SSH config (not predictable path) Regression tests: - nemoclaw.js must not use execSync - Single shellQuote definition in bin/ - CLI rejects malicious sandbox names (e2e, no mocking) - telegram-bridge.js validates SANDBOX_NAME and avoids execSync * security: address CodeRabbit review findings on shell injection PR - Shell-quote secret values written to .env before remote source - Wrap scp upload in try/finally to guarantee temp secret cleanup - Shell-quote CHAT_UI_URL and NVIDIA_API_KEY env args in onboard - Replace predictable Date.now() temp path with mkdtempSync in policies - Strengthen e2e test with canary file to prove payload never executes - Fix merge-introduced test expectations for shellQuote single-quote format
) * security: fix command injection across all CLI entry points Comprehensive fix for shell injection vulnerabilities where user input (instance names, sandbox names, model names, API keys) was interpolated unsanitized into shell commands via run()/runInteractive()/execSync(). Changes: - Add shellQuote() and validateName() to runner.js as shared utilities - Replace all execSync() with execFileSync() in deploy (no shell) - Apply shellQuote() to every user-controlled variable in shell commands across nemoclaw.js, onboard.js, nim.js, policies.js - Add RFC 1123 name validation at CLI dispatch for sandbox/instance names - Fix path traversal in policies.js loadPreset() - Replace predictable temp file with mkdtempSync() - Remove duplicate shellQuote() definitions (now single source in runner.js) - 9 new test cases for shellQuote, validateName, and path traversal Supersedes NVIDIA#55, NVIDIA#110, NVIDIA#475, NVIDIA#540, NVIDIA#48, NVIDIA#171. * fix: deduplicate shellQuote in local-inference.js Import shellQuote from runner.js instead of defining a local copy. Single source of truth for shell quoting across the codebase. * security: fix telegram bridge injection + add regression guards Telegram bridge: - Replace execSync with execFileSync for ssh-config retrieval - shellQuote message, API key, and session ID in remote command - Validate SANDBOX_NAME at startup - Use mkdtempSync for temp SSH config (not predictable path) Regression tests: - nemoclaw.js must not use execSync - Single shellQuote definition in bin/ - CLI rejects malicious sandbox names (e2e, no mocking) - telegram-bridge.js validates SANDBOX_NAME and avoids execSync * security: address CodeRabbit review findings on shell injection PR - Shell-quote secret values written to .env before remote source - Wrap scp upload in try/finally to guarantee temp secret cleanup - Shell-quote CHAT_UI_URL and NVIDIA_API_KEY env args in onboard - Replace predictable Date.now() temp path with mkdtempSync in policies - Strengthen e2e test with canary file to prove payload never executes - Fix merge-introduced test expectations for shellQuote single-quote format
|
@cv got superseded :( |
cv
left a comment
There was a problem hiding this comment.
The security goal is sound — argv arrays eliminate shell injection. But the implementation has issues.
runCapture() encoding bug
The patch switches from execSync (which returns a string with encoding: "utf-8") to spawnSync with stdio: "pipe". With spawnSync, result.stdout is a Buffer unless encoding is explicitly set — but ...opts at the end of the options object can override encoding. The code calls .trim() on the result, which will throw on a Buffer.
Incomplete coverage
Only docker and curl calls in nim.js are converted. The codebase still has shell pipelines that can't be expressed as argv arrays (e.g., "free -m | awk '/Mem:/ {print $2}'" in nim.js). The dual-mode run() that accepts either strings or arrays creates ambiguity about which pattern callers should use.
Overlap
- #721 (gateway isolation) restructures startup paths and runner patterns — high conflict risk
- #696 (download-then-execute) also hardens curl patterns in the same codebase
- The existing
shellQuote()andvalidateName()inrunner.jsalready prevent injection for user-supplied input at the callsites that matter most
No CI
Zero workflow runs. The "passed all 52 tests" claim in the description is self-reported.
Recommendation
The argv approach is correct in principle, but landing a partial conversion alongside the existing shell-quoting approach creates two competing patterns. Consider either:
- Scope down to only new code paths, or
- Commit to converting all callsites (including pipelines, which would need restructuring)
Either way, fix the encoding bug and get CI running first.
) * security: fix command injection across all CLI entry points Comprehensive fix for shell injection vulnerabilities where user input (instance names, sandbox names, model names, API keys) was interpolated unsanitized into shell commands via run()/runInteractive()/execSync(). Changes: - Add shellQuote() and validateName() to runner.js as shared utilities - Replace all execSync() with execFileSync() in deploy (no shell) - Apply shellQuote() to every user-controlled variable in shell commands across nemoclaw.js, onboard.js, nim.js, policies.js - Add RFC 1123 name validation at CLI dispatch for sandbox/instance names - Fix path traversal in policies.js loadPreset() - Replace predictable temp file with mkdtempSync() - Remove duplicate shellQuote() definitions (now single source in runner.js) - 9 new test cases for shellQuote, validateName, and path traversal Supersedes NVIDIA#55, NVIDIA#110, NVIDIA#475, NVIDIA#540, NVIDIA#48, NVIDIA#171. * fix: deduplicate shellQuote in local-inference.js Import shellQuote from runner.js instead of defining a local copy. Single source of truth for shell quoting across the codebase. * security: fix telegram bridge injection + add regression guards Telegram bridge: - Replace execSync with execFileSync for ssh-config retrieval - shellQuote message, API key, and session ID in remote command - Validate SANDBOX_NAME at startup - Use mkdtempSync for temp SSH config (not predictable path) Regression tests: - nemoclaw.js must not use execSync - Single shellQuote definition in bin/ - CLI rejects malicious sandbox names (e2e, no mocking) - telegram-bridge.js validates SANDBOX_NAME and avoids execSync * security: address CodeRabbit review findings on shell injection PR - Shell-quote secret values written to .env before remote source - Wrap scp upload in try/finally to guarantee temp secret cleanup - Shell-quote CHAT_UI_URL and NVIDIA_API_KEY env args in onboard - Replace predictable Date.now() temp path with mkdtempSync in policies - Strengthen e2e test with canary file to prove payload never executes - Fix merge-introduced test expectations for shellQuote single-quote format
cafc3a6 to
3f78e93
Compare
📝 WalkthroughWalkthroughRefactors process invocation across the codebase: shell-interpolated command strings are replaced with argv-style arrays, runner utilities now require/execute argv arrays via spawnSync, and callers (inference, onboarding, policies, preflight, registry) were updated to pass structured arguments and to avoid shell redirections/quoting. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~65 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bin/lib/nim.js`:
- Around line 193-205: nimStatus currently probes a hard-coded port 8000 which
breaks non-default-port sandboxes; update nimStatus to use the same port chosen
by startNimContainer()/waitForNimHealth() by either adding a port parameter to
nimStatus or by reading the persisted port from the sandbox metadata (the same
place startNimContainer stores the selected port), then replace the fixed
"http://localhost:8000/v1/models" probe with a probe using that port; ensure any
callers of nimStatus (and tests) pass or can access the sandbox port so health
checks reflect the actual container port.
In `@bin/lib/onboard.js`:
- Around line 564-575: The current code pushes secrets into envArgs (see envArgs
and getCredential usage) and passes them on the command line to the long-running
"openshell sandbox create" invocation, which exposes them via ps/proc; stop
adding NVIDIA_API_KEY, DISCORD_BOT_TOKEN and SLACK_BOT_TOKEN to envArgs and
instead inject them securely: either (a) invoke "openshell sandbox create"
without secrets and then call a post-create step (e.g., "openshell sandbox
connect" or an equivalent API) to send secrets over stdin/pipe, or (b) write the
secrets to a protected temp file (mode 0600) and provide that file to the
process or API, removing it immediately after use; update the code paths that
build envArgs (references: envArgs, getCredential) and the place that spawns
"openshell sandbox create" to use one of these secure injection methods and
ensure the temp file or pipe is cleaned up and not left on disk.
- Around line 245-248: The code incorrectly tries to run the shell builtin
"command -v" via spawnSync in runCapture, causing ENOENT in functions
isOpenshellInstalled and setupNim and in checkPort; replace those invocations
with a proper PATH lookup (e.g., use the "which" binary or the which npm package
or Node's fs+process.env.PATH lookup) instead of spawning "command", updating
isOpenshellInstalled(), setupNim(), and checkPort() to call the PATH utility or
helper. Also avoid passing secrets as argv in envArgs passed into
streamSandboxCreate/spawn(); move NVIDIA_API_KEY, DISCORD_BOT_TOKEN,
SLACK_BOT_TOKEN into the child process environment (env object) when creating
the sandbox so sensitive values are not exposed via process arguments or
ps/cmdline.
In `@bin/lib/runner.js`:
- Around line 24-29: The spawn options currently spread ...opts after
constructing env so an opts.env value replaces the merged environment; fix by
building the options so env is set last as a merged object (e.g., create an
options object where env: { ...process.env, ...opts.env } is assigned after
spreading opts or by spreading opts first and then overriding env), and use that
when calling spawnSync in run (the spawnSync call around const result =
spawnSync(...)) and apply the identical change in runCapture; ensure you do not
leave ...opts after env which would overwrite the merged env.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ee1a81ec-592e-4c57-afe9-3df35d6d18b9
📒 Files selected for processing (6)
bin/lib/local-inference.jsbin/lib/nim.jsbin/lib/onboard.jsbin/lib/policies.jsbin/lib/preflight.jsbin/lib/runner.js
3f78e93 to
b996722
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
bin/lib/nim.js (1)
163-184: Minor:safePortis computed but not used in the URL.Line 166 computes
safePort = Number(port)but Line 171 usesportdirectly. This doesn't create a security issue since the URL is passed as an argv element (no shell parsing), but for consistency you could usesafePortin the URL template.♻️ Optional consistency fix
try { - const result = runCapture(["curl", "-sf", `http://localhost:${port}/v1/models`], { + const result = runCapture(["curl", "-sf", `http://localhost:${safePort}/v1/models`], { ignoreError: true, });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/nim.js` around lines 163 - 184, In waitForNimHealth, safePort is computed but the curl URL still uses port; update the runCapture call to interpolate safePort (i.e., change `http://localhost:${port}/v1/models` to `http://localhost:${safePort}/v1/models`) so the function consistently uses the sanitized numeric port value (symbols: waitForNimHealth, safePort, port, runCapture).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@bin/lib/nim.js`:
- Around line 163-184: In waitForNimHealth, safePort is computed but the curl
URL still uses port; update the runCapture call to interpolate safePort (i.e.,
change `http://localhost:${port}/v1/models` to
`http://localhost:${safePort}/v1/models`) so the function consistently uses the
sanitized numeric port value (symbols: waitForNimHealth, safePort, port,
runCapture).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5031168d-d68e-4196-8849-982c8b042aa4
📒 Files selected for processing (7)
bin/lib/local-inference.jsbin/lib/nim.jsbin/lib/onboard.jsbin/lib/policies.jsbin/lib/preflight.jsbin/lib/registry.jsbin/lib/runner.js
🚧 Files skipped from review as they are similar to previous changes (4)
- bin/lib/policies.js
- bin/lib/local-inference.js
- bin/lib/preflight.js
- bin/lib/runner.js
) * security: fix command injection across all CLI entry points Comprehensive fix for shell injection vulnerabilities where user input (instance names, sandbox names, model names, API keys) was interpolated unsanitized into shell commands via run()/runInteractive()/execSync(). Changes: - Add shellQuote() and validateName() to runner.js as shared utilities - Replace all execSync() with execFileSync() in deploy (no shell) - Apply shellQuote() to every user-controlled variable in shell commands across nemoclaw.js, onboard.js, nim.js, policies.js - Add RFC 1123 name validation at CLI dispatch for sandbox/instance names - Fix path traversal in policies.js loadPreset() - Replace predictable temp file with mkdtempSync() - Remove duplicate shellQuote() definitions (now single source in runner.js) - 9 new test cases for shellQuote, validateName, and path traversal Supersedes NVIDIA#55, NVIDIA#110, NVIDIA#475, NVIDIA#540, NVIDIA#48, NVIDIA#171. * fix: deduplicate shellQuote in local-inference.js Import shellQuote from runner.js instead of defining a local copy. Single source of truth for shell quoting across the codebase. * security: fix telegram bridge injection + add regression guards Telegram bridge: - Replace execSync with execFileSync for ssh-config retrieval - shellQuote message, API key, and session ID in remote command - Validate SANDBOX_NAME at startup - Use mkdtempSync for temp SSH config (not predictable path) Regression tests: - nemoclaw.js must not use execSync - Single shellQuote definition in bin/ - CLI rejects malicious sandbox names (e2e, no mocking) - telegram-bridge.js validates SANDBOX_NAME and avoids execSync * security: address CodeRabbit review findings on shell injection PR - Shell-quote secret values written to .env before remote source - Wrap scp upload in try/finally to guarantee temp secret cleanup - Shell-quote CHAT_UI_URL and NVIDIA_API_KEY env args in onboard - Replace predictable Date.now() temp path with mkdtempSync in policies - Strengthen e2e test with canary file to prove payload never executes - Fix merge-introduced test expectations for shellQuote single-quote format
Rationale
Executing commands as strings in a shell is vulnerable to injection attacks if any part of the string contains untrusted or unescaped input.
Changes
Refactored internal calls to
dockerandcurlto useargvarrays withspawnSync, bypassing the shell and eliminating shell injection risks.Verification Results
npm test.Leading Standards
This PR follows the project's 'First Principles' approach, prioritizing deterministic behavior and zero-trust security defaults.
Summary by CodeRabbit