
fix(security): eliminate API key leakage via ps aux across all three execution layers#1

Open
dumko2001 wants to merge 3 commits into main from
security/harden-process-execution

@dumko2001
Owner

Summary

Closes NVIDIA#325 — API key exposed in process list via ps aux.

This PR is a comprehensive rollup that supersedes open PRs NVIDIA#148, NVIDIA#191, NVIDIA#225, and NVIDIA#330. It fixes every instance of secret leakage and shell injection that was identified during a first-principles audit of all three execution layers.


Root Cause

openshell provider create --credential KEY=VALUE passes the secret as a command-line argument. Every user on the machine can read it via ps aux or /proc/<pid>/cmdline. The same pattern existed in:

| Layer | File | Old pattern |
|---|---|---|
| Legacy CLI | bin/lib/onboard.js | ``run(`openshell provider create --credential "NVIDIA_API_KEY=${process.env.NVIDIA_API_KEY}"`)`` |
| Plugin (TS) | nemoclaw/src/commands/onboard.ts | ``execOpenShell(["--credential", `${credentialEnv}=${apiKey}`])`` |
| Blueprint (Python) | nemoclaw-blueprint/orchestrator/runner.py | `provider_args.extend(["--credential", f"OPENAI_API_KEY={credential}"])` |

Fix — Three Commits

Commit 1 — fix(runner): safe argv primitives + opts.env overwrite fix

  • runArgv(prog, args, opts): spawnSync without a shell; no metacharacter expansion
  • runCaptureArgv(prog, args, opts): execFileSync without a shell; returns stdout as a string
  • assertSafeName(name, label): validates user-supplied names against [a-zA-Z0-9][a-zA-Z0-9_-]{0,62}; calls process.exit(1) on rejection
  • Fix pre-existing opts.env overwrite bug: mergeEnv() destructures opts.env before the rest spread so PATH/HOME/DOCKER_HOST are always preserved
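A minimal sketch of what the three commit-1 helpers could look like, assuming Node's child_process module. Only the names, signatures, and the name pattern come from this PR; the bodies are assumptions. In particular, this assertSafeName throws instead of calling process.exit(1) so a caller or test can observe the rejection, and the pattern is anchored with ^ and $ so the whole string must match.

```javascript
"use strict";
const { spawnSync, execFileSync } = require("child_process");

// Name rule from this PR, anchored (assumption) so the WHOLE string must match:
// one alphanumeric, then up to 62 more of [a-zA-Z0-9_-] (63 chars max).
const SAFE_NAME = /^[a-zA-Z0-9][a-zA-Z0-9_-]{0,62}$/;

// Sketch: throws instead of process.exit(1) so callers and tests can catch it.
function assertSafeName(name, label) {
  if (typeof name !== "string" || !SAFE_NAME.test(name)) {
    throw new Error(`Invalid ${label}: ${JSON.stringify(name)}`);
  }
  return name;
}

// Run `prog` with an argv array. shell: false means no /bin/sh or cmd.exe, so
// characters like ; | $() reach the program as literal bytes.
// Returns the exit code (null if the child was killed by a signal).
function runArgv(prog, args, opts = {}) {
  const result = spawnSync(prog, args, { stdio: "inherit", ...opts, shell: false });
  if (result.error) throw result.error; // e.g. ENOENT when prog is not installed
  return result.status;
}

// Same, but capture stdout and return it as a string; throws on non-zero exit.
function runCaptureArgv(prog, args, opts = {}) {
  return execFileSync(prog, args, { encoding: "utf8", ...opts, shell: false });
}
```

An illustrative call would validate first and then pass the name as a single argv element, e.g. runArgv("docker", ["stop", assertSafeName(name, "sandbox name")]); how nim.js actually names its containers is not shown here.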

Commit 2 — fix(cli): shell-string → argv arrays (injection prevention)

Converts every run(`...`) shell-string call that accepted user-controlled values into an argv-array call, across:

  • bin/lib/onboard.js: all openshell/bash/brew calls, fs.cpSync/fs.rmSync replacing shell cp/rm
  • bin/lib/nim.js: docker pull/rm/run/stop/inspect + assertSafeName on sandboxName
  • bin/lib/policies.js: openshell policy get/set + assertSafeName on both sandboxName and presetName + temp file written with mode 0o600
  • bin/nemoclaw.js: removes inline NVIDIA_API_KEY=VALUE from sudo argv (unnecessary, since sudo -E already preserves the caller's environment); assertSafeName on deploy instanceName; sandbox connect/status/logs/destroy → runArgv
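bin/lib/onboard.js swaps shell cp/rm for fs.cpSync/fs.rmSync. A minimal sketch of that replacement, assuming Node 16.7 or later (when fs.cpSync was added); copyTree and removeTree are hypothetical wrapper names, not functions from the PR. Because paths go straight to the filesystem API, a directory name containing ; or spaces is just a name.

```javascript
"use strict";
const fs = require("fs");
const os = require("os");
const path = require("path");

// Copy a directory tree without spawning `cp -r`; no shell ever sees the paths.
function copyTree(src, dest) {
  fs.cpSync(src, dest, { recursive: true });
}

// Remove a tree without `rm -rf`; force: true makes a missing path a no-op.
function removeTree(target) {
  fs.rmSync(target, { recursive: true, force: true });
}
```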

Commit 3 — fix(credentials): env-lookup form for --credential; secrets never in argv

Safe pattern: set the secret in process.env / os.environ before the call, so the spawned openshell process inherits it, then pass only the env-var name to --credential. openshell reads the value from its environment, never from argv.
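A minimal sketch of the env-lookup pattern, assuming Node's spawnSync. spawnWithSecretEnv is a hypothetical helper. One deliberate difference from the PR: instead of assigning process.env[credentialEnv] in the parent (which every later child would also inherit), it passes the secret through spawnSync's env option so only this child receives it. It also refuses to run if the secret value appears anywhere in argv.

```javascript
"use strict";
const { spawnSync } = require("child_process");

// Hypothetical helper: deliver `secret` to one child as environment variable
// `envName`. argv shows up in `ps aux`; the value stays in the child's env.
function spawnWithSecretEnv(prog, args, envName, secret, opts = {}) {
  if (!secret) throw new Error(`${envName} is empty`);
  if (args.some((arg) => arg.includes(secret))) {
    throw new Error("refusing to put the secret value in argv");
  }
  // Scope the secret to this child instead of mutating process.env.
  const env = { ...process.env, ...opts.env, [envName]: secret };
  return spawnSync(prog, args, { encoding: "utf8", ...opts, env });
}

// Mirrors the "after" ps line later in this PR (illustrative, not executed):
// spawnWithSecretEnv("openshell",
//   ["provider", "create", "--name", "nvidia-nim", "--type", "openai",
//    "--credential", "NVIDIA_API_KEY", "--config", `OPENAI_BASE_URL=${baseUrl}`],
//   "NVIDIA_API_KEY", apiKey);
```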

| File | Change |
|---|---|
| nemoclaw/src/commands/onboard.ts | process.env[credentialEnv] = apiKey before execOpenShell; both provider create and provider update paths changed |
| nemoclaw-blueprint/orchestrator/runner.py | target_cred_env with type-based fallback (supersedes NVIDIA#191); os.environ[target_cred_env] = credential; --credential target_cred_env |
| nemoclaw-blueprint/blueprint.yaml | Add credential_env: NVIDIA_API_KEY to the default profile; without it the fallback would pick OPENAI_API_KEY for the nvidia provider type |
| nemoclaw/src/onboard/config.ts | writeFileSync uses { mode: 0o600 } so config.json is readable and writable by its owner only |
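config.ts passes { mode: 0o600 } to writeFileSync. One Node detail worth knowing: mode only takes effect when writeFileSync creates the file, so a config.json written by an earlier version keeps its old permissions. A minimal sketch that also tightens an existing file; writePrivateJson is a hypothetical name, and the explicit chmodSync is an addition this PR does not describe.

```javascript
"use strict";
const fs = require("fs");

// Write JSON that only the owner can read or write.
// `mode` applies only when the file is newly created, so chmod afterwards to
// also fix a file that already existed with looser permissions.
function writePrivateJson(file, data) {
  fs.writeFileSync(file, JSON.stringify(data, null, 2), { mode: 0o600 });
  fs.chmodSync(file, 0o600);
}
```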

Verification

$ node --test test/*.test.js
ℹ tests 84
ℹ pass 84
ℹ fail 0

Key tests added:

  • test/runner.test.js (22 assertions) — assertSafeName rejects ;, $(), |, ../, spaces; runCaptureArgv does not expand shell metacharacters; opts.env preserves PATH
  • test/credential-exposure.test.js (9 assertions) — static scan of all 3 layers for --credential KEY=VALUE patterns; structural checks for process.env[credentialEnv] and os.environ[target_cred_env]; runtime injection PoC
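The static scan in test/credential-exposure.test.js is only summarized here. A minimal sketch of such a scan, assuming plain regular expressions over source text; findInlineCredentials and both patterns are hypothetical and tuned only to the three old patterns in the Root Cause table, so treat it as a starting point rather than a complete detector.

```javascript
"use strict";
// Flag `--credential` arguments whose value is KEY=VALUE.
const PATTERNS = [
  // Shell-string form: --credential "KEY=..." or --credential=KEY=...
  /--credential[ =]+["'`]?[A-Za-z_][A-Za-z0-9_]*=/g,
  // Argv-array form: "--credential", followed by a string literal containing "="
  // (covers `${name}=${value}` in TS and f"KEY={value}" in Python).
  /["']--credential["']\s*,\s*f?["'`][^"'`]*=/g,
];

function findInlineCredentials(source) {
  const hits = [];
  for (const re of PATTERNS) {
    for (const m of source.matchAll(re)) hits.push(m[0]);
  }
  return hits;
}
```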

ps aux before/after

Before (vulnerable):

openshell provider create --name nvidia-nim --type openai --credential "NVIDIA_API_KEY=nvapi-xxxxxxxx..." --config "OPENAI_BASE_URL=..."

After (safe):

openshell provider create --name nvidia-nim --type openai --credential NVIDIA_API_KEY --config OPENAI_BASE_URL=...

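The secret is in the process's environment variables, not in its argv. On Linux, /proc/<pid>/cmdline (which ps aux reads) is world-readable, while /proc/<pid>/environ is readable only by the process's own user and root.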


PRs Superseded

| PR | Description | Status |
|---|---|---|
| NVIDIA#148 | Shell injection via sandbox name | Superseded by commit 2 |
| NVIDIA#191 | Python runner credential type fallback | Superseded by commit 3 (includes and extends it) |
| NVIDIA#225 | CI / non-interactive mode | Already covered: existing isNonInteractive() in onboard.ts implements this |
| NVIDIA#330 | Credential leak in --credential arg | Superseded by commit 3 |

@dumko2001 force-pushed the security/harden-process-execution branch from 2afa1f4 to 47512fa on March 19, 2026 03:34
Add three argv-safe helpers to bin/lib/runner.js:
  runArgv(prog, args, opts)        -- spawnSync without shell
  runCaptureArgv(prog, args, opts) -- execFileSync without shell; returns stdout
  assertSafeName(name, label)      -- validates against [a-zA-Z0-9][a-zA-Z0-9_-]{0,62}

Fix pre-existing opts.env overwrite: old spread { ...opts } after the merged env
silently clobbered it. mergeEnv(opts) destructures opts.env first.

test/runner.test.js: 22 new assertions (assertSafeName rejections, injection
PoC, opts.env preservation).
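A minimal sketch of the opts.env overwrite described in this commit message. buggySpawnOptions and this mergeEnv are hypothetical reconstructions (the PR's mergeEnv may return only the merged env rather than full spawn options); the point is the order of the spreads.

```javascript
"use strict";
// Bug: `...opts` is spread AFTER the merged env, so opts.env replaces the
// whole merged object and PATH, HOME, DOCKER_HOST, etc. are dropped.
function buggySpawnOptions(opts = {}) {
  return { env: { ...process.env, ...opts.env }, ...opts };
}

// Fix: pull `env` out of opts first, spread only the remaining options,
// then layer the caller's env on top of the inherited one.
function mergeEnv(opts = {}) {
  const { env, ...rest } = opts;
  return { ...rest, env: { ...process.env, ...env } };
}
```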

Closes shell-injection attack surface in the legacy CJS layer by replacing
all user-controlled run() / runCapture() shell strings with the new argv-safe
runArgv() / runCaptureArgv() helpers. assertSafeName() guards every
user-supplied sandbox/instance/preset name before it enters any command.

bin/lib/onboard.js  -- all openshell/bash/brew calls -> runArgv;
                       file copies -> fs.cpSync/fs.rmSync (no cp shell)
bin/lib/nim.js      -- docker pull/rm/run/stop/inspect -> runArgv/runCaptureArgv;
                       assertSafeName guard on sandboxName
bin/lib/policies.js -- openshell policy get/set -> runCaptureArgv/runArgv;
                       assertSafeName on sandboxName and presetName;
                       temp policy file written with mode 0o600
bin/nemoclaw.js     -- setupSpark: remove inline NVIDIA_API_KEY=VALUE from
                       sudo argv (sudo -E already inherits env);
                       deploy: assertSafeName on instanceName;
                       sandbox connect/status/logs/destroy -> runArgv

Supersedes PRs: NVIDIA#148 (shell injection), part of NVIDIA#330 (credential leak).
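policies.js writes a temporary policy file with mode 0o600 before handing it to openshell. A minimal sketch of a private temp file with guaranteed cleanup, assuming Node's fs/os/path modules; withPrivateTempFile, the nemoclaw-policy- prefix, and the policy.yaml file name are hypothetical, and the "wx" flag and per-call mkdtempSync directory are extra hardening not described in the PR.

```javascript
"use strict";
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: write `content` to a fresh temp file readable only by
// the owner, pass its path to `fn`, and always delete it afterwards.
function withPrivateTempFile(content, fn) {
  // A new, uniquely named directory per call avoids predictable paths.
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "nemoclaw-policy-"));
  const file = path.join(dir, "policy.yaml");
  try {
    // "wx" = create exclusively: fail rather than reuse an existing file or symlink.
    fs.writeFileSync(file, content, { mode: 0o600, flag: "wx" });
    return fn(file);
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}
```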

… in argv

Fixes: NVIDIA#325 (API key exposed in process list via ps aux)
Supersedes: PRs NVIDIA#191, NVIDIA#330

The root cause: all three execution layers passed the actual credential
VALUE as --credential KEY=VALUE, making it visible to any local user via
`ps aux` or /proc/<pid>/cmdline.

Safe pattern: set the secret in the child's inherited env, then pass only
the env-var NAME to --credential (openshell env-lookup form).

nemoclaw/src/commands/onboard.ts
  - process.env[credentialEnv] = apiKey before execOpenShell
  - --credential arg: credentialEnv (name only, not KEY=VALUE)
  - applies to both provider create and provider update paths

nemoclaw-blueprint/orchestrator/runner.py
  - Rename credential_env -> target_cred_env with type-based fallback
    (nvidia -> NVIDIA_API_KEY, openai -> OPENAI_API_KEY) when not set
    in the blueprint profile. Supersedes PR NVIDIA#191's partial fix.
  - os.environ[target_cred_env] = credential before run_cmd
  - --credential arg: target_cred_env (name only)
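runner.py is Python; for consistency with the other sketches, here is the same resolution order expressed in JavaScript. resolveCredentialEnv and the lookup table are hypothetical; only the two type-to-name pairs come from this commit message, and throwing on an unknown provider type is an assumption (the PR does not say what runner.py does in that case).

```javascript
"use strict";
// Type-based defaults named in this commit message.
// A Map avoids accidental hits on Object.prototype keys like "__proto__".
const DEFAULT_CRED_ENV = new Map([
  ["nvidia", "NVIDIA_API_KEY"],
  ["openai", "OPENAI_API_KEY"],
]);

// An explicit credential_env in the blueprint profile always wins;
// otherwise fall back on the provider type.
function resolveCredentialEnv(profile) {
  if (profile.credential_env) return profile.credential_env;
  const fallback = DEFAULT_CRED_ENV.get(profile.provider_type);
  if (!fallback) {
    // Assumption: unknown types are an error in this sketch.
    throw new Error(`no default credential env for provider_type ${profile.provider_type}`);
  }
  return fallback;
}
```

Setting credential_env explicitly, as the blueprint.yaml change does for the default profile, means the type-based guess never runs for that profile.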

nemoclaw-blueprint/blueprint.yaml
  - Add credential_env: NVIDIA_API_KEY to the default profile.
    Without this field the type-based fallback would silently use
    OPENAI_API_KEY for the nvidia provider_type, causing auth failure.

nemoclaw/src/onboard/config.ts
  - writeFileSync for config.json now passes mode: 0o600 so the file
    containing endpoint/model/credentialEnv metadata is not world-readable.

test/credential-exposure.test.js (new file)
  - Static source scan: asserts no --credential KEY=VALUE pattern in any
    of the 3 execution layer files (allowlists dummy/ollama stubs)
  - Layer-specific structural checks (process.env set, os.environ set,
    blueprint default profile has credential_env)
  - Runtime injection PoC: proves old bash -c IS vulnerable; new
    runCaptureArgv IS NOT
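A minimal sketch of the runtime injection PoC, assuming a POSIX system: execSync hands its command string to /bin/sh (standing in here for the old bash -c path), while execFileSync passes an argv array with no shell. The payload and function names are hypothetical; the actual test file may differ.

```javascript
"use strict";
const { execSync, execFileSync } = require("child_process");

// Attacker-controlled "sandbox name" carrying a second shell command.
const name = "demo; echo INJECTED";

// Old style: the value is interpolated into a shell string, so `;` ends the
// first command and `echo INJECTED` runs as a second one.
function shellStyle(value) {
  return execSync(`echo ${value}`, { encoding: "utf8" });
}

// New style: the value is a single argv element and nothing parses `;`.
// The child (node -e) just writes its last argument back unchanged.
function argvStyle(value) {
  return execFileSync(
    process.execPath,
    ["-e", "process.stdout.write(process.argv[process.argv.length - 1])", value],
    { encoding: "utf8" },
  );
}
```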

All 84 tests pass.
@dumko2001 force-pushed the security/harden-process-execution branch from 47512fa to 892bc25 on March 19, 2026 03:57


Development

Successfully merging this pull request may close these issues.

[Security] NVIDIA API key exposed in process list when creating inference provider
