test(security): add Brev E2E tests for command injection and credential sanitization by jyaunches · Pull Request #1092 · NVIDIA/NemoClaw

jyaunches · 2026-03-30T13:17:18Z

Summary

Adds an ephemeral Brev-based E2E test infrastructure for running security test suites against a live NemoClaw sandbox. Validates the fixes from PR #584 (command injection) and PR #743 (credential sanitization) end-to-end.

Validated on fork: Run #23668176481 — 24/24 credential sanitization + 18/18 telegram injection ✅

Depends on: #1090 (setup.sh CI improvements — timestamped output, NEMOCLAW_SANDBOX_NAME, --no-tty, base image fallback)

What already existed in the repo

nightly-e2e.yaml — runs test/e2e/test-full-e2e.sh nightly on GitHub-hosted runners. Tests the full install → onboard → inference journey. Runs directly on the runner (no remote instance).
test/e2e/test-full-e2e.sh — the existing full E2E script testing the complete user journey.
vitest.config.ts — had cli and plugin projects.

What this adds

New files

File	Purpose
`.github/workflows/e2e-brev.yaml`	Workflow that provisions an ephemeral Brev cloud instance, bootstraps NemoClaw over SSH, runs test suites remotely, reports results to PRs, and optionally keeps the instance alive for debugging.
`test/e2e/brev-e2e.test.js`	Vitest orchestrator: Brev auth → create instance → SSH wait → code sync → bootstrap → nemoclaw CLI install → sandbox registry → run remote tests → cleanup.
`test/e2e/test-credential-sanitization.sh`	24 tests for PR #743: credential stripping from migration bundles (C1–C5), runtime sandbox credential checks (C6–C7), symlink traversal protection (C8), blueprint digest verification (C9–C11), pattern-based credential field detection (C12–C13), and shipped blueprint state (Phase 6).
`test/e2e/test-telegram-injection.sh`	18 tests for PR #584: `$(cmd)` substitution (T1), backtick injection (T2), quote breakout (T3), `${VAR}` expansion (T4), process table leak checks (T5), `SANDBOX_NAME` validation with shell metacharacters (T6–T7), and normal message regression (T8).

Modified files

File	Change
`vitest.config.ts`	Added `e2e-brev` project, gated by `BREV_API_TOKEN` env var so it never runs in normal `npx vitest`.

Key design decisions

Ephemeral instances — no persistent state, no idle cost. Instance is created fresh for each run and destroyed after (unless keep_alive: true).
CPU-only (--cpu 4x16) — OpenShell gateway and sandbox work without GPU. Security tests don't need inference.
Secrets via stdin — NVIDIA_API_KEY and GITHUB_TOKEN are piped to the remote VM through SSH stdin, never appearing in CLI args or process listings.
Sandbox registry workaround — setup.sh creates sandboxes via openshell directly without writing to ~/.nemoclaw/sandboxes.json. The test orchestrator writes the registry entry after bootstrap so nemoclaw <name> status works.
Full suite isolation — the full suite runs install.sh --non-interactive, which destroys and recreates the sandbox. The security tests need the existing sandbox, so the all option runs only credential-sanitization and telegram-injection (not full).
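
The "Secrets via stdin" decision can be sketched as follows. This is a minimal illustration, not the orchestrator's actual code: `shellQuote`, `buildSecretExports`, `runWithSecretsOnStdin`, and the `DEMO_SECRET` variable are hypothetical names, and a local `sh` stands in for `ssh <host> bash -s` so the sketch runs without a remote VM.

```javascript
const { spawnSync } = require("node:child_process");

// POSIX single-quote escaping: close the quote, emit an escaped quote, reopen.
function shellQuote(value) {
  return "'" + String(value).replace(/'/g, "'\\''") + "'";
}

// Turn { NAME: value } into `export NAME='value'` lines for the receiving shell.
function buildSecretExports(secrets) {
  return Object.entries(secrets)
    .map(([name, value]) => {
      if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) {
        throw new Error(`invalid variable name: ${name}`);
      }
      return `export ${name}=${shellQuote(value)}\n`;
    })
    .join("");
}

// Run `argv`, feeding the exports plus the script on stdin, so the secret
// values never appear in the argument vector of the spawned process.
function runWithSecretsOnStdin(argv, secrets, script) {
  return spawnSync(argv[0], argv.slice(1), {
    input: buildSecretExports(secrets) + script + "\n",
    encoding: "utf8",
  });
}

// Local stand-in for the SSH hop: a shell that reads its script from stdin.
const demo = runWithSecretsOnStdin(
  ["sh"],
  { DEMO_SECRET: "it's $(not) run" },
  'printf "%s" "$DEMO_SECRET"',
);
console.log(demo.stdout);
```

With a real remote, the first argument would presumably be something like `["ssh", host, "bash -s"]`; the command line visible in process listings then contains only the host and `bash -s`.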

What needs to be in the repo to run this

Secrets (configured by repo admin)

Secret	Description
`BREV_API_TOKEN`	Brev refresh token for headless NGC auth. Obtain via `brev login`, then copy the `refresh_token` from `~/.brev/credentials.json`.
`NVIDIA_API_KEY`	NVIDIA API key for inference provider setup during sandbox creation.

Infrastructure

Brev organization with CPU instance credits (currently Nemoclaw CI/CD).
Brev CLI is pinned to v0.6.310 in the workflow (v0.6.322 removed the --cpu flag).

How a maintainer uses this

Running from the Actions UI

Go to Actions → e2e-brev → Run workflow
Select inputs:
- branch: which branch to test
- test_suite: see descriptions in the workflow header
  - full — complete user journey: install → onboard → inference → CLI ops (~10 min, destroys sandbox)
  - credential-sanitization — 24 tests for migration credential stripping, digest verification, symlink safety
  - telegram-injection — 18 tests for shell injection prevention across all attack vectors
  - all — runs both security suites (not full, which would destroy the sandbox they need)
- keep_alive: true to SSH in after, false to auto-cleanup
Results appear in the workflow run logs. If pr_number is set, a check run and comment are posted to the PR.

Calling from another workflow

jobs:
  security-e2e:
    uses: ./.github/workflows/e2e-brev.yaml
    with:
      branch: ${{ github.head_ref }}
      pr_number: ${{ github.event.pull_request.number }}
      test_suite: all
    secrets:
      BREV_API_TOKEN: ${{ secrets.BREV_API_TOKEN }}
      NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}

SSH debugging (keep_alive: true)

When keep_alive is enabled, the instance stays running after tests. The workflow logs and PR comment include connection instructions:

# Refresh local SSH config to pick up the Brev instance
brev refresh

# SSH into the instance
ssh e2e-pr-<number>

# On the instance — re-run a test manually
cd ~/nemoclaw
export NVIDIA_API_KEY=nvapi-...
export NEMOCLAW_SANDBOX_NAME=e2e-test
bash test/e2e/test-credential-sanitization.sh

# When done, clean up
brev delete e2e-pr-<number>

⚠️ Remember to delete the instance when done. Instances left running consume Brev credits. The workflow does NOT auto-delete when keep_alive: true.

Adding new test suites

To add a new E2E test suite:

Create a test script in test/e2e/ following the existing pattern (pass()/fail()/skip() functions, section() for phases, prerequisite checks in Phase 0, summary with pass/fail counts).

Add the suite to brev-e2e.test.js — add a new it.runIf(...) block:

it.runIf(TEST_SUITE === "my-new-suite" || TEST_SUITE === "all")(
  "my new suite passes on remote VM",
  () => {
    const output = runRemoteTest("test/e2e/test-my-new-suite.sh");
    expect(output).toContain("PASS");
    expect(output).not.toMatch(/FAIL:/);
  },
  600_000, // timeout in ms
);

Add the choice to the workflow — add your suite name to the options list in e2e-brev.yaml under test_suite, and add a description to the workflow header comment.
Note: If your test needs the sandbox created during beforeAll, add it to the all group. If it creates/destroys its own sandbox (like full), keep it separate.
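
The grouping rule in the note can be sketched as a small lookup. The suite names come from this description; the constant and function names are hypothetical and do not appear in brev-e2e.test.js.

```javascript
// Suites that reuse the sandbox created in beforeAll.
const SHARED_SANDBOX_SUITES = ["credential-sanitization", "telegram-injection"];
// Suites that destroy and recreate the sandbox; they run only when named explicitly.
const STANDALONE_SUITES = ["full"];

// Map the test_suite input to the list of suites that should execute.
function suitesToRun(testSuite) {
  if (testSuite === "all") return [...SHARED_SANDBOX_SUITES];
  if (
    SHARED_SANDBOX_SUITES.includes(testSuite) ||
    STANDALONE_SUITES.includes(testSuite)
  ) {
    return [testSuite];
  }
  throw new Error(`unknown test suite: ${testSuite}`);
}

console.log(suitesToRun("all"));
console.log(suitesToRun("full"));
```

A new suite that needs the shared sandbox would be appended to the first list; one that manages its own sandbox would go in the second.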

Files changed

 .github/workflows/e2e-brev.yaml          |  39 +-
 scripts/brev-setup.sh                    |  11 +-
 scripts/setup.sh                         |  52 +-
 test/e2e/brev-e2e.test.js                | 246 ++++++--
 test/e2e/test-credential-sanitization.sh | 789 +++++++++++++++++++++++++++
 test/e2e/test-telegram-injection.sh      | 464 ++++++++++++++++
 6 files changed, 1563 insertions(+), 38 deletions(-)

Summary by CodeRabbit

New Features
- Optional launchable mode for remote test environments
- New telegram-injection and credential-sanitization end-to-end test suites
Improvements
- Timestamped logging for clearer diagnostics
- Increased e2e job and bootstrap timeouts
- Real-time sandbox build progress reporting and streaming test output
- Prefer configurable sandbox name and improved Docker base-image handling
- Optional skip of vLLM install during provisioning
Tests
- Updated test orchestration and suite selection behavior

coderabbitai · 2026-03-30T13:17:54Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

📝 Walkthrough

Adds optional NemoClaw "launchable" provisioning (USE_LAUNCHABLE), expands CI workflow inputs (adds use_launchable and telegram-injection), increases e2e job timeout, introduces timestamped logging and build progress reporting, adds SKIP_VLLM control, and adds two end-to-end tests for credential sanitization and Telegram injection.

Changes

Cohort / File(s)	Summary
Workflow Configuration `.github/workflows/e2e-brev.yaml`	Added `use_launchable` boolean input to `workflow_dispatch` and `workflow_call` (default `true`); added `telegram-injection` to `test_suite` choices; increased job timeout 45→90 minutes; pass `USE_LAUNCHABLE` into remote env.
Brev / Sandbox setup scripts `scripts/brev-setup.sh`, `scripts/setup.sh`	Added `_ts()` timestamp helper and switched log prefixes to timestamped format; added `SKIP_VLLM` gating to skip vLLM install/start; changed `SANDBOX_NAME` precedence to prefer `NEMOCLAW_SANDBOX_NAME`; added `BASE_IMAGE` pull/build fallback and background build progress reporter; modified `openshell sandbox create` invocation.
E2E test harness `test/e2e/brev-e2e.test.js`	Introduced `USE_LAUNCHABLE`-driven provisioning path (launchable boot vs legacy `brev create`), rsync/install/`nemoclaw onboard` flow for launchable, SSH `stream` mode with remote `tee` to `/tmp/test-output.log`, extended bootstrap timeouts, and adjusted test-suite selection (new `telegram-injection` suite; `full` runs only when `TEST_SUITE === "full"`).
New E2E tests `test/e2e/test-credential-sanitization.sh`, `test/e2e/test-telegram-injection.sh`	Added two Bash e2e tests: credential sanitization (bundle sanitization, symlink safety, blueprint digest checks) and Telegram injection (multiple injection probes, key-leak checks, sandbox-name hardening, PASS/FAIL/SKIP reporting).
Unit test update `test/setup-sandbox-name.test.js`	Updated expectations to match new `SANDBOX_NAME` precedence: `$1` → `NEMOCLAW_SANDBOX_NAME` env → literal `nemoclaw`.
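
The `SANDBOX_NAME` precedence in the last row can be restated as a short function. This is a sketch of the rule as summarized here, not the shell code in setup.sh; `resolveSandboxName` is a hypothetical name, and treating an empty string as unset is an assumption.

```javascript
// Precedence: positional argument, then NEMOCLAW_SANDBOX_NAME, then the literal "nemoclaw".
// Assumption: empty strings count as "not provided".
function resolveSandboxName(positionalArg, env = process.env) {
  if (positionalArg) return positionalArg;
  if (env.NEMOCLAW_SANDBOX_NAME) return env.NEMOCLAW_SANDBOX_NAME;
  return "nemoclaw";
}

console.log(resolveSandboxName("from-arg", { NEMOCLAW_SANDBOX_NAME: "from-env" }));
console.log(resolveSandboxName(undefined, { NEMOCLAW_SANDBOX_NAME: "from-env" }));
console.log(resolveSandboxName(undefined, {}));
```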

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant CI as CI Runner
    participant Remote as Remote VM
    participant Launch as NemoClaw Launchable
    participant Sandbox as Sandbox

    CI->>Remote: SSH to provision (send USE_LAUNCHABLE)
    alt USE_LAUNCHABLE=1
        Remote->>Launch: run launch-nemoclaw.sh (request provisioning)
        Launch-->>Remote: stream provisioning status
        loop Poll readiness
            CI->>Remote: check readiness marker/logs
        end
        CI->>Remote: rsync repo, npm ci, run tests (remote tee -> /tmp/test-output.log)
        Remote->>Sandbox: run `nemoclaw onboard` / openshell commands
        Sandbox-->>Remote: test output & logs
        Remote-->>CI: streamed output and fetched logs
    else USE_LAUNCHABLE=0
        CI->>Remote: run legacy `brev create` + `scripts/brev-setup.sh`
        Remote->>Remote: local bootstrap (vLLM gated by SKIP_VLLM), build sandbox
        Remote-->>CI: streamed bootstrap output and logs
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related issues

Add Brev E2E tests for command injection and credential sanitization #1097 — Adds Brev E2E tests and workflow/test orchestration covering command-injection and credential-sanitization scenarios described in the issue.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 16.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: adding security E2E tests for command injection and credential sanitization via Brev infrastructure.

coderabbitai

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/e2e-brev.yaml:
- Line 99: The workflow's timeout-minutes (symbol: timeout-minutes) is too low
relative to the test suite budget: increase the GitHub Actions job timeout to
exceed the test's beforeAll 45-minute claim plus the two possible 10-minute test
slots and overhead so afterAll can run; update timeout-minutes to a safe value
(e.g., 90 or 120) in the workflow so runs with TEST_SUITE=all and the
beforeAll/afterAll in test/e2e/brev-e2e.test.js can complete without being
terminated.

In `@test/e2e/brev-e2e.test.js`:
- Around line 111-123: The remote pipeline is losing failure status because
pipefail isn't enabled; update runRemoteTest (and the similar block at lines
223-225) so the remote shell runs with pipefail enabled before streaming and
teeing outputs — e.g., ensure the constructed remote command uses "set -o
pipefail" or invokes bash with -o pipefail (so failures in `bash ${scriptPath} |
tee /tmp/test-output.log` and `npm ci ... | tail -5` propagate); modify the
command built for sshWithSecrets/ssh in runRemoteTest to enable pipefail so the
exit code of the pipeline is non-zero whenever any command in it fails.
- Around line 177-205: The polling loop that checks /tmp/launch-plugin.log and
~/.cache/nemoclaw-plugin/install-ran (using the ssh calls and variables
setupStart, setupMaxWait, setupPollInterval) currently falls through after 40
minutes allowing later steps to run; modify the code so that if neither
readiness marker ("=== Ready ===" in log nor markerCheck returning DONE) is
detected before Date.now() - setupStart >= setupMaxWait, the test fails fast by
throwing an error (or calling process.exit(1)) with a clear message including
both markers and elapsed time; place this failure immediately after the while
loop (or by breaking the loop into a condition that throws when timed out) so
rsync/npm ci/nemoclaw onboard are not executed on an incomplete VM.
- Around line 37-42: LAUNCHABLE_SETUP_SCRIPT currently points to a branch ref
and USE_LAUNCHABLE defaults to using it; change LAUNCHABLE_SETUP_SCRIPT to an
immutable source (replace the branch ref with a commit SHA in the raw GitHub URL
or vendor the script into the repo and point the constant to that vendored path)
and keep USE_LAUNCHABLE logic intact. Also update the test's setup readiness
loop (the wait loop used to detect initialization completion) to surface a clear
failure: after the existing polling loop add explicit timeout handling that
throws or fails the test when the overall wait exceeds the intended deadline
(rather than silently proceeding), so the test stops with a clear error if
initialization never completes.

In `@test/e2e/test-credential-sanitization.sh`:
- Around line 214-278: The test currently embeds local implementations of
stripCredentials and walkAndRemoveFile instead of invoking NemoClaw’s real
migration sanitizer from migration-state.ts; update the test to call the actual
sanitizer (the exported function(s) in migration-state.ts) and the repository’s
migration-aware removal logic so assertions reflect production behavior (e.g.,
gateway removal vs. partial stripping) rather than the bundled
stripCredentials/walkAndRemoveFile copies; locate and replace usages of
stripCredentials and walkAndRemoveFile in this script (and the other noted
ranges 315-367, 439-468) to import/execute the migration-state
sanitizer/exported functions and assert on its real output.
- Around line 486-590: The test is self-fulfilling because it defines local
helpers (verifyBlueprintDigest_FIXED and verifyDigest) inside node -e and never
exercises the repository's real digest-verification code; replace the inline
helpers with calls that exercise the actual verification implementation (e.g.,
require the module that exports the real verify/validate function or invoke the
CLI/entrypoint that performs manifest digest checks) and feed it crafted
manifests (empty string, undefined, wrong digest, correct digest) and blueprint
content so the test fails if the repo code is vulnerable; update the assertions
to look for the real tool's output/exit status instead of the local helper
results (reference the verifyBlueprintDigest_FIXED/verifyDigest helpers in the
diff to locate the test sections to modify).

In `@test/e2e/test-telegram-injection.sh`:
- Around line 86-109: The test currently injects messages via an SSH command
that echoes the payload (send_message_to_sandbox), which bypasses the real
bridge path and therefore misses re-evaluation issues; change the test to drive
the actual Telegram bridge invocation used in scripts/telegram-bridge.js (the
same path as runAgentInSandbox / shellQuote(message) / safeSessionId /
nemoclaw-start -m ...) so the message is passed through the bridge’s argument
handling rather than via echo/$(cat). Concretely, replace the SSH payload string
in send_message_to_sandbox with a call that runs the bridge entry (the same
command runAgentInSandbox would use, passing the session id and message as
-m/--message and using the bridge’s quoting/safe-session logic) so the tests
exercise the real message-path quoting and session handling.
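
The pipefail finding above can be reproduced locally. The sketch below assumes `bash` is on the PATH; `pipelineStatus` is a hypothetical helper, and `false | tee /dev/null` stands in for `bash ${scriptPath} | tee /tmp/test-output.log` with a failing script.

```javascript
const { spawnSync } = require("node:child_process");

// Run a pipeline under bash and return its exit status.
function pipelineStatus(pipeline, { pipefail }) {
  const script = (pipefail ? "set -o pipefail; " : "") + pipeline;
  return spawnSync("bash", ["-c", script], { encoding: "utf8" }).status;
}

const pipeline = "false | tee /dev/null";

// Without pipefail the status is that of the last command (tee), so the failure is hidden.
console.log(pipelineStatus(pipeline, { pipefail: false }));
// With pipefail a non-zero status from any stage makes the whole pipeline fail.
console.log(pipelineStatus(pipeline, { pipefail: true }));
```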

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c43264aa-fc09-4107-a384-03bbbcb23a8a

📥 Commits

Reviewing files that changed from the base of the PR and between f59f58e and bb8fd23.

📒 Files selected for processing (6)

.github/workflows/e2e-brev.yaml
scripts/brev-setup.sh
scripts/setup.sh
test/e2e/brev-e2e.test.js
test/e2e/test-credential-sanitization.sh
test/e2e/test-telegram-injection.sh

.github/workflows/e2e-brev.yaml

coderabbitai · 2026-03-30T13:33:08Z

test/e2e/brev-e2e.test.js

+const LAUNCHABLE_SETUP_SCRIPT =
+  "https://raw.githubusercontent.com/NVIDIA/OpenShell-Community/refs/heads/feat/brev-nemoclaw-plugin/brev/launch-nemoclaw.sh";
+const NEMOCLAW_REPO_URL = "https://github.com/NVIDIA/NemoClaw.git";
+
+// Use launchable by default; set USE_LAUNCHABLE=0 or USE_LAUNCHABLE=false to fall back to brev-setup.sh
+const USE_LAUNCHABLE = !["0", "false"].includes(process.env.USE_LAUNCHABLE?.toLowerCase());


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check the test file content at the specified lines
head -n 165 test/e2e/brev-e2e.test.js | tail -n +30

Repository: NVIDIA/NemoClaw

Length of output: 5536

🏁 Script executed:

# Get full file size to understand scope
wc -l test/e2e/brev-e2e.test.js

Repository: NVIDIA/NemoClaw

Length of output: 88

🏁 Script executed:

# Search for all references to LAUNCHABLE_SETUP_SCRIPT and USE_LAUNCHABLE
rg "LAUNCHABLE_SETUP_SCRIPT|USE_LAUNCHABLE" -n

Repository: NVIDIA/NemoClaw

Length of output: 621

🏁 Script executed:

# Check for CI/CD configuration files that might reference this test
fd -t f -e yml -e yaml -e json | head -20 | xargs grep -l "brev-e2e\|e2e" 2>/dev/null

Repository: NVIDIA/NemoClaw

Length of output: 43

🏁 Script executed:

# Read the launchable path section in detail (lines 144-180)
sed -n '144,180p' test/e2e/brev-e2e.test.js

Repository: NVIDIA/NemoClaw

Length of output: 2035

🏁 Script executed:

# Check if there's any validation, pinning, or comments about the feature branch
rg "feat/brev-nemoclaw-plugin|LAUNCHABLE|launch-nemoclaw" -B 3 -A 3

Repository: NVIDIA/NemoClaw

Length of output: 2577

🏁 Script executed:

# Look for any recent commits or comments indicating this is temporary/experimental
git log --oneline -n 20 -- test/e2e/brev-e2e.test.js 2>/dev/null | head -10

Repository: NVIDIA/NemoClaw

Length of output: 142

🏁 Script executed:

# Check if there's a .github/workflows or similar CI directory
fd -t d ".github|.gitlab|.circleci" -o "workflows|pipelines" 2>/dev/null

Repository: NVIDIA/NemoClaw

Length of output: 41

🏁 Script executed:

# Read the setup wait loop (continuing from line 180)
sed -n '180,220p' test/e2e/brev-e2e.test.js

Repository: NVIDIA/NemoClaw

Length of output: 2321

🏁 Script executed:

# Check if there are any documented requirements about the feature branch or script pinning
rg "OpenShell-Community|feat/brev" -B 5 -A 5

Repository: NVIDIA/NemoClaw

Length of output: 3032

🏁 Script executed:

# Look for any tests or validation of the external script
rg "LAUNCHABLE|setup-script" -B 2 -A 2

Repository: NVIDIA/NemoClaw

Length of output: 2244

Pin the launchable bootstrap to an immutable revision.

LAUNCHABLE_SETUP_SCRIPT points at a raw GitHub URL on a feature branch (refs/heads/feat/brev-nemoclaw-plugin), and USE_LAUNCHABLE makes that path the default. A force-push on that branch can silently change what privileged CI machines execute, making the suite non-reproducible and widening the supply-chain blast radius. Pin to a commit SHA or vendor the script in this repository.
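
One way to guard against an unpinned bootstrap URL is a small check that the ref segment is a full commit SHA. This is a hedged sketch: `isPinnedRawUrl` is a hypothetical helper, it assumes the `raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>` layout, and the 40-character value in the demo is a placeholder, not a real commit.

```javascript
// Reports whether a raw.githubusercontent.com URL uses a 40-character commit SHA as its ref.
function isPinnedRawUrl(url) {
  const { hostname, pathname } = new URL(url);
  if (hostname !== "raw.githubusercontent.com") return false;
  // pathname splits into ["", owner, repo, ref, ...path]; take the ref segment.
  const [, , , ref] = pathname.split("/");
  return /^[0-9a-f]{40}$/.test(ref ?? "");
}

// The branch-based URL quoted in the diff above.
const branchUrl =
  "https://raw.githubusercontent.com/NVIDIA/OpenShell-Community/refs/heads/feat/brev-nemoclaw-plugin/brev/launch-nemoclaw.sh";
// Same path with a placeholder SHA in place of the branch ref.
const pinnedUrl =
  "https://raw.githubusercontent.com/NVIDIA/OpenShell-Community/" +
  "a".repeat(40) +
  "/brev/launch-nemoclaw.sh";

console.log(isPinnedRawUrl(branchUrl));
console.log(isPinnedRawUrl(pinnedUrl));
```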

Additionally, the setup readiness loop (lines 180–220) times out silently after 40 minutes without raising an error, allowing the test to proceed with incomplete initialization. Add explicit timeout handling after the wait loop.
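
The explicit timeout handling could take roughly the following shape. This is an illustrative sketch, not the code in brev-e2e.test.js: `waitUntilReady` and its options are hypothetical, the readiness probe is passed in as a callback, and the async style is an assumption (the actual loop may poll synchronously over SSH).

```javascript
// Poll `isReady` until it returns true or `maxWaitMs` elapses. Throws on
// timeout so later steps never run against a half-provisioned VM.
async function waitUntilReady(isReady, { maxWaitMs, pollIntervalMs }) {
  const start = Date.now();
  for (;;) {
    if (await isReady()) return Date.now() - start;
    const elapsed = Date.now() - start;
    if (elapsed >= maxWaitMs) {
      throw new Error(
        `setup not ready after ${Math.round(elapsed / 1000)}s: ` +
          "neither readiness marker was observed",
      );
    }
    await new Promise((resolve) => setTimeout(resolve, pollIntervalMs));
  }
}
```

In the test, the callback would wrap the two existing checks (the "=== Ready ===" log line and the install-ran marker), and the thrown error would fail beforeAll before rsync, npm ci, or nemoclaw onboard run.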

test/e2e/brev-e2e.test.js

coderabbitai · 2026-03-30T13:33:08Z

test/e2e/test-credential-sanitization.sh

+sanitize_result=$(cd "$REPO" && node -e "
+const fs = require('fs');
+const path = require('path');
+
+// --- Credential field detection (mirrors migration-state.ts) ---
+const CREDENTIAL_FIELDS = new Set([
+  'apiKey', 'api_key', 'token', 'secret', 'password', 'resolvedKey',
+]);
+const CREDENTIAL_FIELD_PATTERN =
+  /(?:access|refresh|client|bearer|auth|api|private|public|signing|session)(?:Token|Key|Secret|Password)$/;
+
+function isCredentialField(key) {
+  return CREDENTIAL_FIELDS.has(key) || CREDENTIAL_FIELD_PATTERN.test(key);
+}
+
+function stripCredentials(obj) {
+  if (obj === null || obj === undefined) return obj;
+  if (typeof obj !== 'object') return obj;
+  if (Array.isArray(obj)) return obj.map(stripCredentials);
+  const result = {};
+  for (const [key, value] of Object.entries(obj)) {
+    if (isCredentialField(key)) {
+      result[key] = '[STRIPPED_BY_MIGRATION]';
+    } else {
+      result[key] = stripCredentials(value);
+    }
+  }
+  return result;
+}
+
+function walkAndRemoveFile(dirPath, targetName) {
+  let entries;
+  try { entries = fs.readdirSync(dirPath); } catch { return; }
+  for (const entry of entries) {
+    const fullPath = path.join(dirPath, entry);
+    try {
+      const stat = fs.lstatSync(fullPath);
+      if (stat.isSymbolicLink()) continue;
+      if (stat.isDirectory()) {
+        walkAndRemoveFile(fullPath, targetName);
+      } else if (entry === targetName) {
+        fs.rmSync(fullPath, { force: true });
+      }
+    } catch {}
+  }
+}
+
+const bundleDir = '$BUNDLE_DIR';
+
+// 1. Remove auth-profiles.json
+const agentsDir = path.join(bundleDir, 'agents');
+if (fs.existsSync(agentsDir)) {
+  walkAndRemoveFile(agentsDir, 'auth-profiles.json');
+}
+
+// 2. Strip credential fields from openclaw.json
+const configPath = path.join(bundleDir, 'openclaw.json');
+if (fs.existsSync(configPath)) {
+  const config = JSON.parse(fs.readFileSync(configPath, 'utf-8'));
+  const sanitized = stripCredentials(config);
+  fs.writeFileSync(configPath, JSON.stringify(sanitized, null, 2));
+}
+
+console.log('SANITIZED');
+" 2>&1)


⚠️ Potential issue | 🟠 Major

Exercise the real migration sanitizer instead of local copies.

These checks validate hand-written stripCredentials() / walkAndRemoveFile() snippets rather than NemoClaw’s migration implementation. The copy already drifts from nemoclaw/src/commands/migration-state.ts: production deletes gateway entirely, while the assertions here expect gateway.mode plus a stripped gateway.auth.token. This suite can stay green while the shipped sanitizer or symlink handling regresses.

Also applies to: 315-367, 439-468
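
To make the drift concrete, the snippet below runs the test's local copy of the sanitizer (taken from the quoted diff, not from migration-state.ts) on a hypothetical config. The sample object and its values are invented for illustration. Per the finding, production removes `gateway` entirely, whereas this copy keeps `gateway.mode` and only masks the token.

```javascript
// Copied from the quoted diff: the test's local rules, not the shipped sanitizer.
const CREDENTIAL_FIELDS = new Set([
  "apiKey", "api_key", "token", "secret", "password", "resolvedKey",
]);
const CREDENTIAL_FIELD_PATTERN =
  /(?:access|refresh|client|bearer|auth|api|private|public|signing|session)(?:Token|Key|Secret|Password)$/;

function isCredentialField(key) {
  return CREDENTIAL_FIELDS.has(key) || CREDENTIAL_FIELD_PATTERN.test(key);
}

// Recursively replace credential-looking values with a marker string.
function stripCredentials(obj) {
  if (obj === null || typeof obj !== "object") return obj;
  if (Array.isArray(obj)) return obj.map(stripCredentials);
  const result = {};
  for (const [key, value] of Object.entries(obj)) {
    result[key] = isCredentialField(key)
      ? "[STRIPPED_BY_MIGRATION]"
      : stripCredentials(value);
  }
  return result;
}

// Hypothetical config, shaped after the fields the finding mentions.
const sample = {
  gateway: { mode: "local", auth: { token: "tg-123" } },
  provider: { accessToken: "abc", model: "demo" },
};
const sanitized = stripCredentials(sample);
console.log(JSON.stringify(sanitized, null, 2));
```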

test/e2e/test-credential-sanitization.sh

coderabbitai · 2026-03-30T13:33:08Z

test/e2e/test-telegram-injection.sh

+send_message_to_sandbox() {
+  local message="$1"
+  local session_id="${2:-e2e-injection-test}"
+
+  local ssh_config
+  ssh_config="$(mktemp)"
+  openshell sandbox ssh-config "$SANDBOX_NAME" >"$ssh_config" 2>/dev/null
+
+  # Use the same mechanism as the bridge: pass message as an argument
+  # via SSH. The key security property is that the message must NOT be
+  # interpreted as shell code on the remote side.
+  local result
+  result=$(timeout 90 ssh -F "$ssh_config" \
+    -o StrictHostKeyChecking=no \
+    -o UserKnownHostsFile=/dev/null \
+    -o ConnectTimeout=10 \
+    -o LogLevel=ERROR \
+    "openshell-${SANDBOX_NAME}" \
+    "echo 'INJECTION_PROBE_START' && echo $(printf '%q' "$message") && echo 'INJECTION_PROBE_END'" \
+    2>&1) || true
+
+  rm -f "$ssh_config"
+  echo "$result"
+}


⚠️ Potential issue | 🟠 Major

Drive the real Telegram bridge code for the message-path probes.

These tests send payloads through bespoke ssh ... 'MSG=$(cat) ...' commands, not through scripts/telegram-bridge.js’s runAgentInSandbox() path (shellQuote(message), safeSessionId, nemoclaw-start ... -m ...). Because MSG=$(cat) stores the payload after the shell has already parsed the command, $(), backticks, and ${VAR} in the message are never re-evaluated here. The probes can therefore pass even if the bridge regresses.

Also applies to: 188-277, 403-444
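
The property these probes are meant to check can be shown with a small local experiment. This is a sketch under assumptions: the `shellQuote` below is a generic POSIX single-quote escaper standing in for the bridge's function (whose implementation is not shown in this PR), and a local `sh -c` stands in for the remote shell.

```javascript
const { spawnSync } = require("node:child_process");

// Stand-in quoting helper: wrap in single quotes and escape embedded ones.
function shellQuote(value) {
  return "'" + String(value).replace(/'/g, "'\\''") + "'";
}

// Evaluate `command` with sh and return what it printed.
function sh(command) {
  return spawnSync("sh", ["-c", command], { encoding: "utf8" }).stdout;
}

const payload = "$(echo INJECTED) `echo INJECTED`";

// Unsafe: the payload is spliced into a double-quoted word, so sh expands it.
const unsafe = sh(`printf '%s\\n' "${payload}"`);
// Safe: the payload is single-quoted, so sh passes it through verbatim.
const safe = sh(`printf '%s\\n' ${shellQuote(payload)}`);

console.log(JSON.stringify({ unsafe, safe }));
```

A probe that never places the payload inside a string the shell parses cannot tell these two cases apart, which is the gap the finding describes.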

🧰 Tools

🪛 Shellcheck (0.11.0)

[warning] 88-88: session_id appears unused. Verify use (or export if used externally).

(SC2034)

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

test/e2e/test-telegram-injection.sh (2)
92-115: Dead code: send_message_to_sandbox() is defined but never called.

This function is declared with detailed documentation about mirroring the bridge's behavior, but none of the test cases (T1–T8b) actually invoke it. All tests use inline SSH commands directly. Either:

Update the tests to use this helper for consistency, or

Remove the dead code

Additionally, the session_id parameter on line 94 is unused (SC2034).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-telegram-injection.sh` around lines 92 - 115, The helper
function send_message_to_sandbox is dead code and its session_id parameter is
unused; either remove the entire send_message_to_sandbox function and delete the
unused session_id, or update the test cases (T1–T8b) to call
send_message_to_sandbox instead of the current inline ssh commands so the helper
is actually used; if you keep the helper, remove or use the session_id parameter
to silence SC2034 and update all places that constructed SSH invocations to call
send_message_to_sandbox with the message (and session_id if needed) for
consistency.
40-40: Consider adding -e to set options for early failure detection.

The script uses set -uo pipefail but omits -e, meaning commands can fail without halting the script. While this may be intentional (allowing all tests to run and report), failures in critical setup sections (e.g., lines 162–167) might go unnoticed if not carefully handled.

If intentional, consider documenting this choice in the header comments.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-telegram-injection.sh` at line 40, The script's top-level "set
-uo pipefail" omits "-e", so failing commands (especially in the critical setup
block referenced by the reviewer) may not abort the run; either add "-e" to the
existing "set -uo pipefail" invocation to enable early exit on errors, or
explicitly document in the script header why "-e" was intentionally omitted and
ensure the setup block (the critical initialization steps) performs explicit
exit-on-failure checks (check return codes and call exit on failure) so failures
can't be silently ignored.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/test-telegram-injection.sh`:
- Around line 383-399: The test loop is vulnerable to shell command substitution
when invalid_name contains backticks because the node -e "..." string is
double-quoted; fix by avoiding unescaped interpolation into a double-quoted
command: pass the test value safely (e.g., export or pass invalid_name as a
positional argument or via printf '%s' to construct the node invocation) so
backticks are not executed, then call validateName(...) inside the Node snippet
using that safe input; update the node -e invocation that calls validateName to
read the value from process.env or process.argv instead of embedding
$invalid_name directly.
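
The Node side of that fix can be sketched as follows. The inline script is a fixed string and the candidate travels in the environment, so no shell or JavaScript parser ever sees it as code. `CANDIDATE_NAME`, `CHILD_SCRIPT`, and `roundTripThroughEnv` are hypothetical names; the real test would call its own validateName on the value instead of echoing it.

```javascript
const { spawnSync } = require("node:child_process");

// The child script is constant; the candidate never becomes part of it.
const CHILD_SCRIPT = "process.stdout.write(process.env.CANDIDATE_NAME ?? '')";

// Spawn node directly (no shell) and hand the candidate over via the environment.
function roundTripThroughEnv(candidate) {
  const result = spawnSync(process.execPath, ["-e", CHILD_SCRIPT], {
    env: { ...process.env, CANDIDATE_NAME: candidate },
    encoding: "utf8",
  });
  return result.stdout;
}

console.log(roundTripThroughEnv("bad`id`name; $(whoami)"));
```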

---

Nitpick comments:
In `@test/e2e/test-telegram-injection.sh`:
- Around line 92-115: The helper function send_message_to_sandbox is dead code
and its session_id parameter is unused; either remove the entire
send_message_to_sandbox function and delete the unused session_id, or update the
test cases (T1–T8b) to call send_message_to_sandbox instead of the current
inline ssh commands so the helper is actually used; if you keep the helper,
remove or use the session_id parameter to silence SC2034 and update all places
that constructed SSH invocations to call send_message_to_sandbox with the
message (and session_id if needed) for consistency.
- Line 40: The script's top-level "set -uo pipefail" omits "-e", so failing
commands (especially in the critical setup block referenced by the reviewer) may
not abort the run; either add "-e" to the existing "set -uo pipefail" invocation
to enable early exit on errors, or explicitly document in the script header why
"-e" was intentionally omitted and ensure the setup block (the critical
initialization steps) performs explicit exit-on-failure checks (check return
codes and call exit on failure) so failures can't be silently ignored.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ca5cdcab-3f73-401c-bcb1-d85860438a73

📥 Commits

Reviewing files that changed from the base of the PR and between b548976 and 9c4ed17.

📒 Files selected for processing (2)

test/e2e/brev-e2e.test.js
test/e2e/test-telegram-injection.sh

🚧 Files skipped from review as they are similar to previous changes (1)

test/e2e/brev-e2e.test.js

test/e2e/test-telegram-injection.sh

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (2)

test/e2e/test-credential-sanitization.sh (2)
487-591: ⚠️ Potential issue | 🟠 Major

C9-C11 are still self-fulfilling digest tests.

The inline verifyBlueprintDigest_FIXED() / verifyDigest() helpers never call NemoClaw’s real blueprint verification path. As written, these cases only prove the sample implementation behaves as expected, so a repo regression can still pass this suite.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-credential-sanitization.sh` around lines 487 - 591, The tests
currently exercise only local helpers (verifyBlueprintDigest_FIXED,
verifyBlueprintDigest_VULNERABLE, verifyDigest) rather than NemoClaw’s real
verification routine, so replace the inline helper calls with calls into the
actual verifier exported by the codebase (require/import the module that exposes
the NemoClaw blueprint verification function, e.g. verifyBlueprint or whatever
exported symbol the project uses), remove the stub functions, and pass real
manifest + blueprint content into that verifier; update assertions to check the
real verifier’s result (accepted vs rejected) for empty/undefined, wrong, and
correct digest cases so the test fails on regressions.
213-279: ⚠️ Potential issue | 🟠 Major

Exercise NemoClaw’s real sanitizer/removal code here.

These sections still define their own stripCredentials() / walkAndRemoveFile() inside node -e, so they only validate the pasted helpers. A regression in the shipped migration bundle logic can still leave C1-C8 and C12-C13 green. Wire these cases through the actual migration sanitizer/removal path instead.

Also applies to: 440-469, 607-690
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-credential-sanitization.sh` around lines 213 - 279, The test is
currently validating local helpers (stripCredentials/walkAndRemoveFile) instead
of exercising the real migration code; change the node -e invocation to require
and call the actual migration sanitizer/removal functions (e.g. call
sanitizeCredentialsInBundle(bundleDir) and the bundle auth-file removal function
exported by the migration module) rather than defining stripCredentials or
walkAndRemoveFile inline; locate and import the module that exports
sanitizeCredentialsInBundle (and the auth-profiles removal function), pass the
same bundleDir used by the test, await/handle its result, and print 'SANITIZED'
only after the real functions complete so the test verifies the shipped
migration logic end-to-end.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/test-credential-sanitization.sh`:
- Around line 608-610: The test incorrectly treats keyRef as a preserved
credential field; update the CREDENTIAL_FIELDS Set in
test/e2e/test-credential-sanitization.sh (the constant CREDENTIAL_FIELDS) to
omit 'keyRef' (or remove any assertions that require keyRef to be preserved) so
the test expects keyRef to be stripped like production sanitizer; also apply the
same change to the other occurrences referenced around lines 697-734 to ensure
tests no longer fail when keyRef is intentionally removed by the sanitizer.
- Around line 73-90: sandbox_exec currently swallows exit codes from both the
`openshell sandbox ssh-config` and the `ssh` probe, so make it fail closed by
capturing and checking both exit statuses: run `openshell sandbox ssh-config`
and if it fails, remove the temp file and return a non-zero status (or echo a
sentinel like "__PROBE_FAILED__"), then run `ssh` capturing its exit code
separately (do not let `|| true` hide it); if `ssh` exits non-zero or produces
no stdout treat that as a failed probe (echo sentinel or propagate the ssh exit
code) and return non-zero so upstream checks fail. Apply the same treatment to
the other identical helper at the other location (the block around lines
395-417).

---

Duplicate comments:
In `@test/e2e/test-credential-sanitization.sh`:
- Around line 487-591: The tests currently exercise only local helpers
(verifyBlueprintDigest_FIXED, verifyBlueprintDigest_VULNERABLE, verifyDigest)
rather than NemoClaw’s real verification routine, so replace the inline helper
calls with calls into the actual verifier exported by the codebase
(require/import the module that exposes the NemoClaw blueprint verification
function, e.g. verifyBlueprint or whatever exported symbol the project uses),
remove the stub functions, and pass real manifest + blueprint content into that
verifier; update assertions to check the real verifier’s result (accepted vs
rejected) for empty/undefined, wrong, and correct digest cases so the test fails
on regressions.
- Around line 213-279: The test is currently validating local helpers
(stripCredentials/walkAndRemoveFile) instead of exercising the real migration
code; change the node -e invocation to require and call the actual migration
sanitizer/removal functions (e.g. call sanitizeCredentialsInBundle(bundleDir)
and the bundle auth-file removal function exported by the migration module)
rather than defining stripCredentials or walkAndRemoveFile inline; locate and
import the module that exports sanitizeCredentialsInBundle (and the
auth-profiles removal function), pass the same bundleDir used by the test,
await/handle its result, and print 'SANITIZED' only after the real functions
complete so the test verifies the shipped migration logic end-to-end.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b2ca588e-197f-452e-801a-3615b6806569

📥 Commits

Reviewing files that changed from the base of the PR and between 9c4ed17 and b86c294.

📒 Files selected for processing (1)

test/e2e/test-credential-sanitization.sh

test/e2e/test-credential-sanitization.sh

…nitization

Adds two new Brev E2E test suites targeting the vulnerabilities fixed by PR NVIDIA#119 (Telegram bridge command injection) and PR NVIDIA#156 (credential exposure in migration snapshots + blueprint digest bypass).

Test suites:
- test-telegram-injection.sh: 8 tests covering command substitution, backtick injection, quote-breakout, parameter expansion, process table leaks, and SANDBOX_NAME validation
- test-credential-sanitization.sh: 13 tests covering auth-profiles.json deletion, credential field stripping, non-credential preservation, symlink safety, blueprint digest verification, and pattern-based field detection

These tests are expected to FAIL on main (unfixed code) and PASS once PR NVIDIA#119 and NVIDIA#156 are merged.

Refs: NVIDIA#118, NVIDIA#119, NVIDIA#156, NVIDIA#813

- Add SKIP_VLLM=1 support to brev-setup.sh
- Use SKIP_VLLM=1 in brev-e2e.test.js bootstrap
- Bump beforeAll timeout to 30 min for CPU instances
- Bump workflow timeout to 60 min for 3 test suites

- Stream SSH output to CI log during bootstrap (no more silence)
- Add timestamps to brev-setup.sh and setup.sh info/warn/fail messages
- Add background progress reporter during sandbox Docker build (heartbeat every 30s showing elapsed time, current Docker step, and last log line)
- Stream test script output to CI log via tee + capture for assertions
- Filter potential secrets from progress heartbeat output

Replace bare 'brev create' + brev-setup.sh with 'brev start' using the OpenShell-Community launch-nemoclaw.sh setup script. This installs Docker, OpenShell CLI, and Node.js via the launchable's proven path, then runs 'nemoclaw onboard --non-interactive' to build the sandbox (testing whether this path is faster than our manual setup.sh).

Changes:
- Default CPU back to 4x16 (8x32 didn't help; bottleneck was I/O)
- Launchable path: brev start + setup-script URL, poll for completion, rsync PR branch, npm ci, nemoclaw onboard
- Legacy path preserved (USE_LAUNCHABLE=0)
- Timestamped logging throughout for timing comparison
- New use_launchable workflow input (default: true)

… mode

openshell sandbox create without a command defaults to opening an interactive shell inside the sandbox. In CI (non-interactive SSH), this hangs forever: the sandbox goes Ready but the command never returns. The [?2004h] terminal escape codes in CI logs were bash enabling bracketed paste mode and waiting for input.

Add --no-tty -- true so the command exits immediately after the sandbox is created and Ready.

The launchable setup script installs Node.js via nvm, which sets up PATH in ~/.nvm/nvm.sh. Non-interactive SSH doesn't source .bashrc, so npm/node commands fail with 'command not found'. Source nvm.sh before running npm in the launchable path and runRemoteTest.
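
A minimal sketch of that fix, assuming nvm lives at its default location (`~/.nvm`, or wherever `NVM_DIR` points). The `load_nvm` helper name is hypothetical; the PR's actual code in the launchable path and `runRemoteTest` is not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: source nvm if it is installed so that node/npm are
# on PATH in shells that never read .bashrc (e.g. non-interactive SSH).
# Returns non-zero when nvm.sh cannot be found.
load_nvm() {
  nvm_dir_candidate="${NVM_DIR:-$HOME/.nvm}"
  if [ -s "$nvm_dir_candidate/nvm.sh" ]; then
    . "$nvm_dir_candidate/nvm.sh"
  else
    return 1
  fi
}

# Typical use before any npm/node command in a remote script:
#   load_nvm || echo "nvm not found; node/npm may be missing from PATH" >&2
#   npm ci
```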

setup.sh defaulted to 'nemoclaw' ignoring the NEMOCLAW_SANDBOX_NAME env var set by the CI test harness (e2e-test). Now uses $1 > $NEMOCLAW_SANDBOX_NAME > nemoclaw.

…box)

The full E2E test runs install.sh --non-interactive, which destroys and rebuilds the sandbox. When TEST_SUITE=all, this kills the sandbox that beforeAll created, causing credential-sanitization and telegram-injection to fail with 'sandbox not running'. Only run full E2E when TEST_SUITE=full.

On forks or before the first base-image workflow run, the GHCR base image (ghcr.io/nvidia/nemoclaw/sandbox-base:latest) doesn't exist. This causes the Dockerfile's FROM to fail. Now setup.sh checks for the base image and builds Dockerfile.base locally if needed. On subsequent builds, Docker layer cache makes this near-instant. Once the GHCR base image is available, this becomes a no-op (docker pull succeeds and the local build is skipped).

brev-setup.sh creates the sandbox but doesn't install the host-side nemoclaw CLI that test scripts need for 'nemoclaw <name> status'. Add npm install + build + link step after bootstrap.

setup.sh creates the sandbox via openshell directly but never writes ~/.nemoclaw/sandboxes.json. The security test scripts check `nemoclaw <name> status` which reads the registry, causing all E2E runs to fail with 'Sandbox e2e-test not running'. Write the registry entry after nemoclaw CLI install so the test scripts can find the sandbox.

C7 greps for 'npm_' inside the sandbox and false-positives on nemoclaw-blueprint/policies/presets/npm.yaml which contains rule names like 'npm_yarn', not actual credentials. Filter out /policies/ paths from all three pattern checks.
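
The false positive and the path filter can be reproduced on a throwaway directory tree. This is a sketch under assumptions: the exact grep invocation used by C7 is not shown in this thread, and the file contents below are made-up fixtures.

```shell
#!/usr/bin/env bash
# Build a throwaway tree: one credential-looking value and one policy preset
# whose rule name merely contains the same prefix.
root="$(mktemp -d)"
mkdir -p "$root/app" "$root/nemoclaw-blueprint/policies/presets"
echo 'token: npm_FAKE_EXAMPLE_TOKEN' >"$root/app/config.yaml"
echo 'name: npm_yarn' >"$root/nemoclaw-blueprint/policies/presets/npm.yaml"

# List files containing the pattern, then drop anything under /policies/.
hits="$(grep -rl 'npm_' "$root" | grep -v '/policies/')"

echo "$hits"
rm -rf "$root"
```

Only the file outside `/policies/` is reported. A path filter like this is coarse, so a real credential placed under a policies directory would also be skipped.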

Document what each test_suite option runs so maintainers can make an informed choice from the Actions UI without reading the test scripts.

Re-enable the github.repository check so the workflow only runs on NVIDIA/NemoClaw, not on forks.

…nv var

setup.sh now uses ${1:-${NEMOCLAW_SANDBOX_NAME:-nemoclaw}} instead of ${1:-nemoclaw}. Update the test to match and add coverage for the env var fallback path.
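
The nested default expansion resolves the name in the order positional argument, then environment variable, then the literal default. A small sketch of that precedence (the wrapper function is only for illustration; setup.sh uses the expansion directly):

```shell
#!/usr/bin/env bash
# Resolve the sandbox name: $1 first, then NEMOCLAW_SANDBOX_NAME, then "nemoclaw".
resolve_sandbox_name() {
  echo "${1:-${NEMOCLAW_SANDBOX_NAME:-nemoclaw}}"
}

unset NEMOCLAW_SANDBOX_NAME
resolve_sandbox_name            # prints: nemoclaw

NEMOCLAW_SANDBOX_NAME=e2e-test
resolve_sandbox_name            # prints: e2e-test
resolve_sandbox_name my-sandbox # prints: my-sandbox
```

Because `:-` treats an empty value like an unset one, an empty first argument also falls through to the environment variable.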

…s, shell injection in test

- Bump e2e-brev workflow timeout-minutes from 60 to 90
- Add fail-fast when launchable setup exceeds 40-min wait
- Add pipefail to remote pipeline commands in runRemoteTest and npm ci
- Fix backtick shell injection in validateName test loop (use process.argv)
- Make sandbox_exec fail closed with __PROBE_FAILED__ sentinel
- Add probe failure checks in C6/C7 sandbox assertions

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

test/e2e/test-credential-sanitization.sh (1)

226-290: ⚠️ Potential issue | 🟠 Major

Exercise NemoClaw’s real sanitizer and digest-verification paths.

These node -e blocks still define local credential-stripping, symlink-walk, and digest-verification helpers, so the suite can pass without touching the code it is supposed to protect. The copy has already drifted from production — keyRef is omitted from CREDENTIAL_FIELDS here and C13 treats preserving it as success — which means these checks can miss real regressions and also report correct shipped behavior as broken. Please drive C1-C5/C8/C9-C13 through the actual migration-state / blueprint verification entrypoints instead of inline helpers.

Based on learnings, keyRef is intentionally included in CREDENTIAL_FIELDS in nemoclaw/src/commands/migration-state.ts because it currently holds actual secret material.

Also applies to: 455-484, 502-542, 559-606, 622-752

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/test-credential-sanitization.sh`:
- Around line 331-367: The test currently shells out to python3 to parse JSON
(variables nvidia_apikey, gateway_token, model_primary, gateway_mode), which
adds an undeclared Python dependency; replace each echo "$config_content" |
python3 -c "..." block with an echo "$config_content" | node -e "const
d=require('fs').readFileSync(0,'utf8'); try{const cfg=JSON.parse(d);
console.log(<field-expr>)}catch(e){process.exitCode=1}" pattern so Node parses
the JSON and prints the same fields (use cfg.nvidia?.apiKey ?? 'MISSING',
cfg.gateway?.auth?.token ?? 'MISSING', cfg.agents?.defaults?.model?.primary ??
'MISSING', cfg.gateway?.mode ?? 'MISSING'), preserving the existing || echo
"PARSE_ERROR" fallback behavior; update the four spots that currently set
nvidia_apikey, gateway_token, model_primary, and gateway_mode.
- Around line 75-102: The remote probe pipelines run under SSH without pipefail
so earlier stages (find/grep) can fail while the final command (head) returns 0,
producing empty output and a false "no leaks" result; update sandbox_exec to
execute the remote command under a shell with pipefail (e.g., run the provided
"$cmd" via bash -o pipefail -c ...) so any pipeline failure returns non‑zero and
causes sandbox_exec to emit the "__PROBE_FAILED__" sentinel, and apply the same
change to the other probe invocations that use pipeline commands (the other
sandbox probe blocks referenced).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 87d5e831-522e-4128-bb20-09465d31d7c6

📥 Commits

Reviewing files that changed from the base of the PR and between b86c294 and 32687e3.

📒 Files selected for processing (7)

.github/workflows/e2e-brev.yaml
scripts/brev-setup.sh
scripts/setup.sh
test/e2e/brev-e2e.test.js
test/e2e/test-credential-sanitization.sh
test/e2e/test-telegram-injection.sh
test/setup-sandbox-name.test.js

✅ Files skipped from review due to trivial changes (1)

test/e2e/brev-e2e.test.js

🚧 Files skipped from review as they are similar to previous changes (4)

scripts/brev-setup.sh
.github/workflows/e2e-brev.yaml
test/e2e/test-telegram-injection.sh
scripts/setup.sh

coderabbitai · 2026-03-30T16:30:37Z

test/e2e/test-credential-sanitization.sh

+sandbox_exec() {
+  local cmd="$1"
+  local ssh_config
+  ssh_config="$(mktemp)"
+  if ! openshell sandbox ssh-config "$SANDBOX_NAME" >"$ssh_config" 2>/dev/null; then
+    rm -f "$ssh_config"
+    echo "__PROBE_FAILED__"
+    return 1
+  fi
+
+  local result
+  local rc=0
+  result=$(timeout 60 ssh -F "$ssh_config" \
+    -o StrictHostKeyChecking=no \
+    -o UserKnownHostsFile=/dev/null \
+    -o ConnectTimeout=10 \
+    -o LogLevel=ERROR \
+    "openshell-${SANDBOX_NAME}" \
+    "$cmd" \
+    2>&1) || rc=$?
+
+  rm -f "$ssh_config"
+  if [ "$rc" -ne 0 ] && [ -z "$result" ]; then
+    echo "__PROBE_FAILED__"
+    return 1
+  fi
+  echo "$result"
+}


⚠️ Potential issue | 🟠 Major

C6/C7 can still pass when the remote probe errors out.

The commands sent through sandbox_exec() are pipelines with stderr discarded (find ... | head, grep ... | ... | head). If find/grep fails because a target path is missing or unreadable, head still exits 0, sandbox_exec() sees empty output, and the check reports “no leaks” even though nothing was inspected. Run these probes under remote pipefail or return an explicit sentinel when any stage of the pipeline fails.

Also applies to: 408-425, 427-433
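
The masking effect is easy to reproduce locally. The sketch below uses a made-up probe command against a path that is assumed not to exist; the real probes run over SSH inside the sandbox, where the command string would additionally need quoting for the remote shell.

```shell
#!/usr/bin/env bash
# A probe pipeline whose first stage fails (the path does not exist).
cmd='find /nonexistent-probe-path -type f 2>/dev/null | head -5'

# Without pipefail, the pipeline's status is that of `head`, which is 0,
# so an empty result looks like "nothing found".
bash -c "$cmd"
rc_plain=$?

# With pipefail, the failing `find` makes the whole pipeline non-zero,
# so the caller can tell "probe failed" apart from "no matches".
bash -o pipefail -c "$cmd"
rc_pipefail=$?

echo "plain=$rc_plain pipefail=$rc_pipefail"
```

One caveat with this approach: `grep` exits 1 when it finds no matches, so under pipefail a clean "no leaks" result from a grep stage also needs to be distinguished from a genuine error (grep uses exit status 2 for errors).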

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-credential-sanitization.sh` around lines 75 - 102, The remote probe pipelines run under SSH without pipefail so earlier stages (find/grep) can fail while the final command (head) returns 0, producing empty output and a false "no leaks" result; update sandbox_exec to execute the remote command under a shell with pipefail (e.g., run the provided "$cmd" via bash -o pipefail -c ...) so any pipeline failure returns non‑zero and causes sandbox_exec to emit the "__PROBE_FAILED__" sentinel, and apply the same change to the other probe invocations that use pipeline commands (the other sandbox probe blocks referenced).

coderabbitai · 2026-03-30T16:30:37Z

test/e2e/test-credential-sanitization.sh

+nvidia_apikey=$(echo "$config_content" | python3 -c "
+import json, sys
+config = json.load(sys.stdin)
+print(config.get('nvidia', {}).get('apiKey', 'MISSING'))
+" 2>/dev/null || echo "PARSE_ERROR")
+
+gateway_token=$(echo "$config_content" | python3 -c "
+import json, sys
+config = json.load(sys.stdin)
+print(config.get('gateway', {}).get('auth', {}).get('token', 'MISSING'))
+" 2>/dev/null || echo "PARSE_ERROR")
+
+if [ "$nvidia_apikey" = "[STRIPPED_BY_MIGRATION]" ]; then
+  pass "C3a: nvidia.apiKey replaced with sentinel"
+else
+  fail "C3a: nvidia.apiKey not sanitized (got: $nvidia_apikey)"
+fi
+
+if [ "$gateway_token" = "[STRIPPED_BY_MIGRATION]" ]; then
+  pass "C3b: gateway.auth.token replaced with sentinel"
+else
+  fail "C3b: gateway.auth.token not sanitized (got: $gateway_token)"
+fi
+
+# C4: Non-credential fields must be preserved
+info "C4: Checking non-credential field preservation..."
+model_primary=$(echo "$config_content" | python3 -c "
+import json, sys
+config = json.load(sys.stdin)
+print(config.get('agents', {}).get('defaults', {}).get('model', {}).get('primary', 'MISSING'))
+" 2>/dev/null || echo "PARSE_ERROR")
+
+gateway_mode=$(echo "$config_content" | python3 -c "
+import json, sys
+config = json.load(sys.stdin)
+print(config.get('gateway', {}).get('mode', 'MISSING'))
+" 2>/dev/null || echo "PARSE_ERROR")


⚠️ Potential issue | 🟡 Minor

C3/C4 add an undeclared Python dependency.

These assertions shell out to python3, but Phase 0 never checks for it and the rest of the script already standardizes on Node. On images without Python 3, this suite will fail with PARSE_ERROR for test-infrastructure reasons rather than sanitization regressions. Either add a prerequisite check or do these JSON reads with node.
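
If the prerequisite-check route is taken, a Phase 0 guard could look roughly like this. The `have_cmd` helper is hypothetical, and the tool list uses `sh` and `grep` only so the sketch runs anywhere; the real script would list whatever it shells out to (for example `python3` or `node`).

```shell
#!/usr/bin/env bash
# Hypothetical Phase 0 helper: succeed if a required tool is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Check every tool the suite depends on and collect the missing ones.
missing=""
for tool in sh grep; do
  have_cmd "$tool" || missing="$missing $tool"
done

if [ -n "$missing" ]; then
  echo "Missing prerequisites:$missing" >&2
else
  echo "All prerequisites found"
fi
```

Failing up front with a named missing tool makes an infrastructure problem distinguishable from a `PARSE_ERROR` that would otherwise look like a sanitization regression.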

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-credential-sanitization.sh` around lines 331 - 367, The test currently shells out to python3 to parse JSON (variables nvidia_apikey, gateway_token, model_primary, gateway_mode), which adds an undeclared Python dependency; replace each echo "$config_content" | python3 -c "..." block with an echo "$config_content" | node -e "const d=require('fs').readFileSync(0,'utf8'); try{const cfg=JSON.parse(d); console.log(<field-expr>)}catch(e){process.exitCode=1}" pattern so Node parses the JSON and prints the same fields (use cfg.nvidia?.apiKey ?? 'MISSING', cfg.gateway?.auth?.token ?? 'MISSING', cfg.agents?.defaults?.model?.primary ?? 'MISSING', cfg.gateway?.mode ?? 'MISSING'), preserving the existing || echo "PARSE_ERROR" fallback behavior; update the four spots that currently set nvidia_apikey, gateway_token, model_primary, and gateway_mode.

cv

Thanks for pushing this — the direction is good, but I don't think this is mergeable as-is.

Main issues I verified:

- test/e2e/test-credential-sanitization.sh reimplements stripCredentials, walkAndRemoveFile, and digest verification inline instead of invoking the real code in nemoclaw/src/commands/migration-state.ts. That makes large parts of the suite self-fulfilling rather than true end-to-end coverage.
- The credential suite does not appear to drive the real snapshot/restore path end-to-end, so it is not yet validating the production migration flow it claims to cover.
- test/e2e/test-telegram-injection.sh does not exercise scripts/telegram-bridge.js / runAgentInSandbox(). It uses ad-hoc SSH commands like MSG=$(cat) && echo ..., which bypass the actual quoting/session-handling path we need confidence in.
- test/e2e/brev-e2e.test.js defaults to downloading/executing a raw setup script from a feature branch in OpenShell-Community. Please pin that to an immutable commit SHA or vendor it into this repo.

I think this is worth landing once the tests are wired to the real production paths and the launchable bootstrap is pinned. Also please rebase after #1090 / current main, since this PR currently carries the setup script changes too.

cv

Talked to @jyaunches offline and we're okay with the changes I requested being done in follow-up PRs

coderabbitai bot reviewed Mar 30, 2026

jyaunches mentioned this pull request Mar 30, 2026

Add Brev E2E tests for command injection and credential sanitization #1097

Closed

coderabbitai bot reviewed Mar 30, 2026

coderabbitai bot reviewed Mar 30, 2026

jyaunches added 25 commits March 30, 2026 12:11

ci: temporarily disable repo guard for fork testing

3ca3da0

ci: bump bootstrap timeout, skip vLLM on CPU E2E runs

720b16f

- Add SKIP_VLLM=1 support to brev-setup.sh
- Use SKIP_VLLM=1 in brev-e2e.test.js bootstrap
- Bump beforeAll timeout to 30 min for CPU instances
- Bump workflow timeout to 60 min for 3 test suites

ci: bump bootstrap timeout to 40 min for sandbox image build

3626cee

ci: bump Brev instance to 8x32 for faster Docker builds

1e40af1

fix: setup.sh respects NEMOCLAW_SANDBOX_NAME env var

fc9229a

setup.sh defaulted to 'nemoclaw' ignoring the NEMOCLAW_SANDBOX_NAME env var set by the CI test harness (e2e-test). Now uses $1 > $NEMOCLAW_SANDBOX_NAME > nemoclaw.

ci: bump full E2E test timeout to 15 min for install + sandbox build

8704eaf

ci: install nemoclaw CLI after bootstrap in non-launchable path

f13e81f

brev-setup.sh creates the sandbox but doesn't install the host-side nemoclaw CLI that test scripts need for 'nemoclaw <name> status'. Add npm install + build + link step after bootstrap.

fix: use npm_config_prefix for nemoclaw CLI install so it lands on PATH

8393d8a

fix: npm link from repo root where bin.nemoclaw is defined

8335ba9

style: shfmt formatting fix in setup.sh

50ca58f

docs(ci): add test suite descriptions to e2e-brev workflow header

2271a06

Document what each test_suite option runs so maintainers can make an informed choice from the Actions UI without reading the test scripts.

ci: re-enable repo guard for e2e-brev workflow

73ab4f1

Re-enable the github.repository check so the workflow only runs on NVIDIA/NemoClaw, not on forks.

fix(test): update setup-sandbox-name test for NEMOCLAW_SANDBOX_NAME e…

6dc2493

…nv var

setup.sh now uses ${1:-${NEMOCLAW_SANDBOX_NAME:-nemoclaw}} instead of ${1:-nemoclaw}. Update the test to match and add coverage for the env var fallback path.

fix(lint): add shellcheck directives for injection test payloads and …

7f04a9b

…fix stdio type

fix(lint): suppress SC2034 for status_output in credential sanitizati…

5308e74

…on test

jyaunches force-pushed the feat/security-e2e-tests branch from b86c294 to 32687e3 Compare March 30, 2026 16:19

coderabbitai bot reviewed Mar 30, 2026

cv requested changes Mar 30, 2026

Merge branch 'main' into feat/security-e2e-tests

87996cc

cv approved these changes Mar 30, 2026

cv merged commit bd5425c into NVIDIA:main Mar 30, 2026
8 checks passed

jyaunches mentioned this pull request Mar 30, 2026

refactor(test): wire E2E security tests to real production code paths #1107

Open

wscurran added the Platform: Brev, security, and CI/CD labels Mar 30, 2026
