
ci(nightly-e2e): remove misleading environment from cloud-experimental job#1085

Merged
cv merged 8 commits into NVIDIA:main from mercl-lau:e2e-test-20260325
Mar 30, 2026
Conversation

@mercl-lau
Contributor

@mercl-lau mercl-lau commented Mar 30, 2026

Summary

Remove the environment: NVIDIA_API_KEY binding from the cloud-experimental-e2e job in the nightly workflow. GitHub treats environment as a deployment target, so the binding produced a confusing Deployments page entry whose name looked like a secret. The job already receives the API key via env from the repository secret secrets.NVIDIA_API_KEY, matching the cloud-e2e job.
Also change the cloud-experimental Phase 3 install log default path to include ${UID}, so root and non-root runs do not share the same file.
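The UID-suffixed default described above can be sketched as a standard shell parameter-expansion fallback. The snippet is illustrative; the variable names mirror the PR description, not necessarily the script's exact code:

```shell
# Illustrative sketch of the new default: honor a caller-provided
# INSTALL_LOG, otherwise fall back to a per-user path so root and
# non-root runs write to different files.
uid="${UID:-$(id -u)}"   # bash sets UID; fall back to id -u for plain sh
INSTALL_LOG="${INSTALL_LOG:-/tmp/nemoclaw-e2e-cloud-experimental-install.${uid}.log}"
echo "$INSTALL_LOG"
```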

Related Issue

N/A (internal CI clarity / follow-up from team discussion).

Changes

  • .github/workflows/nightly-e2e.yaml: drop environment: NVIDIA_API_KEY from cloud-experimental-e2e.
  • test/e2e/test-e2e-cloud-experimental.sh: default INSTALL_LOG uses /tmp/nemoclaw-e2e-cloud-experimental-install.${UID}.log; comment updated accordingly.

Type of Change

  • Code change for a new feature, bug fix, or refactor.
  • Code change with doc updates.
  • Doc only. Prose changes without code sample modifications.
  • Doc only. Includes code sample changes.

Testing

  • npx prek run --all-files passes (or equivalently make check).
  • npm test passes.
  • make docs builds without warnings. (for doc-only changes)

Checklist

General

Code Changes

  • Formatters applied — npx prek run --all-files auto-fixes formatting (or make format for targeted runs).
  • Tests added or updated for new or changed behavior.
  • No secrets, API keys, or credentials committed.
  • Doc pages updated for any user-facing behavior changes (new commands, changed defaults, new features, bug fixes that contradict existing docs).

Doc Changes

N/A — no doc-only section required for this PR.

Summary by CodeRabbit

  • New Features

    • Added a Spark install smoke test and a reusable teardown/verify cleanup step for experimental e2e runs.
    • Added phase-tag controls to run or skip specific e2e phases.
  • Tests

    • Enhanced cloud-experimental e2e: configurable provider matching, ordered egress probes (PyPI/npm/Hugging Face), proxy and skip flags, and refined timeouts.
    • Docs-link probing: URL ignore lists, normalization, and clearer progress/reporting.
    • Increased test timeouts for a couple of unit tests.
  • Chores

    • CI workflow: extended job timeout, adjusted artifact naming, and unconditional final cleanup/check steps.

…ental job

Avoid spurious deployment records; secrets still come from secrets.NVIDIA_API_KEY.

Made-with: Cursor
@coderabbitai
Contributor

coderabbitai bot commented Mar 30, 2026

📝 Walkthrough

Walkthrough

E2E workflow and multiple test scripts were updated: added phase-tag gating, delegated cleanup to a new teardown script, improved docs-link skipping, expanded network-policy probes, made inference-provider checks configurable, added a Spark install smoke test, and increased a few test timeouts; workflow now skips embedded cleanup/docs and runs always-on teardown steps.

Changes

Cohort / File(s) Summary
Workflow
.github/workflows/nightly-e2e.yaml
Removed job-level environment: NVIDIA_API_KEY; increased timeout-minutes 45→90; main run sets RUN_E2E_CLOUD_EXPERIMENTAL_SKIP_FINAL_CLEANUP=1 and RUN_E2E_CLOUD_EXPERIMENTAL_SKIP_CHECK_DOCS=1; added three if: always() post-steps (check-docs.sh, skip/05-network-policy.sh, cleanup.sh --verify); changed experimental install log artifact path to /tmp/nemoclaw-e2e-cloud-experimental-install.log.
E2E controller
test/e2e/test-e2e-cloud-experimental.sh
Added e2e_cloud_experimental_phase_enabled(); introduced E2E_CLOUD_EXPERIMENTAL_ONLY_TAGS/_SKIP_TAGS gating; most phases now conditionally skip with tag-aware messages; replaced direct pre/post teardown with calls to new cleanup.sh (Phase0 no-verify, Phase6 with --verify).
Cleanup script
test/e2e/e2e-cloud-experimental/cleanup.sh
New teardown CLI: resolves SANDBOX_NAME, supports --verify; best-effort destroy via nemoclaw/openshell (non-fatal); --verify asserts sandbox/gateway absence when tools present; exits 0 on destroy or successful verify, 1 on verify failure.
Docs checker
test/e2e/e2e-cloud-experimental/check-docs.sh
Added ignore controls CHECK_DOC_LINKS_IGNORE_EXTRA and CHECK_DOC_LINKS_IGNORE_URL_REGEX; URL normalization helper and built-in ignore list; remote-phase builds filtered probe list, logs skipped URLs, and reports counts for unique probed URLs.
Network probes
test/e2e/e2e-cloud-experimental/skip/05-network-policy.sh
Replaced single whitelist loop with ordered per-domain probes (PyPI, npm, Hugging Face, blocked URL); added NEMOCLAW_E2E_CURL_NOPROXY and NEMOCLAW_E2E_SKIP_NETWORK_POLICY_HUGGINGFACE; improved timeout handling and proxy-related exit-code messaging; final output reflects whether Hugging Face ran or was skipped.
Provider-configurable skips
test/e2e/e2e-cloud-experimental/skip/01-onboard-completion.sh, test/e2e/e2e-cloud-experimental/skip/04-nemoclaw-openshell-status-parity.sh
Replaced hardcoded nvidia-nim checks with CLOUD_EXPERIMENTAL_INFERENCE_PROVIDER (default nvidia-nim); updated grep matching and failure messaging to reference configured provider.
Spark smoke test
test/e2e/test-spark-install.sh
Added new end-to-end Spark install smoke test (strict mode, phased checks: OS, Docker, optional setup, install via public/local route); streams install log with tailing, fails with diagnostic excerpt on error, verifies nemoclaw/openshell availability.
Test timeouts (JS)
test/install-preflight.test.js, test/uninstall.test.js
Added 60,000ms (60s) timeout to two it(...) tests.
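The only/skip tag gating summarized for test-e2e-cloud-experimental.sh can be sketched roughly as below. The names here (phase_enabled, ONLY_TAGS, SKIP_TAGS) are placeholders, not the script's actual identifiers:

```shell
# Hypothetical sketch of only/skip tag gating. A phase runs when it is
# listed in ONLY_TAGS (or ONLY_TAGS is empty) and not listed in SKIP_TAGS.
phase_enabled() {
  tag="$1"
  if [ -n "${ONLY_TAGS:-}" ]; then
    case ",${ONLY_TAGS}," in
      *",${tag},"*) ;;   # explicitly selected
      *) return 1 ;;     # not in the only-list: skip
    esac
  fi
  case ",${SKIP_TAGS:-}," in
    *",${tag},"*) return 1 ;;   # explicitly skipped
  esac
  return 0
}
```

Comma-wrapping both the list and the tag keeps substring matches exact (so a tag "net" cannot match "network").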

Sequence Diagram(s)

mermaid
sequenceDiagram
participant GitHubWorkflow as GitHub Workflow
participant Runner as Runner
participant E2E_Script as test-e2e-cloud-experimental.sh
participant Cleanup as cleanup.sh
participant OpenTools as openshell / nemoclaw
GitHubWorkflow->>Runner: start job (timeout 90m)
Runner->>E2E_Script: run main e2e with SKIP_FINAL_CLEANUP=1, SKIP_CHECK_DOCS=1
E2E_Script->>Runner: execute enabled phases (tag-gated)
Runner->>GitHubWorkflow: main step completes
GitHubWorkflow->>Runner: always-run post-step: check-docs.sh
GitHubWorkflow->>Runner: always-run post-step: skip/05-network-policy.sh
GitHubWorkflow->>Cleanup: always-run post-step: cleanup.sh --verify
Cleanup->>OpenTools: call openshell/nemoclaw cleanup and verification
OpenTools-->>Cleanup: report success/failure
Cleanup-->>GitHubWorkflow: exit code affects workflow status

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I nibbled logs and hopped through phases bright,

Skip flags tucked in pockets, cleanup done at night,
Docs left unprobed where gentle skips were planned,
Sandboxes swept tidy by a careful paw and hand,
Hooray — the nightly run hops home, all right. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 15.38%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately describes the primary change: removing the misleading environment declaration from the cloud-experimental job in the nightly workflow.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/test-e2e-cloud-experimental.sh`:
- Line 293: The workflow still uploads the old install log filename, so update
the artifact upload step to use the new INSTALL_LOG name; locate the INSTALL_LOG
variable in test/e2e/test-e2e-cloud-experimental.sh and change the corresponding
artifact path/filename in the workflow upload step (the job that uploads the
install log artifact) to /tmp/nemoclaw-e2e-cloud-experimental-install.${UID}.log
(or reference ${{ env.INSTALL_LOG }} if the workflow imports that env) so CI
preserves the correct install log on failure.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1e52d9c7-42f3-4d45-9c1a-465a5a9bab46

📥 Commits

Reviewing files that changed from the base of the PR and between f59f58e and d2aeb85.

📒 Files selected for processing (2)
  • .github/workflows/nightly-e2e.yaml
  • test/e2e/test-e2e-cloud-experimental.sh
💤 Files with no reviewable changes (1)
  • .github/workflows/nightly-e2e.yaml

mercl-lau and others added 2 commits March 30, 2026 13:57
… log path

Upload /tmp/nemoclaw-e2e-cloud-experimental-install.log on failure; script uses
the same fixed default without UID.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
test/uninstall.test.js (1)

53-62: ⚠️ Potential issue | 🟠 Major

Add timeout protection directly to spawnSync.

Vitest's 60_000 test timeout can't interrupt spawnSync since it blocks the event loop. If the uninstall script hangs, the test suite will hang indefinitely. Apply the timeout to spawnSync itself instead.

Suggested fix
       const result = spawnSync("bash", [UNINSTALL_SCRIPT, "--yes"], {
         cwd: path.join(import.meta.dirname, ".."),
         encoding: "utf-8",
+        timeout: 60_000,
+        killSignal: "SIGKILL",
         env: {
           ...process.env,
           HOME: tmp,
           PATH: `${fakeBin}:/usr/bin:/bin`,
           SCRIPT_DIR: path.join(import.meta.dirname, ".."),

Also applies to: 72-72

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/uninstall.test.js` around lines 53 - 62, The test calls
spawnSync("bash", [UNINSTALL_SCRIPT, "--yes"], {...}) which can block forever;
add a timeout option to the spawnSync options (e.g. timeout: 60_000) so the
child will be killed if it hangs, and also include a killSignal if desired (e.g.
killSignal: "SIGKILL") to ensure termination; update both spawnSync invocations
(the one using UNINSTALL_SCRIPT and the similar call at the later occurrence) to
include the timeout (and optional killSignal) in their options object.
test/install-preflight.test.js (1)

195-206: ⚠️ Potential issue | 🟠 Major

Add timeout and killSignal to the spawnSync call to enforce the test's 60-second boundary.

Vitest's per-test timeout cannot preempt a blocking spawnSync call. If the bash installer hangs, the event loop remains blocked and the test runner timeout becomes ineffective. The child process timeout must be set explicitly on spawnSync.

Suggested fix
     const result = spawnSync("bash", [INSTALLER], {
       cwd: tmp,
       encoding: "utf-8",
+      timeout: 60_000,
+      killSignal: "SIGKILL",
       env: {
         ...process.env,
         HOME: tmp,
         PATH: `${fakeBin}:${TEST_SYSTEM_PATH}`,

Also applies to: 210-210

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/install-preflight.test.js` around lines 195 - 206, The spawnSync
invocation that runs the installer in test/install-preflight.test.js needs
explicit child process timeouts: add timeout: 60000 and killSignal: 'SIGKILL'
(or 'SIGTERM') to the options object passed to spawnSync (the call using
INSTALLER, cwd: tmp, env including HOME, PATH, NEMOCLAW_NON_INTERACTIVE,
NPM_PREFIX, GIT_LOG_PATH). Update both occurrences of the spawnSync call (the
one around the current block and the second occurrence noted) so the synchronous
child will be forcibly killed after 60 seconds instead of blocking the test
runner.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/e2e-cloud-experimental/cleanup.sh`:
- Around line 50-55: The current verification treats any nonzero exit from
"openshell sandbox get" and skipped failures from "nemoclaw list" as proof the
sandbox is gone; change both checks to positively detect a "not found" response
and otherwise fail: run "openshell sandbox get $SANDBOX_NAME" capturing
stdout/stderr and exit code, and only treat it as success when the output (or
stderr) contains a canonical not-found string (e.g. "not found", "No such
sandbox", or the provider-specific message) — on any other nonzero exit or
ambiguous output, call fail/exit with an error; likewise run "nemoclaw list" and
parse its successful output for SANDBOX_NAME (treat absence as not found) but if
the list command itself errors or returns ambiguous output, treat that as a test
failure rather than success.
- Around line 28-31: The script currently treats any argument other than the
exact string "--verify" as no-ops and silently leaves VERIFY=0; change the
argument handling to explicitly reject unexpected args: check the positional
parameters (e.g., test "$#" and "$1") and if an argument is present and not
equal to "--verify" print an error to stderr and exit non-zero, otherwise set
VERIFY=1 only when "$1" == "--verify"; update the logic around the VERIFY
variable and the if condition that reads "$1" so typos or extra args cause a
failing exit instead of silently downgrading to destroy-only.

In `@test/e2e/e2e-cloud-experimental/skip/05-network-policy.sh`:
- Around line 349-355: Section A's YAML assertions need to include explicit
checks for Hugging Face hosts because the script now enforces Hugging Face
egress by default; update the assertions that currently only verify npm/PyPI
entries to also assert the expected Hugging Face hub/CDN entries are present in
policy_output. Specifically, add assertions (guarded by SKIP_HUGGINGFACE similar
to run_whitelist_huggingface_via_cli) that policy_output contains the Hugging
Face hostnames such as huggingface.co and the CDN/LFS domains used by the hub
(e.g., cdn.huggingface.co / cdn-lfs.huggingface.co or your project's canonical
HF hostnames) so the test fails if those egress rules are not explicitly
declared. Ensure these checks live alongside the existing npm/PyPI checks and
reference the same policy_output variable used in the script.

In `@test/e2e/test-spark-install.sh`:
- Around line 111-125: Wrap the background installer invocation with a bounded
timeout (use timeout or gtimeout) instead of starting curl|bash or install.sh
directly so wait "$install_pid" cannot hang; e.g., use a configurable
INSTALL_TIMEOUT env var and run either "timeout $INSTALL_TIMEOUT curl -fsSL
\"$url\" | bash" or "timeout $INSTALL_TIMEOUT bash install.sh --non-interactive"
(preserving INSTALL_LOG redirection and backgrounding), keep the tail logic for
INSTALL_LOG/tail_pid, capture the timeout exit code into install_exit, and
ensure you kill/wait the tail like existing logic; update references to
NEMOCLAW_E2E_PUBLIC_INSTALL, url, INSTALL_LOG, install_pid, tail_pid
accordingly.
- Around line 57-58: The INSTALL_LOG default uses a fixed /tmp path which can
collide across runs; update the default in test-spark-install.sh so INSTALL_LOG
is unique per process (e.g., use mktemp or include the shell $UID/$PPID/PID in
the filename) and ensure SETUP_SCRIPT and any subsequent code that tails or
writes INSTALL_LOG continue to reference the new variable name; replace the
fixed "/tmp/nemoclaw-e2e-spark-install.log" with a dynamically generated name
(mktemp or "/tmp/nemoclaw-e2e-spark-install.${UID:-$$}.log") so concurrent or
root/non-root runs won't clobber each other.
- Around line 96-107: The setup step can hang waiting for a sudo password;
update the invocation that runs the setup script in test-spark-install.sh to use
non-interactive sudo by adding the -n flag (i.e., replace the plain sudo call
that runs "$SETUP_SCRIPT" with sudo -n) so it fails fast when no password is
available; keep the existing pass/fail handling for
NEMOCLAW_E2E_SPARK_SKIP_SETUP and the SETUP_SCRIPT variable unchanged.

---

Outside diff comments:
In `@test/install-preflight.test.js`:
- Around line 195-206: The spawnSync invocation that runs the installer in
test/install-preflight.test.js needs explicit child process timeouts: add
timeout: 60000 and killSignal: 'SIGKILL' (or 'SIGTERM') to the options object
passed to spawnSync (the call using INSTALLER, cwd: tmp, env including HOME,
PATH, NEMOCLAW_NON_INTERACTIVE, NPM_PREFIX, GIT_LOG_PATH). Update both
occurrences of the spawnSync call (the one around the current block and the
second occurrence noted) so the synchronous child will be forcibly killed after
60 seconds instead of blocking the test runner.

In `@test/uninstall.test.js`:
- Around line 53-62: The test calls spawnSync("bash", [UNINSTALL_SCRIPT,
"--yes"], {...}) which can block forever; add a timeout option to the spawnSync
options (e.g. timeout: 60_000) so the child will be killed if it hangs, and also
include a killSignal if desired (e.g. killSignal: "SIGKILL") to ensure
termination; update both spawnSync invocations (the one using UNINSTALL_SCRIPT
and the similar call at the later occurrence) to include the timeout (and
optional killSignal) in their options object.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a82b76c5-5ee8-4b3e-a01f-ccbc6a9fd7c2

📥 Commits

Reviewing files that changed from the base of the PR and between f82f048 and 89988a5.

📒 Files selected for processing (10)
  • .github/workflows/nightly-e2e.yaml
  • test/e2e/e2e-cloud-experimental/check-docs.sh
  • test/e2e/e2e-cloud-experimental/cleanup.sh
  • test/e2e/e2e-cloud-experimental/skip/01-onboard-completion.sh
  • test/e2e/e2e-cloud-experimental/skip/04-nemoclaw-openshell-status-parity.sh
  • test/e2e/e2e-cloud-experimental/skip/05-network-policy.sh
  • test/e2e/test-e2e-cloud-experimental.sh
  • test/e2e/test-spark-install.sh
  • test/install-preflight.test.js
  • test/uninstall.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/nightly-e2e.yaml

Comment on lines +28 to +31
VERIFY=0
if [ "${1:-}" = "--verify" ]; then
  VERIFY=1
fi

⚠️ Potential issue | 🟠 Major

Reject unexpected args instead of silently downgrading to destroy-only.

Anything other than exact --verify currently falls through to the success path. A typo here disables the verification gate without failing CI.

Suggested fix
 VERIFY=0
-if [ "${1:-}" = "--verify" ]; then
-  VERIFY=1
-fi
+case "${1:-}" in
+  "")
+    ;;
+  --verify)
+    VERIFY=1
+    ;;
+  *)
+    printf '%s\n' "cleanup.sh: unknown option: ${1}" >&2
+    exit 2
+    ;;
+esac
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/e2e-cloud-experimental/cleanup.sh` around lines 28 - 31, The script
currently treats any argument other than the exact string "--verify" as no-ops
and silently leaves VERIFY=0; change the argument handling to explicitly reject
unexpected args: check the positional parameters (e.g., test "$#" and "$1") and
if an argument is present and not equal to "--verify" print an error to stderr
and exit non-zero, otherwise set VERIFY=1 only when "$1" == "--verify"; update
the logic around the VERIFY variable and the if condition that reads "$1" so
typos or extra args cause a failing exit instead of silently downgrading to
destroy-only.

Comment on lines +50 to +55
if command -v openshell >/dev/null 2>&1; then
  if openshell sandbox get "$SANDBOX_NAME" >/dev/null 2>&1; then
    fail "openshell sandbox get '${SANDBOX_NAME}' still succeeds after cleanup"
    exit 1
  fi
  pass "openshell: sandbox '${SANDBOX_NAME}' no longer visible to sandbox get"

⚠️ Potential issue | 🟠 Major

Don't treat CLI failures as proof the sandbox is gone.

openshell sandbox get can fail for auth/API/runtime reasons, and nemoclaw list failures are skipped entirely. In --verify mode that can report green while the sandbox still exists. Please fail unless you can positively identify a real “not found” result.

Also applies to: 60-73

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/e2e-cloud-experimental/cleanup.sh` around lines 50 - 55, The current
verification treats any nonzero exit from "openshell sandbox get" and skipped
failures from "nemoclaw list" as proof the sandbox is gone; change both checks
to positively detect a "not found" response and otherwise fail: run "openshell
sandbox get $SANDBOX_NAME" capturing stdout/stderr and exit code, and only treat
it as success when the output (or stderr) contains a canonical not-found string
(e.g. "not found", "No such sandbox", or the provider-specific message) — on any
other nonzero exit or ambiguous output, call fail/exit with an error; likewise
run "nemoclaw list" and parse its successful output for SANDBOX_NAME (treat
absence as not found) but if the list command itself errors or returns ambiguous
output, treat that as a test failure rather than success.
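The positive not-found check the reviewer asks for could look roughly like the helper below. The helper name and the matched message strings are assumptions, since openshell's real error text is not shown here:

```shell
# Hypothetical helper: succeed only when the CLI exited nonzero AND its
# output positively says the sandbox does not exist. Any other failure
# (auth, network, timeout) is treated as inconclusive, not as "gone".
sandbox_is_gone() {
  rc="$1"; out="$2"
  [ "$rc" -ne 0 ] || return 1   # command succeeded: sandbox still exists
  printf '%s' "$out" | grep -qiE 'not found|no such sandbox'
}
```

The caller would capture `out=$(openshell sandbox get "$SANDBOX_NAME" 2>&1); rc=$?` and fail the verify step unless the helper returns success.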

Comment on lines +349 to +355
run_whitelist_pypi_via_venv
run_whitelist_npm_via_cli
if [[ "$SKIP_HUGGINGFACE" == "1" ]]; then
  printf '%s\n' "05-network-policy: SKIP huggingface whitelist (NEMOCLAW_E2E_SKIP_NETWORK_POLICY_HUGGINGFACE=1)"
else
  run_whitelist_huggingface_via_cli
fi

⚠️ Potential issue | 🟠 Major

Mirror the new Hugging Face probe in the YAML assertions.

This script now enforces Hugging Face egress by default, but section A still only proves npm/PyPI hosts are declared in policy_output. That means this can pass via an overly broad rule and still claim the declared policy is correct. Please add explicit checks for the expected Hugging Face hub/CDN entries too.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/e2e-cloud-experimental/skip/05-network-policy.sh` around lines 349 -
355, Section A's YAML assertions need to include explicit checks for Hugging
Face hosts because the script now enforces Hugging Face egress by default;
update the assertions that currently only verify npm/PyPI entries to also assert
the expected Hugging Face hub/CDN entries are present in policy_output.
Specifically, add assertions (guarded by SKIP_HUGGINGFACE similar to
run_whitelist_huggingface_via_cli) that policy_output contains the Hugging Face
hostnames such as huggingface.co and the CDN/LFS domains used by the hub (e.g.,
cdn.huggingface.co / cdn-lfs.huggingface.co or your project's canonical HF
hostnames) so the test fails if those egress rules are not explicitly declared.
Ensure these checks live alongside the existing npm/PyPI checks and reference
the same policy_output variable used in the script.
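A minimal version of the suggested assertion, assuming policy_output already holds the rendered policy text; the helper name and exact hostnames are illustrative, per the review's own hedging about the project's canonical HF hosts:

```shell
# Hypothetical sketch: assert the declared policy names each expected
# egress host explicitly, mirroring the existing npm/PyPI checks.
assert_hosts_declared() {
  policy_output="$1"; shift
  for host in "$@"; do
    if ! printf '%s\n' "$policy_output" | grep -qF "$host"; then
      echo "missing egress entry: $host"
      return 1
    fi
  done
}
```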

Comment on lines +57 to +58
SETUP_SCRIPT="$REPO/scripts/setup-spark.sh"
INSTALL_LOG="${INSTALL_LOG:-/tmp/nemoclaw-e2e-spark-install.log}"

⚠️ Potential issue | 🟡 Minor

Default INSTALL_LOG can collide across runs.

The fixed /tmp/nemoclaw-e2e-spark-install.log path is shared by root/non-root and concurrent jobs, so another run can clobber the file you're tailing. Using $UID or mktemp would avoid the same collision class this PR already fixed for the cloud-experimental installer log.

Suggested fix
-INSTALL_LOG="${INSTALL_LOG:-/tmp/nemoclaw-e2e-spark-install.log}"
+INSTALL_LOG="${INSTALL_LOG:-/tmp/nemoclaw-e2e-spark-install.${UID}.log}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-spark-install.sh` around lines 57 - 58, The INSTALL_LOG default
uses a fixed /tmp path which can collide across runs; update the default in
test-spark-install.sh so INSTALL_LOG is unique per process (e.g., use mktemp or
include the shell $UID/$PPID/PID in the filename) and ensure SETUP_SCRIPT and
any subsequent code that tails or writes INSTALL_LOG continue to reference the
new variable name; replace the fixed "/tmp/nemoclaw-e2e-spark-install.log" with
a dynamically generated name (mktemp or
"/tmp/nemoclaw-e2e-spark-install.${UID:-$$}.log") so concurrent or root/non-root
runs won't clobber each other.
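Either variant from the prompt works; here is a mktemp-based sketch (portable template with trailing X's, since BSD mktemp does not accept a suffix after them):

```shell
# Hypothetical default: a unique per-run log file. mktemp creates the
# file atomically, so concurrent and root/non-root runs cannot collide.
INSTALL_LOG="${INSTALL_LOG:-$(mktemp /tmp/nemoclaw-e2e-spark-install.XXXXXX)}"
echo "$INSTALL_LOG"
```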

Comment on lines +96 to +107
if [ "${NEMOCLAW_E2E_SPARK_SKIP_SETUP:-0}" = "1" ]; then
  info "Skipping sudo setup-spark (NEMOCLAW_E2E_SPARK_SKIP_SETUP=1)"
  pass "setup-spark skipped"
else
  info "Running: sudo bash scripts/setup-spark.sh"
  if sudo bash "$SETUP_SCRIPT"; then
    pass "setup-spark completed"
  else
    fail "setup-spark failed"
    exit 1
  fi
fi

⚠️ Potential issue | 🟠 Major

Make the setup step fail fast with sudo -n.

This smoke test is explicitly non-interactive, but plain sudo can block on a password/TTY prompt. In CI that turns into a hang instead of a clear prerequisite failure.

Suggested fix
-  info "Running: sudo bash scripts/setup-spark.sh"
-  if sudo bash "$SETUP_SCRIPT"; then
+  info "Running: sudo -n bash scripts/setup-spark.sh"
+  if sudo -n bash "$SETUP_SCRIPT"; then
     pass "setup-spark completed"
   else
-    fail "setup-spark failed"
+    fail "setup-spark failed (passwordless sudo is required for this smoke test)"
     exit 1
   fi
 fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-spark-install.sh` around lines 96 - 107, The setup step can
hang waiting for a sudo password; update the invocation that runs the setup
script in test-spark-install.sh to use non-interactive sudo by adding the -n
flag (i.e., replace the plain sudo call that runs "$SETUP_SCRIPT" with sudo -n)
so it fails fast when no password is available; keep the existing pass/fail
handling for NEMOCLAW_E2E_SPARK_SKIP_SETUP and the SETUP_SCRIPT variable
unchanged.

Comment on lines +111 to +125
if [ "${NEMOCLAW_E2E_PUBLIC_INSTALL:-0}" = "1" ]; then
  url="${NEMOCLAW_INSTALL_SCRIPT_URL:-https://www.nvidia.com/nemoclaw.sh}"
  info "Running: curl -fsSL ... | bash (url=$url)"
  curl -fsSL "$url" | bash >"$INSTALL_LOG" 2>&1 &
else
  info "Running: bash install.sh --non-interactive"
  bash install.sh --non-interactive >"$INSTALL_LOG" 2>&1 &
fi
install_pid=$!
tail -f "$INSTALL_LOG" --pid=$install_pid 2>/dev/null &
tail_pid=$!
wait "$install_pid"
install_exit=$?
kill "$tail_pid" 2>/dev/null || true
wait "$tail_pid" 2>/dev/null || true

⚠️ Potential issue | 🟠 Major

Bound the background installer with a real timeout.

wait "$install_pid" will sit until the job-level timeout if curl or install.sh stalls. Please wrap the install command in timeout/gtimeout so this phase fails with a log excerpt instead of hanging the runner.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-spark-install.sh` around lines 111 - 125, Wrap the background
installer invocation with a bounded timeout (use timeout or gtimeout) instead of
starting curl|bash or install.sh directly so wait "$install_pid" cannot hang;
e.g., use a configurable INSTALL_TIMEOUT env var and run either "timeout
$INSTALL_TIMEOUT curl -fsSL \"$url\" | bash" or "timeout $INSTALL_TIMEOUT bash
install.sh --non-interactive" (preserving INSTALL_LOG redirection and
backgrounding), keep the tail logic for INSTALL_LOG/tail_pid, capture the
timeout exit code into install_exit, and ensure you kill/wait the tail like
existing logic; update references to NEMOCLAW_E2E_PUBLIC_INSTALL, url,
INSTALL_LOG, install_pid, tail_pid accordingly.
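The bounded wait the reviewer describes can be sketched as below. INSTALL_TIMEOUT and its default are assumptions, and timeout(1) here is the GNU coreutils tool (gtimeout on macOS with coreutils installed); it exits with status 124 when the deadline expires:

```shell
# Hypothetical sketch: run a command in the background under timeout(1)
# so wait cannot block past INSTALL_TIMEOUT; the child's exit status
# (124 on expiry) is what the caller inspects instead of hanging.
INSTALL_TIMEOUT="${INSTALL_TIMEOUT:-1800}"
run_bounded() {
  timeout "$INSTALL_TIMEOUT" bash -c "$1" &
  pid=$!
  wait "$pid"
  echo "$?"
}
```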


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/e2e-cloud-experimental/skip/05-network-policy.sh`:
- Around line 185-186: The pip probe currently can hit a warmed cache; modify
the python -m pip download invocation (the command that downloads idna==3.7) to
add the --no-cache-dir flag so it cannot use ~/.cache/pip, and for the Hugging
Face probe block (the HF download commands) set HF_HOME to a fresh temporary
directory (e.g., create and export a unique temp dir via mktemp -d) immediately
before those HF download commands and clean it up afterward so HF tools use an
empty cache and must perform real egress.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6a2f3029-397d-4b90-b743-5170428240a9

📥 Commits

Reviewing files that changed from the base of the PR and between 89988a5 and b20c809.

📒 Files selected for processing (1)
  • test/e2e/e2e-cloud-experimental/skip/05-network-policy.sh

Comment on lines +185 to +186
if ! python -m pip download --no-deps --disable-pip-version-check -d "$PROBE_DL" --timeout 90 idna==3.7; then
  echo "pypi whitelist: pip download idna==3.7 from PyPI failed (egress / proxy / policy)"
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

wc -l test/e2e/e2e-cloud-experimental/skip/05-network-policy.sh

Repository: NVIDIA/NemoClaw


🏁 Script executed:

cat -n test/e2e/e2e-cloud-experimental/skip/05-network-policy.sh | sed -n '180,190p'


🏁 Script executed:

cat -n test/e2e/e2e-cloud-experimental/skip/05-network-policy.sh | sed -n '275,290p'


🏁 Script executed:

cat -n test/e2e/e2e-cloud-experimental/skip/05-network-policy.sh | sed -n '300,330p'


🏁 Script executed:

# Check if there are any environment variables or flags that might already disable caching
rg -i "no-cache|cache|HF_HOME|HF_HUB_CACHE" test/e2e/e2e-cloud-experimental/skip/05-network-policy.sh


🌐 Web query:

pip download default cache behavior --no-cache-dir

💡 Result:

By default, pip download uses pip's cache (enabled since pip 6.0) to store HTTP responses and locally built wheels, avoiding redundant downloads and builds. The cache is located at ~/.cache/pip (Linux, respects XDG_CACHE_HOME), ~/Library/Caches/pip (macOS), or %LocalAppData%\pip\Cache (Windows). Use pip cache dir to find the exact path. The --no-cache-dir option disables this caching entirely, forcing fresh downloads every time. This is useful in scenarios like Docker builds to reduce image size or ensure fresh packages, but it's not recommended generally as it slows down operations by repeating downloads and builds.


🌐 Web query:

Hugging Face HF_HOME environment variable cache location

💡 Result:

The HF_HOME environment variable specifies the base directory where huggingface_hub stores local data, including tokens and various caches. It defaults to ~/.cache/huggingface (or $XDG_CACHE_HOME/huggingface if XDG_CACHE_HOME is set and HF_HOME is not). Key cache locations under HF_HOME:

  • Hub cache (models, datasets, spaces): $HF_HOME/hub (default: ~/.cache/huggingface/hub)
  • Token: $HF_HOME/token
  • Xet cache: $HF_HOME/xet
  • Assets cache: $HF_HOME/assets

Set the variable before importing huggingface_hub, e.g. export HF_HOME=/path/to/custom/dir, or in Python via os.environ['HF_HOME'] = '/path/to/custom/dir'. Note: HF_HUB_CACHE can override just the hub cache specifically, defaulting to $HF_HOME/hub.


Disable caches for PyPI and Hugging Face probes to ensure egress testing is reliable.

The pip download command reuses the default pip cache at ~/.cache/pip, and HF tools reuse the cache at ~/.cache/huggingface. Since this script runs late in the sandbox, previous operations may have warmed these caches, allowing downloads to succeed without actual egress. Add --no-cache-dir to line 185, and set HF_HOME to a fresh temporary directory before the HF download commands (lines 305–324) to force fresh network requests.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/e2e-cloud-experimental/skip/05-network-policy.sh` around lines 185 -
186, The pip probe currently can hit a warmed cache; modify the python -m pip
download invocation (the command that downloads idna==3.7) to add the
--no-cache-dir flag so it cannot use ~/.cache/pip, and for the Hugging Face
probe block (the HF download commands) set HF_HOME to a fresh temporary
directory (e.g., create and export a unique temp dir via mktemp -d) immediately
before those HF download commands and clean it up afterward so HF tools use an
empty cache and must perform real egress.
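The cache-busting change described above can be sketched like this. It is illustrative rather than the script's shipped code: PROBE_DL and the idna==3.7 pin are taken from the surrounding script, and the actual HF download probe is elided because it requires network egress (the pip command is assembled but not executed here for the same reason).

```shell
# Sketch: make the egress probes immune to warmed local caches.
PROBE_DL="${PROBE_DL:-$(mktemp -d)}"

# --no-cache-dir keeps pip from satisfying the probe out of ~/.cache/pip,
# so a successful download would prove real egress to PyPI.
pip_probe=(python -m pip download --no-deps --disable-pip-version-check
           --no-cache-dir -d "$PROBE_DL" --timeout 90 idna==3.7)

# Point HF_HOME at a fresh, empty directory so Hugging Face tools cannot
# reuse ~/.cache/huggingface and must perform a real network fetch.
hf_tmp="$(mktemp -d)"
export HF_HOME="$hf_tmp"
# ... run the HF download probes here ...
unset HF_HOME
rm -rf "$hf_tmp"            # clean up the throwaway cache afterward
```

Scoping HF_HOME to a mktemp directory (and removing it afterward) also keeps repeated nightly runs from influencing each other through a shared cache.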

@cv cv merged commit 5ac8313 into NVIDIA:main Mar 30, 2026
8 checks passed
@wscurran wscurran added the CI/CD Use this label to identify issues with NemoClaw CI/CD pipeline or GitHub Actions. label Mar 30, 2026
3 participants