Skip to content

fix(security): strip credentials from migration snapshots and enforce blueprint digest#743

Merged
cv merged 10 commits intomainfrom
fix/strip-credentials-migration-snapshots
Mar 25, 2026
Merged

fix(security): strip credentials from migration snapshots and enforce blueprint digest#743
cv merged 10 commits intomainfrom
fix/strip-credentials-migration-snapshots

Conversation

@ericksoa
Copy link
Copy Markdown
Contributor

@ericksoa ericksoa commented Mar 23, 2026

Summary

  • Filter auth-profiles.json from snapshot bundles during createSnapshotBundle() to prevent credential references from crossing the sandbox boundary
  • Strip gateway config (contains auth tokens) from sandbox openclaw.json in prepareSandboxState()
  • Add blueprint digest verification: SHA-256 recorded at snapshot time, validated on restore. Empty/missing digest is a hard failure; old snapshots without the field skip validation for backward compatibility
  • Bump SNAPSHOT_VERSION 2 → 3

Test plan

  • 50 unit tests pass in migration-state.test.ts (41 existing + 9 new)
  • Full suite: 312 tests across 27 files, zero regressions
  • Manual: nemoclaw onboard + nemoclaw migrate e2e with Docker

Closes #156

Summary by CodeRabbit

  • Improvements

    • Snapshot format bumped to v3 and can record an optional blueprint digest; snapshot creation accepts an optional blueprint and records its SHA-256 digest when provided.
    • Snapshot creation now excludes credential-sensitive files and strips gateway data from sandbox state.
    • Restoration enforces digest checks: requires non-empty exact match when a digest is present, rejects empty/mismatched digests or missing blueprints, and remains backward-compatible when no digest exists.
  • Tests

    • Added tests covering snapshot creation, exclusion/stripping, digest recording, and restore validation.
    • Reformatted existing tests for readability.

@ericksoa ericksoa self-assigned this Mar 23, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d25ac44d-7dd6-41b4-a4c2-82cb6d339ed1

📥 Commits

Reviewing files that changed from the base of the PR and between 7c0789d and 3137546.

📒 Files selected for processing (2)
  • nemoclaw/src/commands/migration-state.test.ts
  • nemoclaw/src/commands/migration-state.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • nemoclaw/src/commands/migration-state.ts

📝 Walkthrough

Walkthrough

Snapshot format bumped to version 3. Snapshots now strip host credential files (auth-profiles.json) from host and external-root copies, remove the gateway key from sandbox openclaw.json, and optionally record a SHA‑256 blueprintDigest; restores for v3 enforce non-empty matching digest when present.

Changes

Cohort / File(s) Summary
Migration logic & manifest
nemoclaw/src/commands/migration-state.ts
Bumped snapshot version → 3. Added `SnapshotManifest.blueprintDigest?: string
Credential filtering & sandbox sanitization
nemoclaw/src/commands/migration-state.ts
Copying gained filtering (filter/stripCredentials) to omit sensitive files (case-insensitive auth-profiles.json) from host and external-root snapshots. Sandbox preparation copies openclaw with credential stripping, deletes gateway from loaded config before writing openclaw.json.
Tests & fs mock updates
nemoclaw/src/commands/migration-state.test.ts
Extended node:fs mock cpSync to accept opts.filter(sourcePath => boolean) and skip filtered entries. Updated tests to expect version: 3. Added tests for omission of auth-profiles.json, removal of gateway in sandbox openclaw.json, and blueprint‑digest scenarios (present/non‑empty, empty, missing, mismatch) covering create/restore flows.
Minor test formatting
test/policies.test.js
Formatting-only adjustments: several expect(...).toBe(...) and array expressions reformatted to multi-line/parenthesized forms; no behavioral changes.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant MS as createSnapshotBundle()
    participant FS as FileSystem
    participant BLP as BlueprintFile
    participant MF as snapshot.json

    User->>MS: call(hostState, logger, { persist, blueprintPath? })
    MS->>FS: copyDirectory(hostState.stateDir, dest, { filter: omit auth-profiles.json })
    FS-->>MS: filtered files copied
    MS->>MS: sanitize sandbox openclaw.json (remove gateway, redact creds)
    alt blueprintPath provided
        MS->>BLP: read(blueprintPath)
        BLP-->>MS: contents
        MS->>MS: compute SHA-256 digest
        MS->>MF: write { version: 3, blueprintDigest: "sha256:..." }
    else no blueprintPath
        MS->>MF: write { version: 3, blueprintDigest: null }
    end
    MS-->>User: snapshot bundle created
Loading
sequenceDiagram
    actor User
    participant RS as restoreSnapshotToHost()
    participant MF as snapshot.json
    participant BLP as BlueprintFile
    participant FS as FileSystem

    User->>RS: call(snapshotDir, logger, { blueprintPath? })
    RS->>MF: read snapshot.json (version, blueprintDigest?)
    alt manifest.version >= 3 and blueprintDigest present (non-empty)
        RS->>BLP: read(blueprintPath)
        BLP-->>RS: contents / unreadable
        RS->>RS: compute SHA-256 digest
        alt computed == manifest.blueprintDigest
            RS->>FS: restore files to host
            FS-->>RS: restored
            RS-->>User: success
        else mismatch or unreadable
            RS-->>User: fail with "digest mismatch"
        end
    else blueprintDigest absent (back-compat) or version < 3
        RS->>FS: restore files to host
        FS-->>RS: restored
        RS-->>User: success
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hop through bytes and quietly sweep,

I hide the keys where secrets sleep,
I hash the map and mark its face,
No empty echoes win this race,
Version three — snug in a safe green heap.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Out of Scope Changes check ⚠️ Warning The changes in migration-state.ts and migration-state.test.ts directly implement the linked issue objectives. However, test/policies.test.js reformatting appears unrelated to the security fixes. Remove unrelated formatting changes from test/policies.test.js or explain their necessity; they should be in a separate formatting PR.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly describes the main security fix: stripping credentials from migration snapshots and enforcing blueprint digest verification.
Linked Issues check ✅ Passed The code changes fully implement both objectives from issue #156: credential file filtering and gateway key removal address vulnerability 1, and blueprint digest verification addresses vulnerability 2.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/strip-credentials-migration-snapshots

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nemoclaw/src/commands/migration-state.test.ts`:
- Around line 564-576: Remove all non-null assertions (e.g., bundle!,
sandboxConfigEntry!) and replace them with explicit null guards: after calling
createSnapshotBundle (or accessing sandboxConfigEntry), check if the variable
=== null and if so call expect.unreachable with a helpful message and return;
otherwise proceed using the variable normally. Apply this pattern around uses of
bundle (including where snapshotKeys and authProfileKeys are derived) and the
other occurrences mentioned so lint no longer flags forbidden non-null
assertions.

In `@nemoclaw/src/commands/migration-state.ts`:
- Around line 491-500: The externalRoots copies are missing the
credential-stripping option, so update every call that archives or copies
hostState.externalRoots (and any other external root copies) to pass the same
options object { stripCredentials: true } when bundling for the sandbox; locate
the copyDirectory function and find all usages that copy hostState.externalRoots
(and the copy that handles hostState.stateDir) and change those calls to forward
the stripCredentials flag (e.g., copyDirectory(externalRootPath, destPath, {
stripCredentials: options.stripCredentials ?? true })) so auth-profiles.json and
other sensitive basenames are filtered consistently.
- Around line 479-482: computeFileDigest should catch filesystem errors and
return null instead of letting readFileSync throw; then update
createSnapshotBundle so that for v3 snapshots it does not serialize
blueprintDigest: null — if computeFileDigest returns null for a v3 snapshot
(missing/unreadable blueprintPath) throw an error instead of writing null into
the manifest; finally change restoreSnapshotToHost's verification gate to skip
verification only when blueprintDigest is entirely absent (undefined) rather
than when != null so that null in a v3 manifest triggers verification/failure;
reference computeFileDigest, createSnapshotBundle, restoreSnapshotToHost,
blueprintDigest and blueprintPath when making these changes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 68223c18-5eb0-4248-a54e-ca33fb745b9f

📥 Commits

Reviewing files that changed from the base of the PR and between e1097a6 and 9d601f0.

📒 Files selected for processing (2)
  • nemoclaw/src/commands/migration-state.test.ts
  • nemoclaw/src/commands/migration-state.ts

@ericksoa ericksoa changed the title security: strip credentials from migration snapshots and enforce blueprint digest fix(security): strip credentials from migration snapshots and enforce blueprint digest Mar 23, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
nemoclaw/src/commands/migration-state.ts (1)

635-649: ⚠️ Potential issue | 🔴 Critical

Still leaves a v3 blueprint-digest bypass.

Line 635-637 can still assign blueprintDigest: null, and Line 777 still gates verification on manifest.blueprintDigest != null. That preserves the bypass for v3 manifests instead of reserving the skip path for legacy snapshots where the field is absent. Line 782-784 also computes the digest before the restore try, so a filesystem failure there still escapes restoreSnapshotToHost() instead of returning false.

Also applies to: 776-799

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nemoclaw/src/commands/migration-state.ts`:
- Around line 491-500: The filter in copyDirectory currently checks
CREDENTIAL_SENSITIVE_BASENAMES.has(path.basename(source)) case-sensitively, so
on case-insensitive filesystems variants like "Auth-Profiles.json" slip through;
change the lookup to use a normalized basename (e.g., const name =
path.basename(source).toLowerCase()) and check
CREDENTIAL_SENSITIVE_BASENAMES.has(name) so basenames are matched
case-insensitively when options?.stripCredentials is true.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f613e7b1-a83e-460a-84e7-b18bd9d89f0f

📥 Commits

Reviewing files that changed from the base of the PR and between 9d601f0 and cbcb834.

📒 Files selected for processing (3)
  • nemoclaw/src/commands/migration-state.test.ts
  • nemoclaw/src/commands/migration-state.ts
  • test/policies.test.js
✅ Files skipped from review due to trivial changes (2)
  • test/policies.test.js
  • nemoclaw/src/commands/migration-state.test.ts

gn00295120 added a commit to gn00295120/NemoClaw that referenced this pull request Mar 24, 2026
… blueprint digest

Reconciles NVIDIA#156 and NVIDIA#743 into a single comprehensive solution:

- Filter auth-profiles.json at copy time via cpSync filter (from NVIDIA#743)
- Recursive stripCredentials() with pattern-based field detection for
  deep config sanitization (from NVIDIA#156: CREDENTIAL_FIELDS set +
  CREDENTIAL_FIELD_PATTERN regex)
- Remove gateway config section (contains auth tokens) from sandbox
  openclaw.json
- Blueprint digest verification (SHA-256): recorded at snapshot time,
  validated on restore, empty/missing digest is a hard failure
- Backward compatible: old snapshots without blueprintDigest skip
  validation
- Bump SNAPSHOT_VERSION 2 → 3

Attack chain (now broken at multiple points):

  Telegram message → shell injection → read auth-profiles.json
                                       ↑ BLOCKED: filtered at copy
  Telegram message → shell injection → read openclaw.json credentials
                                       ↑ BLOCKED: stripped by pattern
  Blueprint tampering → digest: "" → verification passes
                                     ↑ BLOCKED: hard failure

Supersedes NVIDIA#156 and NVIDIA#743.
@ericksoa ericksoa added the security Something isn't secure label Mar 24, 2026
… blueprint digest

- Filter auth-profiles.json from snapshot bundles during createSnapshotBundle()
- Strip gateway config (contains auth tokens) from sandbox openclaw.json
- Add blueprint digest verification: SHA-256 recorded at snapshot time, validated on restore
- Bump SNAPSHOT_VERSION 2 → 3

Closes #156
@ericksoa ericksoa force-pushed the fix/strip-credentials-migration-snapshots branch from b6a3dd7 to 4f8d9a4 Compare March 25, 2026 18:12
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
nemoclaw/src/commands/migration-state.ts (2)

498-500: ⚠️ Potential issue | 🟠 Major

Normalize credential basename before set lookup.

Line 499 uses a case-sensitive basename check. On case-insensitive filesystems, case-variant auth-profiles.json filenames can slip through.

Suggested fix
     filter: options?.stripCredentials
-      ? (source: string) => !CREDENTIAL_SENSITIVE_BASENAMES.has(path.basename(source))
+      ? (source: string) =>
+          !CREDENTIAL_SENSITIVE_BASENAMES.has(path.basename(source).toLowerCase())
       : undefined,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemoclaw/src/commands/migration-state.ts` around lines 498 - 500, The filter
predicate uses a case-sensitive check against CREDENTIAL_SENSITIVE_BASENAMES via
path.basename(source), which misses case variants on case-insensitive
filesystems; change the lookup to normalize the basename (e.g., call
.toLowerCase() on path.basename(source)) or ensure
CREDENTIAL_SENSITIVE_BASENAMES contains lowercased entries and compare
normalized values so the predicate reliably rejects credential files regardless
of case (update the filter expression where options?.stripCredentials,
CREDENTIAL_SENSITIVE_BASENAMES and path.basename(source) are used).

479-482: ⚠️ Potential issue | 🔴 Critical

Fail closed for v3 blueprint integrity; null digest still bypasses verification and digest reads can throw.

Line 635 still serializes blueprintDigest: null when blueprintPath is absent, and Line 775 only verifies when manifest.blueprintDigest != null, so new v3 snapshots can still skip integrity checks. Also, Line 481 can throw, and that path executes before the restore try/catch block.

Suggested fix
 function computeFileDigest(filePath: string): string | null {
-  if (!existsSync(filePath)) return null;
-  return createHash("sha256").update(readFileSync(filePath)).digest("hex");
+  try {
+    if (!existsSync(filePath)) return null;
+    const stat = lstatSync(filePath);
+    if (!stat.isFile()) return null;
+    return createHash("sha256").update(readFileSync(filePath)).digest("hex");
+  } catch {
+    return null;
+  }
 }
@@
-    const blueprintDigest = options.blueprintPath ? computeFileDigest(options.blueprintPath) : null;
+    const blueprintDigest = options.blueprintPath ? computeFileDigest(options.blueprintPath) : null;
+    if (!blueprintDigest) {
+      throw new Error("Blueprint digest is required for snapshot version 3.");
+    }
@@
-      blueprintDigest,
+      blueprintDigest,
@@
-  if (manifest.blueprintDigest != null) {
-    if (!manifest.blueprintDigest || typeof manifest.blueprintDigest !== "string") {
+  if ("blueprintDigest" in manifest) {
+    if (typeof manifest.blueprintDigest !== "string" || !manifest.blueprintDigest.trim()) {
       logger.error("Snapshot manifest has invalid blueprintDigest. Refusing to restore.");
       return false;
     }
#!/bin/bash
# Verify current fail-open/null bypass and pre-try digest call patterns.
rg -n -C2 'const blueprintDigest = options\.blueprintPath \? computeFileDigest\(options\.blueprintPath\) : null;|if \(manifest\.blueprintDigest != null\)|function computeFileDigest|const currentDigest = options\?\.blueprintPath \? computeFileDigest\(options\.blueprintPath\) : null;'

Also applies to: 635-647, 774-795

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemoclaw/src/commands/migration-state.ts` around lines 479 - 482,
computeFileDigest currently returns null and is called before the restore
try/catch, allowing a null/failed digest to bypass verification; update
computeFileDigest to throw on missing/unreadable files (never return null) and
move any calls that compute blueprintDigest (the const blueprintDigest /
currentDigest assignments) into the restore try/catch block so exceptions are
caught during restore; when serializing snapshots do not emit blueprintDigest:
null (omit the field) and change the verification logic that uses
manifest.blueprintDigest to treat absence/null as a verification failure for v3
snapshots (i.e., always perform integrity check and fail if digest is missing or
mismatched). Ensure you reference and update computeFileDigest, the code that
assigns blueprintDigest/currentDigest, and the check that reads
manifest.blueprintDigest so the flow is fail-closed and digest reads are inside
the try/catch.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@nemoclaw/src/commands/migration-state.ts`:
- Around line 498-500: The filter predicate uses a case-sensitive check against
CREDENTIAL_SENSITIVE_BASENAMES via path.basename(source), which misses case
variants on case-insensitive filesystems; change the lookup to normalize the
basename (e.g., call .toLowerCase() on path.basename(source)) or ensure
CREDENTIAL_SENSITIVE_BASENAMES contains lowercased entries and compare
normalized values so the predicate reliably rejects credential files regardless
of case (update the filter expression where options?.stripCredentials,
CREDENTIAL_SENSITIVE_BASENAMES and path.basename(source) are used).
- Around line 479-482: computeFileDigest currently returns null and is called
before the restore try/catch, allowing a null/failed digest to bypass
verification; update computeFileDigest to throw on missing/unreadable files
(never return null) and move any calls that compute blueprintDigest (the const
blueprintDigest / currentDigest assignments) into the restore try/catch block so
exceptions are caught during restore; when serializing snapshots do not emit
blueprintDigest: null (omit the field) and change the verification logic that
uses manifest.blueprintDigest to treat absence/null as a verification failure
for v3 snapshots (i.e., always perform integrity check and fail if digest is
missing or mismatched). Ensure you reference and update computeFileDigest, the
code that assigns blueprintDigest/currentDigest, and the check that reads
manifest.blueprintDigest so the flow is fail-closed and digest reads are inside
the try/catch.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 97ecb87f-2ff9-4fc4-a020-e83ac8780974

📥 Commits

Reviewing files that changed from the base of the PR and between a9fcd22 and 4f8d9a4.

📒 Files selected for processing (3)
  • nemoclaw/src/commands/migration-state.test.ts
  • nemoclaw/src/commands/migration-state.ts
  • test/policies.test.js
✅ Files skipped from review due to trivial changes (1)
  • test/policies.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • nemoclaw/src/commands/migration-state.test.ts

…ntial filter

- computeFileDigest: catch FS errors instead of letting readFileSync throw
- createSnapshotBundle: omit blueprintDigest from manifest when no blueprintPath;
  throw if blueprintPath provided but file is missing/unreadable
- restoreSnapshotToHost: gate verification on manifest.version >= 3 instead of
  != null so tampered/null digests in v3 manifests fail closed
- copyDirectory: lowercase basename before CREDENTIAL_SENSITIVE_BASENAMES lookup
  to prevent case-variant bypasses on case-insensitive filesystems
@ericksoa ericksoa force-pushed the fix/strip-credentials-migration-snapshots branch from b7d2289 to 48f8833 Compare March 25, 2026 18:54
Copy link
Copy Markdown
Contributor

@cv cv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work on the credential stripping and digest enforcement. Three suggestions below.

}

// Strip gateway config (contains auth tokens) — sandbox entrypoint regenerates it
delete (config as Record<string, unknown>)["gateway"];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defense in depth — prepareSandboxState copies without credential filter

Line 577 (copyDirectory(path.join(snapshotDir, "openclaw"), preparedStateDir)) copies from the snapshot into the sandbox bundle without { stripCredentials: true }. The snapshot itself was already filtered, so auth-profiles.json shouldn't be there in the normal flow. But if this function is ever called on unfiltered input, or a pre-existing snapshot predates the filter, credentials could leak into the sandbox bundle. Adding stripCredentials here is cheap insurance:

copyDirectory(path.join(snapshotDir, "openclaw"), preparedStateDir, { stripCredentials: true });

(Can't put a suggestion block here since line 577 isn't in the diff — this is an existing unchanged line in prepareSandboxState.)

Copy link
Copy Markdown
Contributor

@cv cv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Credential stripping and digest enforcement look solid overall. Requesting changes for one bug:

v3 snapshots without blueprintPath are silently un-restorablecreateSnapshotBundle creates a v3 snapshot without blueprintDigest when no blueprintPath is given, but restoreSnapshotToHost unconditionally rejects v3 manifests with missing digest. Creation succeeds, restore always fails. See my inline suggestions for the fix + missing test.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
nemoclaw/src/commands/migration-state.ts (1)

615-659: ⚠️ Potential issue | 🟠 Major

Clean up partial bundles when snapshot creation aborts.

The new digest check runs after Lines 618-632 have already copied the state, external config, and external roots. If computeFileDigest() returns null, this returns null but leaves a partially populated bundle under parentDir. That strands temp state on disk and can retain config/openclaw.json even though snapshot creation failed.

🧹 Minimal fix
   } catch (err: unknown) {
+    rmSync(parentDir, { recursive: true, force: true });
     const msg = err instanceof Error ? err.message : String(err);
     logger.error(`Snapshot failed: ${msg}`);
     return null;
   }

Also applies to: 671-674

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemoclaw/src/commands/migration-state.ts` around lines 615 - 659, The
snapshot creation copies state and config before validating the optional
blueprint digest, which can leave a partial bundle under parentDir if
computeFileDigest(options.blueprintPath) returns null; fix by validating the
blueprint digest before making any disk changes (call
computeFileDigest(options.blueprintPath) and throw if invalid prior to
mkdirSync/copyDirectory), or if you prefer minimal change, wrap the
copy/manifest-building logic in a try/catch and on any error remove the created
parentDir (rmdirRecursive or fs.rm(parentDir, { recursive: true, force: true }))
to clean up; apply the same change to the other digest-check block referenced
around 671-674 so partial bundles are always removed on failure.
🧹 Nitpick comments (1)
nemoclaw/src/commands/migration-state.test.ts (1)

534-578: Cover the external-root stripping path too.

This only proves filtering under hostState.stateDir. The new external-root copy in nemoclaw/src/commands/migration-state.ts at Line 632—and the case-normalized basename check at Line 503—can still regress without failing this suite. Please add one external root fixture containing Auth-Profiles.json and assert that it is excluded as well.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemoclaw/src/commands/migration-state.test.ts` around lines 534 - 578, Add an
external root containing an auth-profiles file and assert it is excluded from
the snapshot: update the test that calls createSnapshotBundle (using
HostOpenClawState) to set hostState.externalRoots to include a directory
(created via addDir/addFile) that contains "Auth-Profiles.json" (to cover
case-normalization) and then assert no snapshot keys under bundle.snapshotDir
end with "auth-profiles.json"; keep the existing check that config.json in
agents/main/agent is still present.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@nemoclaw/src/commands/migration-state.ts`:
- Around line 615-659: The snapshot creation copies state and config before
validating the optional blueprint digest, which can leave a partial bundle under
parentDir if computeFileDigest(options.blueprintPath) returns null; fix by
validating the blueprint digest before making any disk changes (call
computeFileDigest(options.blueprintPath) and throw if invalid prior to
mkdirSync/copyDirectory), or if you prefer minimal change, wrap the
copy/manifest-building logic in a try/catch and on any error remove the created
parentDir (rmdirRecursive or fs.rm(parentDir, { recursive: true, force: true }))
to clean up; apply the same change to the other digest-check block referenced
around 671-674 so partial bundles are always removed on failure.

---

Nitpick comments:
In `@nemoclaw/src/commands/migration-state.test.ts`:
- Around line 534-578: Add an external root containing an auth-profiles file and
assert it is excluded from the snapshot: update the test that calls
createSnapshotBundle (using HostOpenClawState) to set hostState.externalRoots to
include a directory (created via addDir/addFile) that contains
"Auth-Profiles.json" (to cover case-normalization) and then assert no snapshot
keys under bundle.snapshotDir end with "auth-profiles.json"; keep the existing
check that config.json in agents/main/agent is still present.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b786d817-83bb-4da8-95e8-f8aa19ca45ea

📥 Commits

Reviewing files that changed from the base of the PR and between 48f8833 and 547a869.

📒 Files selected for processing (2)
  • nemoclaw/src/commands/migration-state.test.ts
  • nemoclaw/src/commands/migration-state.ts

@ericksoa ericksoa force-pushed the fix/strip-credentials-migration-snapshots branch from 7c0789d to 3137546 Compare March 25, 2026 20:06
@ericksoa ericksoa requested a review from cv March 25, 2026 20:34
@cv cv merged commit 34296b8 into main Mar 25, 2026
8 checks passed
@cv cv deleted the fix/strip-credentials-migration-snapshots branch March 25, 2026 20:59
gn00295120 added a commit to gn00295120/NemoClaw that referenced this pull request Mar 26, 2026
… blueprint digest

Reconciles NVIDIA#156 and NVIDIA#743 into a single comprehensive solution:

- Filter auth-profiles.json at copy time via cpSync filter (from NVIDIA#743)
- Recursive stripCredentials() with pattern-based field detection for
  deep config sanitization (from NVIDIA#156: CREDENTIAL_FIELDS set +
  CREDENTIAL_FIELD_PATTERN regex)
- Remove gateway config section (contains auth tokens) from sandbox
  openclaw.json
- Blueprint digest verification (SHA-256): recorded at snapshot time,
  validated on restore, empty/missing digest is a hard failure
- computeFileDigest() throws when blueprint file is missing instead of
  silently returning null
- Sanitize both snapshot-level and sandbox-bundle openclaw.json copies
- Backward compatible: old snapshots without blueprintDigest skip
  validation
- Bump SNAPSHOT_VERSION 2 → 3

Supersedes NVIDIA#156 and NVIDIA#743.
gn00295120 added a commit to gn00295120/NemoClaw that referenced this pull request Mar 26, 2026
… blueprint digest

Reconciles NVIDIA#156 and NVIDIA#743 into a single comprehensive solution:

- Filter auth-profiles.json at copy time via cpSync filter (from NVIDIA#743)
- Recursive stripCredentials() with pattern-based field detection for
  deep config sanitization (from NVIDIA#156: CREDENTIAL_FIELDS set +
  CREDENTIAL_FIELD_PATTERN regex)
- Remove gateway config section (contains auth tokens) from sandbox
  openclaw.json
- Blueprint digest verification (SHA-256): recorded at snapshot time,
  validated on restore, empty/missing digest is a hard failure
- computeFileDigest() throws when blueprint file is missing instead of
  silently returning null
- Sanitize both snapshot-level and sandbox-bundle openclaw.json copies
- Backward compatible: old snapshots without blueprintDigest skip
  validation
- Bump SNAPSHOT_VERSION 2 → 3

Supersedes NVIDIA#156 and NVIDIA#743.
cv pushed a commit that referenced this pull request Mar 27, 2026
… blueprint digest (#769)

Reconciles #156 and #743 into a single comprehensive solution:

- Filter auth-profiles.json at copy time via cpSync filter (from #743)
- Recursive stripCredentials() with pattern-based field detection for
  deep config sanitization (from #156: CREDENTIAL_FIELDS set +
  CREDENTIAL_FIELD_PATTERN regex)
- Remove gateway config section (contains auth tokens) from sandbox
  openclaw.json
- Blueprint digest verification (SHA-256): recorded at snapshot time,
  validated on restore, empty/missing digest is a hard failure
- computeFileDigest() throws when blueprint file is missing instead of
  silently returning null
- Sanitize both snapshot-level and sandbox-bundle openclaw.json copies
- Backward compatible: old snapshots without blueprintDigest skip
  validation
- Bump SNAPSHOT_VERSION 2 → 3

Supersedes #156 and #743.
TSavo pushed a commit to wopr-network/nemoclaw that referenced this pull request Mar 28, 2026
* fix: improve gateway lifecycle recovery (NVIDIA#953)

* fix: improve gateway lifecycle recovery

* docs: fix readme markdown list spacing

* fix: tighten gateway lifecycle review follow-ups

* fix: simplify tokenized control ui output

* fix: restore chat route in control ui urls

* refactor: simplify ansi stripping in onboard

* fix: shorten control ui url output

* fix: move control ui below cli next steps

* fix: swap hard/soft ulimit settings in start script (NVIDIA#951)

Fixes NVIDIA#949

Co-authored-by: KJ <kejones@nvidia.com>

* chore: add cyclomatic complexity lint rule (NVIDIA#875)

* chore: add cyclomatic complexity rule (ratchet from 95)

Add ESLint complexity rule to bin/ and scripts/ to prevent new
functions from accumulating excessive branching. Starting threshold
is 95 (current worst offender: setupNim in onboard.js). Ratchet
plan: 95 → 40 → 25 → 15.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: ratchet complexity to 20, suppress existing violations

Suppress 6 functions that exceed the threshold with eslint-disable
comments so we can start enforcing at 20 instead of 95:

- setupNim (95), setupPolicies (41), setupInference (22) in onboard.js
- deploy (22), main IIFE (27) in nemoclaw.js
- applyPreset (24) in policies.js

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: suppress complexity for 3 missed functions

preflight (23), getReconciledSandboxGatewayState (25), sandboxStatus (27)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add host-side config and state file locations to README (NVIDIA#903)

Signed-off-by: peteryuqin <peter.yuqin@gmail.com>

* chore: add tsconfig.cli.json, root execa, TS coverage ratchet (NVIDIA#913)

* chore: add tsconfig.cli.json, root execa, TS coverage ratchet

Foundation for the CLI TypeScript migration (PR 0 of the shell
consolidation plan). No runtime changes — config, tooling, and
dependency only.

- tsconfig.cli.json: strict TS type-checking for bin/ and scripts/
  (noEmit, module: preserve — tsx handles the runtime)
- scripts/check-coverage-ratchet.ts: pure TS replacement for the
  bash+python coverage ratchet script (same logic, same tolerance)
- execa ^9.6.1 added to root devDependencies (used by PR 1+)
- pr.yaml: coverage ratchet step now runs the TS version via tsx
- .pre-commit-config.yaml: SPDX headers cover scripts/*.ts,
  new tsc-check-cli pre-push hook
- CONTRIBUTING.md: document typecheck:cli task and CLI pre-push hook
- Delete scripts/check-coverage-ratchet.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Apply suggestion from @brandonpelfrey

* chore: address PR feedback — use types_or, add tsx devDep

- Use `types_or: [ts, tsx]` instead of file glob for tsc-check-cli
  hook per @brandonpelfrey's suggestion.
- Add `tsx` to devDependencies so CI doesn't re-fetch it on every run
  per CodeRabbit's suggestion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): ignore GitHub "Apply suggestion" commits in commitlint

* fix(ci): lint only PR title since repo is squash-merge only

Reverts the commitlint ignores rule from the previous commit and
instead removes the per-commit lint step entirely.

Individual commit messages are discarded at merge time — only the
squash-merged PR title lands in main and drives changelog generation.
Drop the per-commit lint, keep the PR title check, and remove the
now-unnecessary fetch-depth: 0.

* Revert "fix(ci): lint only PR title since repo is squash-merge only"

This reverts commit 1257a47.

* Revert "fix(ci): ignore GitHub "Apply suggestion" commits in commitlint"

This reverts commit c395657.

* docs: fix markdownlint MD032 in README (blank line before list)

* refactor: make coverage ratchet script idiomatic TypeScript

- Wrap in main() with process.exitCode instead of scattered process.exit()
- Replace mutable flags with .map()/.some() over typed MetricResult[]
- Separate pure logic (checkMetrics) from formatting (formatReport)
- Throw with { cause } chaining instead of exit-in-helpers
- Derive CoverageThresholds from METRICS tuple (single source of truth)
- Exhaustive switch on CheckStatus discriminated union

* refactor: remove duplication in coverage ratchet script

- Drop STATUS_LABELS map; inline labels in exhaustive switch
- Extract common 'metric coverage is N%' preamble in formatResult
- Simplify ratchetedThresholds: use results directly (already in
  METRICS order) instead of re-scanning with .find() per metric
- Compute 'failed' once in main, pass into formatReport to avoid
  duplicate .some() scan

* refactor: simplify coverage ratchet with FP patterns

- Extract classify() as a named pure function (replaces nested ternary)
- loadJSON takes repo-relative paths, eliminating THRESHOLD_PATH and
  SUMMARY_PATH constants (DRY the join-with-REPO_ROOT pattern)
- Drop CoverageMetric/CoverageSummary interfaces (only pct is read);
  use structural type at the call site instead
- Inline ratchetedThresholds (one-liner, used once)
- formatReport derives fail/improved from results instead of taking
  a pre-computed boolean (let functions derive from data, don't
  thread derived state)
- sections.join("\n\n") replaces manual empty-string pushing
- Shorter type names (Thresholds, Status, Result) — no ambiguity
  in a single-purpose script

* refactor: strip coverage ratchet to failure-only output

prek hides output from commands that exit 0, so ok/improved
reporting was dead code. Remove Status, Result, classify,
formatResult, formatReport, and the ratcheted-thresholds
suggestion block. The script now just filters for regressions
and prints actionable errors on failure.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Brandon Pelfrey <bpelfrey@nvidia.com>

* fix: use CONNECT tunnel for WebSocket endpoints in Discord/Slack presets (NVIDIA#438)

* fix: use CONNECT tunnel for WebSocket endpoints in Discord/Slack presets

The egress proxy's HTTP idle timeout (~2 min) kills long-lived WebSocket
connections when endpoints are configured with protocol:rest + tls:terminate.
Switch WebSocket endpoints to access:full (CONNECT tunnel) which bypasses
HTTP-level timeouts entirely.

Discord:
- gateway.discord.gg → access:full (WebSocket gateway)
- Add PUT/PATCH/DELETE methods for discord.com (message editing, reactions)
- Add media.discordapp.net for attachment access

Slack:
- Add wss-primary.slack.com and wss-backup.slack.com → access:full
  (Socket Mode WebSocket endpoints)

Partially addresses NVIDIA#409 — the policy-level fix enables WebSocket
connections to survive. The hardcoded 2-min timeout in openshell-sandbox
still affects any protocol:rest endpoints with long-lived connections.

Related: NVIDIA#361 (WhatsApp Web, same root cause)

* fix: correct comment wording for media endpoint and YAML formatting

* fix: standardize Node.js minimum version to 22.16 (NVIDIA#840)

* fix: remove unused RECOMMENDED_NODE_MAJOR from scripts/install.sh

Shellcheck flagged it as unused after the min/recommended merge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: enforce full semver >=22.16.0 in installer scripts

The runtime checks only compared the major Node.js version, allowing
22.0–22.15 to pass despite package.json requiring >=22.16.0. Use the
version_gte() helper for full semver comparison in both installers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: harden version_gte and align fallback message

Guard version_gte() against prerelease suffixes (e.g. "22.16.0-rc.1")
that would crash bash arithmetic. Also update the manual-install
fallback message to reference MIN_NODE_VERSION instead of hardcoded "22".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update test stubs for Node.js 22.16 minimum and add Node 20 rejection test

- Bump node stub in 'succeeds with acceptable Node.js' from v20.0.0 to v22.16.0
- Bump node stub in buildCurlPipeEnv from v22.14.0 to v22.16.0
- Add new test asserting Node.js 20 is rejected by ensure_supported_runtime

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: harden installer and onboard resiliency (NVIDIA#961)

* fix: harden installer and onboard resiliency

* fix: address installer and debug review follow-ups

* fix: harden onboard resume across later setup steps

* test: simplify payload extraction in onboard tests

* test: keep onboard payload extraction target-compatible

* chore: align onboard session lint with complexity rule

* fix: harden onboard session safety and lock handling

* fix: tighten onboard session redaction and metadata handling

* fix(security): strip credentials from migration snapshots and enforce blueprint digest (NVIDIA#769)

Reconciles NVIDIA#156 and NVIDIA#743 into a single comprehensive solution:

- Filter auth-profiles.json at copy time via cpSync filter (from NVIDIA#743)
- Recursive stripCredentials() with pattern-based field detection for
  deep config sanitization (from NVIDIA#156: CREDENTIAL_FIELDS set +
  CREDENTIAL_FIELD_PATTERN regex)
- Remove gateway config section (contains auth tokens) from sandbox
  openclaw.json
- Blueprint digest verification (SHA-256): recorded at snapshot time,
  validated on restore, empty/missing digest is a hard failure
- computeFileDigest() throws when blueprint file is missing instead of
  silently returning null
- Sanitize both snapshot-level and sandbox-bundle openclaw.json copies
- Backward compatible: old snapshots without blueprintDigest skip
  validation
- Bump SNAPSHOT_VERSION 2 → 3

Supersedes NVIDIA#156 and NVIDIA#743.

* fix(sandbox): export proxy env vars with full NO_PROXY and persist across reconnects (NVIDIA#1025)

* fix(sandbox): export proxy env vars with full NO_PROXY and persist across reconnects

OpenShell injects NO_PROXY=127.0.0.1,localhost,::1 into the sandbox, missing
inference.local and the gateway IP (10.200.0.1). This causes LLM inference
requests to route through the egress proxy instead of going direct, and the
proxy gateway IP itself gets proxied.

Add proxy configuration block to nemoclaw-start.sh that:
- Exports HTTP_PROXY, HTTPS_PROXY, and NO_PROXY with inference.local and
  the gateway IP included
- Persists via /etc/profile.d/nemoclaw-proxy.sh (root) or ~/.profile
  (non-root fallback) so values survive OpenShell reconnect injection
- Supports NEMOCLAW_PROXY_HOST / NEMOCLAW_PROXY_PORT overrides

The non-root fallback ensures the fix works in environments like Brev where
containers run without root privileges.

Tested on DGX Spark (ARM64) and Brev VM (x86_64). Verified NO_PROXY contains
inference.local and 10.200.0.1 inside the live sandbox after connect.

Ref: NVIDIA#626, NVIDIA#704
Ref: NVIDIA#704 (comment)

* fix(sandbox): write proxy config to ~/.bashrc for interactive reconnect sessions

OpenShell's `sandbox connect` spawns `/bin/bash -i` (interactive, non-login),
which sources ~/.bashrc — not ~/.profile or /etc/profile.d/*.  The previous
approach wrote to ~/.profile and /etc/profile.d/, neither of which is sourced
by `bash -i`, so the narrow OpenShell-injected NO_PROXY persisted in live
interactive sessions.

Changes:
- Write proxy snippet to ~/.bashrc (primary) and ~/.profile (login fallback)
- Export both uppercase and lowercase proxy variants (NO_PROXY + no_proxy,
  HTTP_PROXY + http_proxy, etc.) — Node.js undici prefers lowercase no_proxy
  over uppercase NO_PROXY when both are set
- Add idempotency guard to prevent duplicate blocks on container restart
- Update tests: verify .bashrc writing, idempotency, bash -i override
  behavior, and lowercase variant correctness

Tested on DGX Spark (ARM64) and Brev VM (x86_64) with full destroy +
re-onboard + live `env | grep proxy` verification inside the sandbox shell
via `openshell sandbox connect`.

Ref: NVIDIA#626

* fix(sandbox): replace stale proxy values on restart with begin/end markers

Use begin/end markers in .bashrc/.profile proxy snippet so
_write_proxy_snippet replaces the block when PROXY_HOST/PORT change
instead of silently keeping stale values. Adds test coverage for the
replacement path.

Addresses CodeRabbit review feedback on idempotency gap.

* fix(sandbox): resolve sandbox user home dynamically when running as root

When the entrypoint runs as root, $HOME is /root — the proxy snippet
was written to /root/.bashrc instead of the sandbox user's home.
Use getent passwd to look up the sandbox user's home when running as
UID 0; fall back to /sandbox if the user entry is missing.

Addresses CodeRabbit review feedback on _SANDBOX_HOME resolution.

---------

Co-authored-by: Carlos Villela <cvillela@nvidia.com>

* fix(policies): preset application for versionless policies (Fixes NVIDIA#35) (NVIDIA#101)

* fix(policies): allow preset application for versionless policies (Fixes NVIDIA#35)

Fixes NVIDIA#35

Signed-off-by: Deepak Jain <deepujain@gmail.com>

* fix: remove stale complexity suppression in policies

---------

Signed-off-by: Deepak Jain <deepujain@gmail.com>
Co-authored-by: Kevin Jones <kejones@nvidia.com>

* fix: restore routed inference and connect UX (NVIDIA#1037)

* fix: restore routed inference and connect UX

* fix: simplify detected local inference hint

* fix: remove stale local inference hint

* test: relax connect forward assertion

---------

Signed-off-by: peteryuqin <peter.yuqin@gmail.com>
Signed-off-by: Deepak Jain <deepujain@gmail.com>
Co-authored-by: KJ <kejones@nvidia.com>
Co-authored-by: Emily Wilkins <80470879+epwilkins@users.noreply.github.com>
Co-authored-by: Carlos Villela <cvillela@nvidia.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Peter <peter.yuqin@gmail.com>
Co-authored-by: Brandon Pelfrey <bpelfrey@nvidia.com>
Co-authored-by: Benedikt Schackenberg <6381261+BenediktSchackenberg@users.noreply.github.com>
Co-authored-by: Lucas Wang <lucas_wang@lucas-futures.com>
Co-authored-by: senthilr-nv <senthilr@nvidia.com>
Co-authored-by: Deepak Jain <deepujain@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
ksapru pushed a commit to ksapru/NemoClaw that referenced this pull request Mar 30, 2026
… blueprint digest (NVIDIA#769)

Reconciles NVIDIA#156 and NVIDIA#743 into a single comprehensive solution:

- Filter auth-profiles.json at copy time via cpSync filter (from NVIDIA#743)
- Recursive stripCredentials() with pattern-based field detection for
  deep config sanitization (from NVIDIA#156: CREDENTIAL_FIELDS set +
  CREDENTIAL_FIELD_PATTERN regex)
- Remove gateway config section (contains auth tokens) from sandbox
  openclaw.json
- Blueprint digest verification (SHA-256): recorded at snapshot time,
  validated on restore, empty/missing digest is a hard failure
- computeFileDigest() throws when blueprint file is missing instead of
  silently returning null
- Sanitize both snapshot-level and sandbox-bundle openclaw.json copies
- Backward compatible: old snapshots without blueprintDigest skip
  validation
- Bump SNAPSHOT_VERSION 2 → 3

Supersedes NVIDIA#156 and NVIDIA#743.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

security Something isn't secure

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants