refactor(test): wire E2E security tests to real production code paths #1107

@jyaunches

Context

PR #1092 added E2E tests for credential sanitization and Telegram injection, but the tests reimplement production logic inline rather than calling the real code. Carlos flagged this in review: the tests can pass even if production regresses.

Problems

test/e2e/test-credential-sanitization.sh

  1. stripCredentials() + isCredentialField() reimplemented 3x (C1-C5, C12, C13): the test copy-pastes CREDENTIAL_FIELDS, CREDENTIAL_FIELD_PATTERN, and both functions into node -e heredocs instead of importing them from migration-state.ts.

  2. walkAndRemoveFile() reimplemented 2x (C1-C5, C8): production uses copyDirectory() with a CREDENTIAL_SENSITIVE_BASENAMES filter, while the test implements its own recursive walk.

  3. sanitizeConfigFile() behavior has drifted: production deletes config.gateway and then calls stripCredentials(). The test does NOT delete gateway; it only strips fields inside it. As a result, C4b (gateway.mode preserved) asserts the wrong behavior.

  4. verifyBlueprintDigest / verifyDigest reimplemented (C9-C11): these are self-fulfilling tests that define their own verification logic and never call production's computeFileDigest().

  5. Python dependency for JSON parsing (C3-C4): uses python3 -c "import json..." instead of node -e.
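
To make the drift in item 3 concrete, here is a minimal sketch of the two behaviors side by side. Everything in it is a stand-in: the field list, the removal semantics of stripCredentials(), and the two sanitize* helper names are assumptions for illustration, not the code in migration-state.ts. The only part taken from the description above is the order of operations (delete gateway, then strip).

```javascript
// Hypothetical stand-ins; the real definitions live in migration-state.ts.
const CREDENTIAL_FIELDS = new Set(["apiKey", "token", "password"]); // assumed values

function isCredentialField(name) {
  return CREDENTIAL_FIELDS.has(name);
}

// Assumed semantics: recursively drop credential-named keys.
function stripCredentials(value) {
  if (Array.isArray(value)) return value.map(stripCredentials);
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const [key, child] of Object.entries(value)) {
      if (!isCredentialField(key)) out[key] = stripCredentials(child);
    }
    return out;
  }
  return value;
}

// Order described for production: drop gateway entirely, then strip the rest.
function sanitizeLikeProduction(config) {
  const { gateway, ...rest } = config;
  return stripCredentials(rest);
}

// What the drifted test does: keep gateway and strip fields inside it.
function sanitizeLikeDriftedTest(config) {
  return stripCredentials(config);
}

const config = {
  gateway: { mode: "local", token: "secret" },
  model: "example-model",
  apiKey: "k",
};

console.log("gateway" in sanitizeLikeProduction(config)); // false
console.log(sanitizeLikeDriftedTest(config).gateway);     // { mode: 'local' }
```

An assertion such as "gateway.mode is preserved" passes against the second function and fails against the first, which is why C4b cannot detect a regression in the production path.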

test/e2e/test-telegram-injection.sh

  1. send_message_to_sandbox() is dead code: it is defined but never called. All tests use inline SSH.

  2. Tests bypass runAgentInSandbox() / shellQuote(): T1-T4 and T8 use MSG=$(cat) && echo "$MSG" over SSH, a different code path from production's shellQuote() in bin/lib/runner.js. A regression in shellQuote() wouldn't be caught.

  3. sandbox_exec() still fails open: this copy wasn't updated with the fail-closed fix applied to the credential test.
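
As a rough illustration of item 2, an injection test can push hostile payloads through the quoting function itself and check that a POSIX shell hands them back unchanged. The shellQuote() below is a hypothetical single-quote implementation used only so the sketch runs on its own; the real one in bin/lib/runner.js may differ, and an actual test would import it rather than define it. roundTrip() and the payload list are likewise illustrative names.

```javascript
const { execFileSync } = require("child_process");

// Hypothetical stand-in for production's shellQuote(); a real test would
// import the function from bin/lib/runner.js instead of defining it here.
function shellQuote(value) {
  return "'" + String(value).replace(/'/g, "'\\''") + "'";
}

// Let sh parse the quoted string and print back what it received.
function roundTrip(quote, payload) {
  return execFileSync("sh", ["-c", "printf %s " + quote(payload)], {
    encoding: "utf8",
  });
}

// Illustrative payloads; none of them should be interpreted by the shell.
const payloads = [
  "$(id)",
  "`id`",
  "it's; echo pwned",
  "a && b | c > /dev/null",
  "line1\nline2",
];

for (const payload of payloads) {
  const received = roundTrip(shellQuote, payload);
  if (received !== payload) {
    throw new Error("quoting changed payload: " + JSON.stringify(payload));
  }
}
console.log("all payloads survived quoting");
```

Because the quoting function is passed in as a parameter, swapping the stand-in for the imported production function is a one-line change, and a regression in that function makes the loop throw.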

Required changes

Production code

  • nemoclaw/src/commands/migration-state.ts: export isCredentialField, stripCredentials, sanitizeConfigFile, computeFileDigest, CREDENTIAL_FIELDS, CREDENTIAL_FIELD_PATTERN (or extract to a shared module).
  • scripts/telegram-bridge.js: export runAgentInSandbox for testability.

Test code

  • Replace all inline stripCredentials/walkAndRemoveFile/verifyDigest implementations with require() of the real code.
  • Fix sanitizeConfigFile drift so the test matches production (gateway is deleted, not stripped in place).
  • Replace python3 JSON parsing with node.
  • Wire telegram injection tests through shellQuote() / runAgentInSandbox().
  • Remove send_message_to_sandbox() or put it to use; it is currently dead code.
  • Make sandbox_exec() fail closed in the telegram test.
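
For the python3 replacement, the JSON checks could look roughly like the sketch below, which uses only Node built-ins. hasPath() and jsonFileHasPath() are hypothetical helper names, and the temporary file stands in for a sanitized config produced by the code under test.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: true if a dotted key path exists in a parsed JSON value.
function hasPath(obj, dotted) {
  let current = obj;
  for (const key of dotted.split(".")) {
    if (
      current === null ||
      typeof current !== "object" ||
      !Object.prototype.hasOwnProperty.call(current, key)
    ) {
      return false;
    }
    current = current[key];
  }
  return true;
}

// Hypothetical helper: parse a JSON file and check for a dotted key path.
function jsonFileHasPath(file, dotted) {
  return hasPath(JSON.parse(fs.readFileSync(file, "utf8")), dotted);
}

// Demo on a temporary file standing in for a sanitized config.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "json-check-"));
const file = path.join(dir, "config.json");
fs.writeFileSync(file, JSON.stringify({ model: "example-model" }));

console.log(jsonFileHasPath(file, "gateway")); // false
console.log(jsonFileHasPath(file, "model"));   // true

fs.rmSync(dir, { recursive: true, force: true });
```

Since JSON.parse throws on malformed input, a corrupt config fails the check loudly instead of being treated as "field absent".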

Labels: enhancement: testing, refactor, security
