fix: restore cluster_pods network policy for nemoclaw and openclaw by drew · Pull Request #25 · NVIDIA/OpenShell-Community

drew · 2026-03-13T02:43:08Z

Summary

Restore the `cluster_pods` `allowed_ips` network policy that was accidentally removed from nemoclaw and openclaw in #24 ("chore: align sandbox tooling and policies with upstream OpenShell").
This policy allows sandbox binaries to reach services on the k3s cluster pod network (10.42.0.0/16, port 8080), which is required for internal service communication.
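
As an illustration of what the restored entry permits, here is a minimal sketch of the address and port check. The policy file schema is not shown in this PR, so the `CLUSTER_PODS` dictionary and the `is_allowed` helper are hypothetical names; only the CIDR and port come from the description above.

```python
import ipaddress

# Hypothetical in-memory form of the cluster_pods entry. The CIDR and port
# are taken from the PR description; the dictionary layout is an assumption.
CLUSTER_PODS = {"cidr": "10.42.0.0/16", "port": 8080}


def is_allowed(dest_ip: str, dest_port: int, entry: dict = CLUSTER_PODS) -> bool:
    """Return True if the destination falls inside the allowed network and port."""
    network = ipaddress.ip_network(entry["cidr"])
    return ipaddress.ip_address(dest_ip) in network and dest_port == entry["port"]
```

Under these assumptions, `is_allowed("10.42.3.7", 8080)` is true, while an address outside 10.42.0.0/16 or a different port is rejected.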



## Summary

Implements L7 (application-layer) protocol-aware policy enforcement for the sandbox proxy, enabling per-request allow/deny decisions based on HTTP method and path, not just host:port.

### Phase 1: HTTP/REST L7 Inspection (plaintext)

- New `l7/` module with provider trait, REST parser, and relay loop
- Parses HTTP/1.1 requests inside CONNECT tunnels, evaluates method+path against OPA/Rego policy
- Supports `access` presets (`full`, `read-only`, `read-write`) and explicit `rules` with glob path matching
- `enforcement` modes: `enforce` (deny with 403 JSON) vs `audit` (log only, forward traffic)
- Structured logging for every L7 decision (`L7_REQUEST` with protocol, action, target, decision)

### Phase 2: MITM TLS Termination for HTTPS

- Ephemeral CA generated at sandbox startup (`rcgen`)
- Per-hostname leaf certificate cache (256 cap) for dynamic cert presentation
- Client-side TLS termination (sandbox trusts ephemeral CA via `SSL_CERT_FILE`, `NODE_EXTRA_CA_CERTS`, etc.)
- Upstream TLS connection with `webpki-roots` verification
- Endpoints with `tls: terminate` get full L7 inspection; others remain passthrough

### Additional

- Benign TLS/connection errors (close_notify, handshake EOF, reset) downgraded from WARN to DEBUG
- Generic `AsyncRead + AsyncWrite` bounds throughout L7 module (works with both `TcpStream` and `TlsStream`)
- Proto schema extended with `protocol`, `tls`, `enforcement`, `access`, `rules` fields on `NetworkEndpoint`
- CLI `nav sandbox run` wired with `--rego-policy`/`--rego-data` flags and L7 policy display

### E2E Test Coverage

- 7 L4 tests: no-matching-policy deny, wildcard binary, binary-restricted, wrong port, cross-policy isolation, non-CONNECT 405, structured log fields
- 7 L7 TLS tests: full access allow, read-only deny POST, audit mode, explicit path rules, CA trust store injection, deny response JSON format, structured log fields

## Key Files

| Area | Files |
|------|-------|
| L7 core | `l7/mod.rs`, `l7/provider.rs`, `l7/relay.rs`, `l7/rest.rs` |
| TLS termination | `l7/tls.rs` |
| Proxy integration | `proxy.rs`, `lib.rs`, `main.rs` |
| Env/process | `process.rs`, `ssh.rs` |
| OPA policy | `opa.rs`, `dev-sandbox-policy.rego` |
| Proto schema | `proto/sandbox.proto` |
| CLI | `navigator-cli/src/run.rs` |
| E2E tests | `e2e/python/test_sandbox_policy.py` |

## Test plan

- [x] 67 unit tests pass (`cargo test --workspace`)
- [x] `cargo clippy --workspace --all-targets` clean
- [x] 16 e2e policy tests pass (`mise run test:e2e:sandbox`)
- [x] Manual smoke test: sandbox with `tls: terminate` on `api.anthropic.com:443`

Closes NVIDIA#25
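
The per-request decision described in this commit message (an `access` preset or explicit `rules`, combined with an `enforcement` mode) could be sketched roughly as follows. This is not the project's Rust/OPA implementation. The method sets behind each preset, the rule layout, the `full` default, and the `evaluate` helper are all assumptions made for illustration; the message only confirms that `read-only` denies POST, that `enforce` answers with a 403, and that `audit` logs and forwards.

```python
from fnmatch import fnmatchcase

# Assumed mapping from access preset to permitted HTTP methods.
# None means "any method". These sets are illustrative, not from the source.
PRESET_METHODS = {
    "read-only": {"GET", "HEAD", "OPTIONS"},
    "read-write": {"GET", "HEAD", "OPTIONS", "POST", "PUT", "PATCH"},
    "full": None,
}


def evaluate(endpoint: dict, method: str, path: str) -> dict:
    """Decide whether one HTTP request is allowed, and whether to forward it."""
    method = method.upper()
    rules = endpoint.get("rules")
    if rules:
        # Explicit rules take precedence here; each rule is a hypothetical
        # {"method": ..., "path": <glob>} pair. Note that fnmatch's "*" also
        # matches "/" characters.
        allowed = any(
            rule["method"] in ("*", method) and fnmatchcase(path, rule["path"])
            for rule in rules
        )
    else:
        methods = PRESET_METHODS[endpoint.get("access", "full")]
        allowed = methods is None or method in methods

    # In audit mode a denied request is only logged and still forwarded.
    enforcing = endpoint.get("enforcement", "enforce") == "enforce"
    forward = allowed or not enforcing
    return {
        "decision": "allow" if allowed else "deny",
        "forward": forward,
        "status": None if forward else 403,
    }
```

With these assumptions, a POST against a `read-only` endpoint is denied with status 403 in `enforce` mode, and is recorded as a deny but still forwarded in `audit` mode.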

…Policy, and nonce replay detection (#127)

* fix(security): harden sandbox SSH with mandatory HMAC secret, NetworkPolicy, and nonce replay detection

  Closes NVIDIA#25

  - Make NEMOCLAW_SSH_HANDSHAKE_SECRET mandatory: server and sandbox both refuse to start if the secret is empty/unset. Cluster deployments auto-generate it via openssl rand in the entrypoint script.
  - Add Kubernetes NetworkPolicy restricting sandbox port 2222 ingress to the gateway pod only, preventing lateral movement from other cluster workloads.
  - Add NSSH1 nonce replay detection with a TTL-bounded cache, rejecting replayed handshakes within the timestamp validity window.
  - Add unit tests for verify_preface (valid, replay, expired, bad HMAC, malformed) and env injection.

* fix(deploy): pass sshHandshakeSecret in fast deploy helm upgrade

---------

Co-authored-by: John Myers <johntmyers@users.noreply.github.com>
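
The combination of a mandatory HMAC secret, a timestamp validity window, and a TTL-bounded nonce cache can be sketched as below. The NSSH1 wire format is not given in this message, so the `timestamp|nonce` message layout, the SHA-256 choice, the 30-second window, and the names `NonceCache`, `sign`, and `verify_handshake` are all hypothetical and do not reflect the project's `verify_preface`.

```python
import hashlib
import hmac
import time


class NonceCache:
    """Remembers nonces until their TTL expires (TTL-bounded replay cache)."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._seen = {}  # nonce -> expiry time

    def check_and_insert(self, nonce: str, now: float) -> bool:
        # Drop expired entries so the cache stays bounded by the TTL.
        self._seen = {n: exp for n, exp in self._seen.items() if exp > now}
        if nonce in self._seen:
            return False  # already seen within the TTL: replay
        self._seen[nonce] = now + self.ttl
        return True


def sign(secret: bytes, timestamp: int, nonce: str) -> str:
    # Assumed message layout; the real handshake format may differ.
    message = f"{timestamp}|{nonce}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_handshake(secret, timestamp, nonce, signature, cache,
                     now=None, max_skew=30.0) -> str:
    if not secret:
        # Mirrors the "refuse to start if the secret is empty" rule.
        raise ValueError("handshake secret must be set")
    now = time.time() if now is None else now
    if abs(now - timestamp) > max_skew:
        return "expired"
    if not hmac.compare_digest(sign(secret, timestamp, nonce), signature):
        return "bad_hmac"
    if not cache.check_and_insert(nonce, now):
        return "replay"
    return "ok"
```

For the cache to cover every handshake that could still pass the timestamp check, its TTL should be at least as long as the validity window (here, twice `max_skew`).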

drew merged commit 7335566 into main Mar 13, 2026
5 checks passed

drew added a commit that referenced this pull request Mar 13, 2026

Revert "fix: restore cluster_pods network policy for nemoclaw and ope…

603410e

…nclaw (#25)" This reverts commit 7335566.

drew mentioned this pull request Mar 13, 2026

Revert "fix: restore cluster_pods network policy for nemoclaw and openclaw" #26

Merged

drew added a commit that referenced this pull request Mar 13, 2026

Revert "fix: restore cluster_pods network policy for nemoclaw and ope…

764f9c9

…nclaw (#25)" (#26) This reverts commit 7335566.

drew merged 1 commit into main from fix/restore-cluster-pods-policy
