fix: restore cluster_pods network policy for nemoclaw and openclaw #25

Merged
drew merged 1 commit into main from fix/restore-cluster-pods-policy
Mar 13, 2026

drew commented Mar 13, 2026

Summary

  • Restore the cluster_pods allowed_ips network policy that was accidentally removed from nemoclaw and openclaw in #24 ("chore: align sandbox tooling and policies with upstream OpenShell").
  • This policy allows sandbox binaries to reach services on the k3s cluster pod network (10.42.0.0/16 on port 8080), which is required for internal service communication.
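
As a rough illustration of what the restored rule permits, the check below tests a destination against the 10.42.0.0/16 range and port 8080 named above. The `CLUSTER_PODS` dictionary and `is_allowed` helper are hypothetical names for this sketch; the actual policy schema and enforcement code are not shown in this PR.

```python
import ipaddress

# Hypothetical stand-in for the cluster_pods allowed_ips entry.
# Only the CIDR and port come from the PR description.
CLUSTER_PODS = {"cidr": "10.42.0.0/16", "port": 8080}


def is_allowed(dest_ip: str, dest_port: int, rule: dict = CLUSTER_PODS) -> bool:
    """Return True when the destination falls inside the rule's CIDR and port."""
    network = ipaddress.ip_network(rule["cidr"])
    return ipaddress.ip_address(dest_ip) in network and dest_port == rule["port"]
```

Under these assumptions, `is_allowed("10.42.3.7", 8080)` passes, while an address outside the pod network or a different port is rejected.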

The cluster_pods allowed_ips policy was accidentally removed in #24.
This policy allows sandbox binaries to reach services on the k3s
cluster pod network (10.42.0.0/16), which is required for internal
service communication.
@drew drew merged commit 7335566 into main Mar 13, 2026
5 checks passed
drew added a commit that referenced this pull request Mar 13, 2026
factory-octavian pushed a commit to factory-octavian/OpenShell-Community that referenced this pull request Apr 1, 2026
## Summary

Implements L7 (application-layer) protocol-aware policy enforcement for the sandbox proxy, enabling per-request allow/deny decisions based on HTTP method and path, not just host:port.

### Phase 1: HTTP/REST L7 Inspection (plaintext)
- New `l7/` module with provider trait, REST parser, and relay loop
- Parses HTTP/1.1 requests inside CONNECT tunnels, evaluates method+path against OPA/Rego policy
- Supports `access` presets (`full`, `read-only`, `read-write`) and explicit `rules` with glob path matching
- `enforcement` modes: `enforce` (deny with 403 JSON) vs `audit` (log only, forward traffic)
- Structured logging for every L7 decision (`L7_REQUEST` with protocol, action, target, decision)
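
The decision flow described in the list above can be sketched in Python. This is an illustration only: the commit evaluates requests through OPA/Rego, and the preset-to-method mapping, the rule dictionary shape, and the `decide` helper here are assumptions, not the project's schema.

```python
from fnmatch import fnmatchcase

# Assumed contents of the access presets; the commit names them but does
# not list which methods each one covers.
ACCESS_PRESETS = {
    "read-only": {"GET", "HEAD", "OPTIONS"},
    "read-write": {"GET", "HEAD", "OPTIONS", "POST", "PUT", "PATCH"},
    "full": None,  # None means any method is allowed
}


def rule_allows(endpoint: dict, method: str, path: str) -> bool:
    """Check a request against explicit rules, else against the access preset."""
    method = method.upper()
    rules = endpoint.get("rules")
    if rules:
        # Note: fnmatch's "*" also matches "/", unlike some glob dialects.
        return any(
            r["method"].upper() in ("*", method) and fnmatchcase(path, r["path"])
            for r in rules
        )
    allowed = ACCESS_PRESETS[endpoint["access"]]
    return allowed is None or method in allowed


def decide(endpoint: dict, method: str, path: str) -> dict:
    """Combine the rule result with the enforcement mode."""
    if rule_allows(endpoint, method, path):
        return {"decision": "allow", "forward": True, "status": None}
    if endpoint.get("enforcement", "enforce") == "audit":
        # Audit mode logs the violation but still forwards the traffic.
        return {"decision": "audit", "forward": True, "status": None}
    return {"decision": "deny", "forward": False, "status": 403}
```

With a `read-only` endpoint in `enforce` mode, a `POST` yields a 403 decision; the same request in `audit` mode is forwarded and only recorded.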

### Phase 2: MITM TLS Termination for HTTPS
- Ephemeral CA generated at sandbox startup (`rcgen`)
- Per-hostname leaf certificate cache (256 cap) for dynamic cert presentation
- Client-side TLS termination (sandbox trusts ephemeral CA via `SSL_CERT_FILE`, `NODE_EXTRA_CA_CERTS`, etc.)
- Upstream TLS connection with `webpki-roots` verification
- Endpoints with `tls: terminate` get full L7 inspection; others remain passthrough
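
A minimal sketch of a capped per-hostname certificate cache follows. The commit states only the 256-entry cap; the least-recently-used eviction, the `LeafCertCache` name, and the `issue` callback (standing in for signing a leaf with the ephemeral CA) are assumptions for illustration.

```python
from collections import OrderedDict


class LeafCertCache:
    """Bounded hostname -> certificate cache with assumed LRU eviction."""

    def __init__(self, issue, capacity=256):
        self._issue = issue          # hypothetical callback: hostname -> cert
        self._capacity = capacity
        self._entries = OrderedDict()

    def get(self, hostname):
        key = hostname.lower()       # hostnames are case-insensitive
        if key in self._entries:
            self._entries.move_to_end(key)   # mark as most recently used
            return self._entries[key]
        cert = self._issue(key)
        self._entries[key] = cert
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return cert

    def __len__(self):
        return len(self._entries)
```

Capping the cache bounds memory when a sandbox contacts many distinct hosts, at the cost of re-issuing a certificate for a host that was evicted.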

### Additional
- Benign TLS/connection errors (close_notify, handshake EOF, reset) downgraded from WARN to DEBUG
- Generic `AsyncRead + AsyncWrite` bounds throughout L7 module (works with both `TcpStream` and `TlsStream`)
- Proto schema extended with `protocol`, `tls`, `enforcement`, `access`, `rules` fields on `NetworkEndpoint`
- CLI `nav sandbox run` wired with `--rego-policy`/`--rego-data` flags and L7 policy display

### E2E Test Coverage
- 7 L4 tests: no-matching-policy deny, wildcard binary, binary-restricted, wrong port, cross-policy isolation, non-CONNECT 405, structured log fields
- 7 L7 TLS tests: full access allow, read-only deny POST, audit mode, explicit path rules, CA trust store injection, deny response JSON format, structured log fields

## Key Files

| Area | Files |
|------|-------|
| L7 core | `l7/mod.rs`, `l7/provider.rs`, `l7/relay.rs`, `l7/rest.rs` |
| TLS termination | `l7/tls.rs` |
| Proxy integration | `proxy.rs`, `lib.rs`, `main.rs` |
| Env/process | `process.rs`, `ssh.rs` |
| OPA policy | `opa.rs`, `dev-sandbox-policy.rego` |
| Proto schema | `proto/sandbox.proto` |
| CLI | `navigator-cli/src/run.rs` |
| E2E tests | `e2e/python/test_sandbox_policy.py` |

## Test plan

- [x] 67 unit tests pass (`cargo test --workspace`)
- [x] `cargo clippy --workspace --all-targets` clean
- [x] 16 e2e policy tests pass (`mise run test:e2e:sandbox`)
- [x] Manual smoke test: sandbox with `tls: terminate` on `api.anthropic.com:443`

Closes NVIDIA#25
factory-octavian pushed a commit to factory-octavian/OpenShell-Community that referenced this pull request Apr 1, 2026
…Policy, and nonce replay detection (#127)

* fix(security): harden sandbox SSH with mandatory HMAC secret, NetworkPolicy, and nonce replay detection

Closes NVIDIA#25

- Make NEMOCLAW_SSH_HANDSHAKE_SECRET mandatory: server and sandbox both
  refuse to start if the secret is empty/unset. Cluster deployments
  auto-generate it via openssl rand in the entrypoint script.
- Add Kubernetes NetworkPolicy restricting sandbox port 2222 ingress to
  the gateway pod only, preventing lateral movement from other cluster
  workloads.
- Add NSSH1 nonce replay detection with a TTL-bounded cache, rejecting
  replayed handshakes within the timestamp validity window.
- Add unit tests for verify_preface (valid, replay, expired, bad HMAC,
  malformed) and env injection.
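
The checks listed above (mandatory secret, HMAC verification, timestamp window, nonce replay cache) can be sketched as follows. This is not the NSSH1 wire format: the `PrefaceVerifier` class, the `timestamp|nonce` message layout, SHA-256, and the 300-second window are all assumptions made for this illustration.

```python
import hashlib
import hmac
import time


class PrefaceVerifier:
    """Illustrative handshake check with a TTL-bounded nonce cache."""

    def __init__(self, secret: bytes, max_skew_secs: int = 300):
        if not secret:
            # Mirrors the "refuse to start if the secret is empty" rule.
            raise ValueError("handshake secret must be set")
        self._secret = secret
        self._max_skew = max_skew_secs
        self._seen = {}  # nonce -> time after which it can be forgotten

    def sign(self, timestamp: int, nonce: str) -> str:
        msg = f"{timestamp}|{nonce}".encode()  # hypothetical message layout
        return hmac.new(self._secret, msg, hashlib.sha256).hexdigest()

    def verify(self, timestamp: int, nonce: str, signature: str, now=None) -> str:
        now = time.time() if now is None else now
        # Drop nonces whose handshakes can no longer pass the timestamp check.
        self._seen = {n: exp for n, exp in self._seen.items() if exp >= now}
        if abs(now - timestamp) > self._max_skew:
            return "expired"
        if not hmac.compare_digest(self.sign(timestamp, nonce), signature):
            return "bad_hmac"
        if nonce in self._seen:
            return "replay"
        self._seen[nonce] = timestamp + self._max_skew
        return "ok"
```

Because a nonce only needs to be remembered while its timestamp is still acceptable, the cache stays bounded by the validity window rather than growing without limit.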

* fix(deploy): pass sshHandshakeSecret in fast deploy helm upgrade

---------

Co-authored-by: John Myers <johntmyers@users.noreply.github.com>