feat(net): add secret substitution via TLS MITM proxy #412

Merged
DorianZheng merged 19 commits into main from feat/secret-substitution-mitm
Mar 30, 2026

@DorianZheng
Member

Summary

  • Add TLS MITM proxy for transparent secret substitution at the network boundary — guest VM sees placeholder tokens, real values are injected by the proxy when HTTPS requests target configured hosts
  • Ephemeral ECDSA P-256 CA with per-host cert caching, streaming body replacer (constant memory, sliding window for chunk boundaries), HTTP/1.1 + HTTP/2 + WebSocket support
  • Full wiring: Rust Secret type → JSON FFI → Go BoxCA + SecretHostMatcher → mitmAndForward() in TCP handler → CA cert injection via env var into guest trust store
  • Python SDK Secret class with value redaction in all repr/Debug output
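A minimal sketch of what value redaction in repr can look like. The class below is an illustrative stand-in, not the SDK's actual implementation: the constructor arguments and the `[REDACTED]` marker are assumptions; only the placeholder format comes from this PR.

```python
class Secret:
    """Illustrative stand-in for the SDK's Secret class; field names are assumed."""

    def __init__(self, name: str, value: str, hosts: list[str]):
        self.name = name
        self.value = value
        self.hosts = list(hosts)

    @property
    def placeholder(self) -> str:
        # Placeholder format as described in this PR: <BOXLITE_SECRET:name>
        return f"<BOXLITE_SECRET:{self.name}>"

    def __repr__(self) -> str:
        # Never include self.value; str() falls back to __repr__ by default.
        return f"Secret(name={self.name!r}, hosts={self.hosts!r}, value='[REDACTED]')"


secret = Secret("openai", "sk-test-12345", ["api.openai.com"])
print(secret)
```

Because `__str__` is not overridden, logging or printing the object goes through the same redacting `__repr__`.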

Test plan

  • 110 Go tests pass with -race (MITM proxy, streaming replacer, BoxCA, header substitution, WebSocket, host matching, wiring)
  • 11 Rust tests pass (Secret type, GvproxySecretConfig, integration tests)
  • 38 Python tests pass (Secret construction, repr redaction, placeholder format, BoxOptions integration, env var injection, CA cert trust store, baseline without secrets)
  • cargo clippy -p boxlite --tests -- -D warnings clean
  • cargo clippy -p boxlite-guest -- -D warnings clean
  • go vet ./... clean
  • Note: pre-existing Node.js signal(SIGUSR1) integration test failure is unrelated to this PR

AI agent sandboxes need API keys (OpenAI, Anthropic, etc.) but exposing
raw secrets inside the guest VM is a security risk — malicious code can
exfiltrate them. This adds a TLS MITM proxy that substitutes placeholder
tokens with real secret values at the network boundary.

The guest only sees placeholders like `<BOXLITE_SECRET:openai>`. When an
HTTPS request targets a secret host, the gvproxy bridge intercepts it,
terminates TLS with an ephemeral CA cert, substitutes placeholders in
headers and body via a streaming replacer, then forwards to the real
upstream server. The real secret value never enters the guest VM.
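The header half of that substitution can be sketched as below. This is a Python illustration of the idea only; the real logic lives in the Go bridge, and `substitute_headers` plus the dict-based inputs are hypothetical names chosen for the example.

```python
def substitute_headers(headers: dict[str, str], secrets: dict[str, str]) -> dict[str, str]:
    """Return a copy of headers with each placeholder replaced by its real value.

    `secrets` maps a secret name to its real value (an assumed shape).
    """
    result = {}
    for key, value in headers.items():
        for name, real_value in secrets.items():
            value = value.replace(f"<BOXLITE_SECRET:{name}>", real_value)
        result[key] = value
    return result


# What the guest sends versus what the upstream server receives.
guest_headers = {"Authorization": "Bearer <BOXLITE_SECRET:openai>", "Accept": "*/*"}
upstream_headers = substitute_headers(guest_headers, {"openai": "sk-test-12345"})
print(upstream_headers["Authorization"])
```

The guest-side dict is left untouched, which mirrors the property described above: the real value only exists on the host side of the boundary.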

Key components:
- BoxCA: ephemeral ECDSA P-256 CA with per-host cert cache (sync.Map)
- Streaming replacer: sliding window algorithm, constant memory, handles
  placeholders spanning chunk boundaries
- Reverse proxy: httputil.ReverseProxy for HTTP/1.1 + HTTP/2 + keep-alive
- WebSocket support: detect upgrade, substitute headers, bidirectional splice
- Secret host matcher: O(1) exact + wildcard hostname lookup
- CA trust injection: base64 PEM via env var, guest agent installs at boot
- Python SDK: Secret class with value redaction in repr/Debug

Test coverage: 110 Go tests (with -race), 11 Rust tests, 38 Python tests.

CI uses cached wheels from the warm-caches workflow. Until the cache
is rebuilt with the new Secret class, the test module skips gracefully
instead of failing with AttributeError.

- Fix upstream TLS: set ServerName for SNI, use ForceAttemptHTTP2,
  dial destAddr directly (same approach as standardForward)
- Make /etc/hosts writable in containers (remove ro mount flag)
- Add debug logging to mitmAndForward for troubleshooting
- Add test_secret_server.py for local E2E testing
- Update interactive shell example with OpenAI API test instructions

Verified end-to-end: guest sends placeholder in header → MITM substitutes
real key → OpenAI returns 401 "Incorrect API key: sk-test-*2345"
confirming the secret was substituted. HTTP/1.1, HTTP/2, and keep-alive
all work correctly.

The guest agent's initramfs is too small to write the CA cert file.
Instead, read BOXLITE_CA_PEM env var directly during Container.Init
and append the decoded PEM to the container's CA bundle on the QCOW2
disk. This makes HTTPS clients inside the container trust the MITM
proxy without --no-check-certificate.

- Fix host cert lifetime: 1h → 24h (match CA lifetime, prevents cert
  expiration for long-running boxes)
- Extract matchesWildcard() helper to eliminate duplication in
  SecretHostMatcher.Matches() and SecretsForHost()
- Remove all debug log.Printf leftovers from mitm_proxy.go; use logrus
  consistently for all MITM logging
- Add 30s dial timeout on upstream connections (prevent hanging)
- Remove unused conn field from singleConnListener (only addr needed)
- Inline needsInspect variable in TCPWithFilter
- Remove dead SSL trust var injection from lifecycle.rs (already handled
  by container_rootfs.rs on host side)
- Downgrade noisy per-request logs from Info to Warn/Debug
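The shared wildcard helper mentioned in this list can be sketched as follows. This is a Python illustration, not the Go code: the constructor shape (secret name to host patterns) is assumed, and the sketch assumes `*.example.com` matches exactly one extra label, as in TLS certificates, which the PR does not state.

```python
from typing import Optional


class SecretHostMatcher:
    """Exact and wildcard host lookup backed by dicts (one hash lookup each)."""

    def __init__(self, secret_hosts: dict[str, list[str]]):
        self._exact = {}     # host -> secret names
        self._wildcard = {}  # parent domain of "*.parent" -> secret names
        for name, patterns in secret_hosts.items():
            for pattern in patterns:
                pattern = pattern.lower().rstrip(".")
                if pattern.startswith("*."):
                    self._wildcard.setdefault(pattern[2:], []).append(name)
                else:
                    self._exact.setdefault(pattern, []).append(name)

    def _matches_wildcard(self, host: str) -> Optional[str]:
        # Shared by matches() and secrets_for_host() to avoid duplicated logic.
        _, sep, parent = host.partition(".")
        return parent if sep and parent in self._wildcard else None

    def matches(self, host: str) -> bool:
        host = host.lower().rstrip(".")
        return host in self._exact or self._matches_wildcard(host) is not None

    def secrets_for_host(self, host: str) -> list[str]:
        host = host.lower().rstrip(".")
        names = list(self._exact.get(host, []))
        parent = self._matches_wildcard(host)
        if parent is not None:
            names.extend(self._wildcard[parent])
        return names


matcher = SecretHostMatcher({"openai": ["api.openai.com"], "anthropic": ["*.anthropic.com"]})
print(matcher.secrets_for_host("api.anthropic.com"))
```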
These files were accidentally modified/added as part of the MITM
development and don't belong in the example directory.

- Generate ephemeral ECDSA P-256 CA in Rust using `rcgen` crate
- Pass CA cert+key to Go via GvproxyConfig JSON (no FFI round-trip)
- Remove `gvproxy_get_ca_cert` FFI export (Go→Rust)
- Add `NewBoxCAFromPEM()` Go constructor to parse Rust-generated PEM
- Fix MITM bypass: port 443 with secrets now always inspects SNI
  before checking IP allowlist (was short-circuiting to standardForward)
- Fix stale Go builds: watch entire gvproxy-bridge/ directory in build.rs
- Clean up Python tests: 38→14 (remove Rust duplicates, merge integration
  tests, add E2E MITM verification + non-secret passthrough test)
- Move ca.rs from net/gvproxy/ to net/ (CA is not gvproxy-specific)
- Add ca_cert_pem/ca_key_pem fields to NetworkBackendConfig
- Generate CA in vmm_spawn.rs where NetworkBackendConfig is built
- GvproxyInstance::new() receives CA from caller instead of generating

CRITICAL:
- Remove InsecureSkipVerify on upstream transport (use system cert pool)
- Move shim config from CLI arg to temp file (0600) to avoid
  /proc/cmdline exposure of CA keys and secrets
- WebSocket upstream now uses TLS (was plain TCP)

SERIOUS:
- Remove duplicate secret env injection from vmm_spawn.rs (single
  source of truth in container_rootfs.rs)
- Replace log.Printf with logrus in websocket handler

MODERATE:
- CA generation failure now clears secrets (disables MITM) instead
  of silently continuing with broken substitution

MINOR:
- Rename `box` to `sandbox` in Python tests (box is a builtin)
- Replace Node.js TODO with clear limitation comment
- Struct alignment in main.go matches existing style

@DorianZheng force-pushed the feat/secret-substitution-mitm branch from e7e644a to 0f9c7fe on March 30, 2026 04:17

CRITICAL fixes:
- Config passed via stdin pipe (not CLI arg or temp file) — eliminates
  /proc/cmdline exposure and disk persistence of CA keys + secrets
- Shim reads config from stdin until EOF, parent closes write end
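The stdin hand-off can be sketched in Python as below. The real parent and shim are Rust processes; `spawn_with_config` and the inline child program are hypothetical stand-ins that only show the pattern: the payload never appears in argv or on disk, and the child reads until EOF.

```python
import json
import subprocess
import sys

# Stand-in for the shim: read the whole config from stdin until EOF,
# then report only the key names (never the values).
CHILD_SOURCE = "import json, sys; print(sorted(json.load(sys.stdin)))"


def spawn_with_config(config: dict) -> tuple[list[str], str]:
    """Start the child and pass config over a pipe; return (argv, child stdout)."""
    argv = [sys.executable, "-c", CHILD_SOURCE]
    proc = subprocess.Popen(
        argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    # communicate() writes the payload, closes the write end (EOF for the
    # child), and waits for the child to exit.
    stdout, _ = proc.communicate(json.dumps(config))
    return argv, stdout.strip()


argv, child_output = spawn_with_config(
    {"secrets": [{"name": "openai", "value": "sk-test-12345"}], "ca_key_pem": "KEY"}
)
print(child_output)
```

Closing the write end matters: a child that reads until EOF would otherwise block forever.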

SERIOUS fixes:
- Streaming replacer: loop reading from src instead of returning (0, nil)
  when buffer has insufficient data (was violating io.Reader contract)
- Set req.Host = hostname in Director (HTTP/1.1 Host header must match)
- WebSocket relay: close connections after both directions done instead
  of CloseWrite on tls.Conn (sends close_notify, not TCP half-close)
- Revert /etc/hosts from rw back to ro (writable /etc/hosts allows DNS
  hijacking — unrelated security regression)

MODERATE:
- Mark Go NewBoxCA() as test-only (production uses NewBoxCAFromPEM)
- Actually rename box to sandbox in Python tests (previous sed failed)

@DorianZheng force-pushed the feat/secret-substitution-mitm branch from 0f9c7fe to 0e5111d on March 30, 2026 04:31

Add CACert proto message and ca_certs field to ContainerInitRequest.
The container's CA cert now flows explicitly via gRPC instead of the
BOXLITE_CA_PEM env var.

Two-layer CA injection:
- Guest agent: still uses BOXLITE_CA_PEM env var (needs CA at boot,
  before gRPC is up)
- Container: now receives CA via Container.Init gRPC ca_certs field
  (clean, scoped, no env var inheritance or cleanup needed)

@DorianZheng force-pushed the feat/secret-substitution-mitm branch from 3d0a322 to cb4d571 on March 30, 2026 06:28

- Rewrite ca_trust.rs as CaInstaller struct (source-agnostic, accepts PEM bytes)
- Container.Init uses CaInstaller::with_bundle() for CA injection
- Remove BOXLITE_CA_PEM env var injection from shim (no longer needed)
- Remove install_ca_from_env() (guest agent doesn't make HTTPS calls)
- Remove SSL_TRUST_VARS (CA is in default bundle path, no env vars needed)

CA flow is now: Rust generates PEM → gRPC CACert field → CaInstaller writes
to container's /etc/ssl/certs/ca-certificates.crt. Zero env vars.
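The last step of that flow, appending PEM bytes to a CA bundle, can be sketched in Python as below. This is not the Rust CaInstaller: `install_ca` is a hypothetical name, the PEM is a fake placeholder rather than a real certificate, and the success count mirrors the "logs error if zero installed" behavior added later in this PR.

```python
import logging
import tempfile
from pathlib import Path

log = logging.getLogger("ca_install")

# Not a real certificate; just PEM-shaped bytes for the demo.
FAKE_PEM = b"-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----\n"


def install_ca(pem: bytes, bundles: list[Path]) -> int:
    """Append pem to every bundle file that exists; return how many now trust it."""
    pem = pem.strip() + b"\n"
    installed = 0
    for bundle in bundles:
        if not bundle.is_file():
            continue  # this rootfs does not use that bundle path
        existing = bundle.read_bytes()
        if pem not in existing:  # stay idempotent across restarts
            separator = b"" if not existing or existing.endswith(b"\n") else b"\n"
            bundle.write_bytes(existing + separator + pem)
        installed += 1
    if installed == 0:
        log.error("no CA bundle found; HTTPS clients will not trust the proxy")
    return installed


with tempfile.TemporaryDirectory() as root:
    bundle = Path(root) / "ca-certificates.crt"
    bundle.write_bytes(b"# existing roots")  # no trailing newline on purpose
    count = install_ca(FAKE_PEM, [bundle, Path(root) / "missing.crt"])
    contents = bundle.read_bytes()
print(count)
```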
Rust:
- Secret::env_key() / env_pair() — placeholder env var format lives on Secret
- GvproxyInstance::from_config() — creates instance + endpoint from
  NetworkBackendConfig (replaces 50-line inline block in shim)

Go:
- resolveUpstreamTLS() — deduplicates TLS config resolution from
  mitmAndForward and handleWebSocketUpgrade
- linkLocalSubnet package-level init — parsed once, not per-packet
- Delete NewBoxCA() from mitm.go — production uses NewBoxCAFromPEM only
- Add newTestCA(t) test helper (generates CA in Go for tests)
- Fix build.rs: watch each .go file individually (directory-level
  cargo:rerun-if-changed only detects file additions, not content changes)
- Remove #[cfg(feature = "gvproxy")] from CA generation — rcgen is
  pure Rust, no Go dependency. The Python SDK doesn't enable gvproxy
  feature, so CA generation was silently skipped.

1. GvproxySecretConfig: custom Debug that redacts value (was derive(Debug))
2. forked_tcp init(): panic on parse error instead of swallowing
3. Streaming replacer: boundary detection uses actual placeholder first
   bytes, not hardcoded '<' (supports custom placeholders)
4. WebSocket relay: close both connections when one direction finishes
   (prevents goroutine leak on hanging upstream)
5. Remove dead _engine_type param from ShimSpawner::new()
6. Secret::env_key(): validate name is alphanumeric/underscore/hyphen
8. GvproxyInstance::new() now pub(crate) — use from_config() instead
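Item 3 can be illustrated with a short Python sketch of a boundary function driven by the configured placeholders' first bytes rather than a hardcoded `<`. `safe_boundary` is a hypothetical name, and the `{{KEY}}` placeholder is invented for the example; taking the minimum index across all first bytes is the conservative choice.

```python
def safe_boundary(buf: bytes, placeholders: list[bytes]) -> int:
    """Return how many leading bytes of buf can be emitted safely.

    Only the last (longest placeholder - 1) bytes can hide an incomplete
    placeholder, so scan that window for any placeholder's first byte and
    stop at the earliest hit.
    """
    candidates = [p for p in placeholders if p]
    if not candidates:
        return len(buf)
    window = max(len(p) for p in candidates) - 1
    start = max(0, len(buf) - window)
    boundary = len(buf)
    for first_byte in {p[:1] for p in candidates}:
        idx = buf.find(first_byte, start)
        if idx != -1:
            boundary = min(boundary, idx)
    return boundary


placeholders = [b"<BOXLITE_SECRET:a>", b"{{KEY}}"]
print(safe_boundary(b"hello <BOX", placeholders))
```

This version is deliberately pessimistic: a lone `<` near the end is held back until more data arrives, even if it turns out not to start a placeholder.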
CRITICAL:
- Secret::env_key() returns Result instead of panicking on bad names
- Fix singleConnListener goroutine leak: replace http.Server + listener
  with direct HTTP/1.1 serving via ReadRequest loop (no Accept() block)

SERIOUS:
- Remove dead ca_cert_pem field from GvproxyInstance (Rule #4: Only What's Used)
- Cert cache: check TTL on hits, evict expired certs, cap at 10000 entries
- Document hardcoded Libkrun engine type

MODERATE:
- CA install tracks success count, logs error if zero installed
- Document WebSocket frame substitution limitation
- Fix pipe write comment (producer-consumer, not buffer size)

CRITICAL:
- Revert to http.Server for H1 with proper srv.Close() after connection
  ends (responseWriter was a broken reimplementation of HTTP)
- Secret.value: add #[serde(skip_serializing)]

SERIOUS:
- Cert cache: evict expired first, then halve (was nuclear + race)
- Secret::env_key() sanitizes invalid chars instead of panicking
- safeBoundary returns min index across all prefix bytes
- InstanceSpec.engine: #[serde(default)] + Default for VmmKind
- Remove dead ca_cert_pem from GvproxyInstance
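The cert cache policy (TTL check on hits, evict expired entries first, then halve) can be sketched in Python as below. This is single-threaded and illustrative only; the Go cache is concurrent. `CertCache` is a hypothetical name, and dropping the soonest-to-expire half is an assumption, since the PR does not say which half is removed.

```python
import time


class CertCache:
    """Per-host cert cache with a TTL and a size cap."""

    def __init__(self, ttl_seconds: float, max_entries: int, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._max = max_entries
        self._clock = clock
        self._entries = {}  # host -> (cert, expires_at)

    def __len__(self) -> int:
        return len(self._entries)

    def get(self, host: str):
        entry = self._entries.get(host)
        if entry is None:
            return None
        cert, expires_at = entry
        if self._clock() >= expires_at:  # TTL is checked on every hit
            del self._entries[host]
            return None
        return cert

    def put(self, host: str, cert) -> None:
        if host not in self._entries and len(self._entries) >= self._max:
            self._evict()
        self._entries[host] = (cert, self._clock() + self._ttl)

    def _evict(self) -> None:
        now = self._clock()
        # Step 1: drop everything that has already expired.
        for host in [h for h, (_, exp) in self._entries.items() if now >= exp]:
            del self._entries[host]
        # Step 2: if still full, drop the half closest to expiry (assumed policy).
        if len(self._entries) >= self._max:
            by_expiry = sorted(self._entries, key=lambda h: self._entries[h][1])
            for host in by_expiry[: max(1, len(by_expiry) // 2)]:
                del self._entries[host]
```

Injecting `clock` keeps the sketch testable without sleeping.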

TESTS:
- test_secret_env_key_valid_names + sanitizes_invalid_names
- test_secret_serde_value_skipped
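A sketch of what sanitizing instead of panicking can look like, written in Python for illustration. The `BOXLITE_SECRET_` prefix, the uppercasing, and the use of `_` as the replacement character are all assumptions; the PR only says the allowed characters are alphanumerics, underscore, and hyphen, and that invalid characters are now sanitized.

```python
import re

# Anything outside the allowed set named in this PR gets replaced.
_INVALID = re.compile(r"[^A-Za-z0-9_-]")


def env_key(name: str) -> str:
    """Build an env var name for a secret (hypothetical format)."""
    return "BOXLITE_SECRET_" + _INVALID.sub("_", name).upper()


def env_pair(name: str) -> tuple[str, str]:
    """Pair the env var name with the placeholder the guest will see."""
    return env_key(name), f"<BOXLITE_SECRET:{name}>"


print(env_key("openai"))
print(env_key("my key!"))
```

Sanitizing keeps a badly named secret from aborting box startup, at the cost of two distinct names possibly mapping to the same key.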
The CA cert+key are now stored as files in ~/.boxlite/boxes/{id}/ca/:
- cert.pem (0644) — public, injected into guest trust store
- key.pem (0600) — private, passed to gvproxy

On first start: generate + write. On restart: load from files.
This ensures the same CA is used across restarts — the container
rootfs already has the CA cert from the first start.

The ca/ directory is NOT under shared/ (virtio-fs mount point),
so the guest VM cannot read the private key.
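The generate-once, reload-on-restart behavior can be sketched in Python as below. The real code is Rust and generates the CA with rcgen; here `load_or_generate` takes a caller-supplied generator, and the fake generator, box id, and placeholder bytes are assumptions for the demo. File names and modes follow the description above.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def _write_file(path: Path, data: bytes, mode: int) -> None:
    # Create with the final mode so the key is never briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, mode)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    os.chmod(path, mode)  # enforce the mode regardless of umask


def load_or_generate(
    ca_dir: Path, generate: Callable[[], tuple[bytes, bytes]]
) -> tuple[bytes, bytes]:
    """Return (cert_pem, key_pem), generating and persisting them on first use."""
    cert_path, key_path = ca_dir / "cert.pem", ca_dir / "key.pem"
    if cert_path.is_file() and key_path.is_file():
        return cert_path.read_bytes(), key_path.read_bytes()
    cert_pem, key_pem = generate()
    ca_dir.mkdir(parents=True, exist_ok=True)
    _write_file(key_path, key_pem, 0o600)   # private
    _write_file(cert_path, cert_pem, 0o644)  # public
    return cert_pem, key_pem


calls = []


def fake_generate() -> tuple[bytes, bytes]:
    calls.append("generated")
    return b"CERT-PEM-PLACEHOLDER", b"KEY-PEM-PLACEHOLDER"


with tempfile.TemporaryDirectory() as home:
    ca_dir = Path(home) / "boxes" / "box-123" / "ca"
    first = load_or_generate(ca_dir, fake_generate)   # first start: generate + write
    second = load_or_generate(ca_dir, fake_generate)  # restart: load from files
    key_mode = os.stat(ca_dir / "key.pem").st_mode & 0o777
    cert_mode = os.stat(ca_dir / "cert.pem").st_mode & 0o777
```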

Also:
- Remove #[serde(skip_serializing)] from Secret.value — it was
  silently breaking DB persistence (BoxConfig roundtrip lost values)
- Fix test_secret_serde_json_fields (was broken by skip_serializing)
- Add test_load_or_generate_persists_and_reloads

@DorianZheng merged commit 8e1a960 into main on Mar 30, 2026
39 checks passed
@DorianZheng deleted the feat/secret-substitution-mitm branch on March 30, 2026 11:39