fix(sandbox): relay WebSocket frames after HTTP 101 Switching Protocols #683

Open
davidpeden3 wants to merge 1 commit into NVIDIA:main from davidpeden3:fix/websocket-101-upgrade-relay

Conversation

davidpeden3 commented Mar 30, 2026

The L7 REST proxy treats 101 Switching Protocols as a generic 1xx informational response via is_bodiless_response(), forwarding the headers and returning to the HTTP parsing loop. After a 101, the connection has been upgraded (e.g. to WebSocket) and subsequent bytes are protocol frames, not HTTP requests. The relay loop either blocks or silently drops them.

This patch:

  • Adds RelayOutcome::Upgraded variant to signal protocol upgrades
  • Detects 101 responses before the generic 1xx handler in relay_response(), capturing any overflow bytes read past the headers
  • Switches relay_rest() and relay_passthrough_with_credentials() to raw bidirectional TCP copy (tokio::io::copy_bidirectional) after receiving an Upgraded outcome
  • Adds a test verifying 101 response handling and overflow capture
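The detection step in the list above can be sketched roughly as follows. This is a minimal std-only illustration, not the code in this PR: the `RelayOutcome` enum here is a hypothetical stand-in (the real enum's other variants and payload are not shown in the description), and `header_end`, `status_code`, and `classify` are names invented for the example. It shows why overflow capture matters: a single read can return the 101 headers together with the first WebSocket frame.

```rust
/// Hypothetical stand-in for the PR's `RelayOutcome`; the real enum's other
/// variants and payload types are assumptions, not taken from the patch.
#[derive(Debug, PartialEq)]
enum RelayOutcome {
    /// Ordinary response: stay in the HTTP parsing loop.
    Continue,
    /// 101 seen: carries bytes already read past the response headers.
    Upgraded { overflow: Vec<u8> },
}

/// Index just past the "\r\n\r\n" that terminates the header block, if present.
fn header_end(buf: &[u8]) -> Option<usize> {
    buf.windows(4).position(|w| w == b"\r\n\r\n").map(|i| i + 4)
}

/// Status code from a status line such as "HTTP/1.1 101 Switching Protocols".
fn status_code(buf: &[u8]) -> Option<u16> {
    let line_end = buf.windows(2).position(|w| w == b"\r\n")?;
    let line = std::str::from_utf8(&buf[..line_end]).ok()?;
    line.split_whitespace().nth(1)?.parse().ok()
}

/// Classify a buffered response; returns None while the headers are incomplete.
/// The 101 check runs before any generic 1xx handling would.
fn classify(buf: &[u8]) -> Option<RelayOutcome> {
    let end = header_end(buf)?;
    match status_code(buf)? {
        101 => Some(RelayOutcome::Upgraded { overflow: buf[end..].to_vec() }),
        _ => Some(RelayOutcome::Continue),
    }
}
```

Anything after the blank line that ends the headers belongs to the upgraded protocol, so it is handed back to the caller instead of being fed to the HTTP parser.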

This enables WebSocket connections (OpenClaw node meshes, Discord/Slack bots) to work from inside fully sandboxed environments.

Fixes: #652
Related: NVIDIA/NemoClaw#409

Summary

The L7 proxy's relay loop did not handle HTTP 101 Switching Protocols. After the 101 response, the connection has been upgraded to a different protocol (e.g. WebSocket) but the proxy continued trying to parse HTTP, silently dropping all frames. This patch detects the 101, captures any overflow bytes, and switches to raw bidirectional TCP relay.

Related Issue

Fixes #652. Related: NVIDIA/NemoClaw#409.

Changes

  • crates/openshell-sandbox/src/l7/relay.rs: Added RelayOutcome::Upgraded variant. Detect 101 before generic 1xx handling, capture overflow bytes read past the response headers.
  • crates/openshell-sandbox/src/l7/rest.rs: After relay_response() returns Upgraded, switch to tokio::io::copy_bidirectional for raw TCP relay between client and upstream.
  • crates/openshell-sandbox/src/l7/provider.rs: Same upgrade handling for relay_passthrough_with_credentials().
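One detail of the switch to raw relay is ordering: the overflow bytes captured with the 101 headers have to reach the client before any further upstream data. The sketch below illustrates that for a single direction using only the standard library. It is an assumption-laden illustration, not the patch itself: the PR uses `tokio::io::copy_bidirectional` on the real sockets, and `relay_upgraded` is a hypothetical helper name.

```rust
use std::io::{self, Read, Write};

/// Forward one direction of an upgraded connection: first the overflow bytes
/// that were read along with the 101 headers, then everything else the peer
/// sends until EOF. Returns the total number of bytes written.
/// (Hypothetical helper; the PR relays both directions with tokio.)
fn relay_upgraded<R: Read, W: Write>(
    overflow: &[u8],
    from: &mut R,
    to: &mut W,
) -> io::Result<u64> {
    // Flush the already-buffered frame bytes first so ordering is preserved.
    to.write_all(overflow)?;
    // From here on the bytes are opaque: no HTTP parsing, just a raw copy.
    let copied = io::copy(from, to)?;
    to.flush()?;
    Ok(overflow.len() as u64 + copied)
}
```

With in-memory stand-ins for the sockets, for example a `std::io::Cursor` as the upstream and a `Vec<u8>` as the client, the output is the overflow bytes followed by the remaining upstream bytes, in that order.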

Testing

  • mise run pre-commit passes
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

Signed-off-by: David Peden <davidpeden3@gmail.com>
@davidpeden3 davidpeden3 requested a review from a team as a code owner March 30, 2026 17:30
github-actions bot commented Mar 30, 2026

All contributors have signed the DCO ✍️ ✅
Posted by the DCO Assistant Lite bot.

@davidpeden3
Author

I have read the DCO document and I hereby sign the DCO.

@davidpeden3
Author

recheck

@johntmyers
Collaborator

Hi @davidpeden3. Thank you for this. Out of curiosity: we've seen a lot of issues and had feedback that Slack, Discord, et al. do not work in OpenClaw in OpenShell because they won't go through the proxy. Did you have to do anything specific to get these providers to obey the proxy to begin with? (Obviously, post-proxy use is the issue you are addressing.)

@davidpeden3
Author

hey @johntmyers,

honestly i haven't even gotten that far in my setup. i've been working on creating a mesh network where my gateway can distribute work to my nodes (and itself, of course) for a karpathy-style autoresearch project i'm working on to train local models (currently trying out nemotron) to learn my coding style. i've got three machines in the network. an m4 mac studio 128gb, an m5 mbp 128gb, and a win pc with a 5090 rtx.

the nodes connect to the gateway via websockets. all nodes (including the gateway, of course) are running inside an openshell sandbox for security. the issue surfaced when i tried to pair nodes to the gateway (starting w/ the m4 to m5 connection). once i got that working, i then ran into #681 when trying to pair the 5090 pc to the m4 gateway.

both fixes solved all of my connectivity issues end to end.

so to be clear, i have not yet attempted to connect to a third party like slack. this was all internal communication between my nodes. pure openshell/openclaw. i will likely get to setting up slack later this week if you would like me to report back.

@johntmyers johntmyers self-assigned this Mar 30, 2026

Development

Successfully merging this pull request may close these issues.

[Bug] Egress proxy fails to relay WebSocket frames after successful HTTP CONNECT + 101 Switching Protocols

2 participants