
Enable per-message-deflate WebSocket compression on cluster broadcast pipeline #44

@martinjms


Summary

Turn on per-message-deflate (PMD) (RFC 7692) on the cluster's WebSocket server in ws_server.rs. PMD compresses each WebSocket message before it hits the wire and decompresses on the receiving side. With our existing per-broadcast encode pattern (Shape B — encode the broadcast frame once per tick, share the bytes across all subscribers), PMD's CPU cost gets amortized to once per broadcast — not once per subscriber — which makes it cheap relative to the bandwidth saved.

Empirical motivation: the 2026-04-26 clusters_4 benchmark run topped out at 7,250 players on c7i.2xlarge cluster nodes, with the cluster outbound NIC as the binding bottleneck — sustained ~3 Gbps per cluster while broadcast demand at the failing tier was ~9 GB/s. Cluster CPU still had 65% headroom at the ceiling. Reducing the bytes the cluster needs to push out is the most direct lift on the ceiling that doesn't require a hardware change or an architectural refactor; PMD is the cheapest such lift available.

Expected impact

  • Bandwidth: 30–50% reduction on broadcast frames. Postcard-encoded DeltaPayload carries lots of compressible structure: repeated UUIDs across entities, position/velocity floats spatially clustered, header fields that recur across frames. DEFLATE handles this well.
  • CPU: with the per-broadcast encode cache pattern, compression runs once per (cluster, tick) and the compressed bytes are reused for every subscriber. At 4 clusters × 20 Hz that's ~80 compressions/sec total across the deployment, not 80 × N_subscribers/sec. Negligible against the cluster's 65% tick-budget headroom.
  • Decompression runs per-subscriber on the client side. For our benchmark swarm-driver (which already shares the decoded result across simulated players via the per-frame decode cache from arcane_swarm/feat/latency-decomposition), decompression amortizes through the same cache. For real game clients (one process per player), decompression is per-player, but a real client inflates one ~70–200 KB frame every 50 ms, which is a trivial CPU cost.
  • Ceiling: estimated +30–50% lift on the player-count ceiling for the same NIC bandwidth, given today's bottleneck shape. Subject to measurement.
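The amortization claim above can be sketched in a few lines. This is a hypothetical std-only illustration, not the real broadcast path: `deflate_stub` stands in for the actual DEFLATE pass, and the point is that it runs once per broadcast while the resulting bytes are shared (via `Arc`) across every subscriber send.

```rust
use std::sync::Arc;

// Stub standing in for the real per-message-deflate pass; counts invocations
// so the once-per-broadcast property is observable.
fn deflate_stub(payload: &[u8], calls: &mut usize) -> Vec<u8> {
    *calls += 1;
    payload.to_vec() // real code would run DEFLATE here
}

// Shape B amortization: compress the encoded frame once per (cluster, tick),
// then hand the same compressed bytes to every subscriber send.
fn broadcast_tick(frame: &[u8], n_subscribers: usize, calls: &mut usize) -> Vec<Arc<Vec<u8>>> {
    let compressed = Arc::new(deflate_stub(frame, calls));
    (0..n_subscribers).map(|_| Arc::clone(&compressed)).collect()
}

fn main() {
    let mut compressions = 0usize;
    let sends = broadcast_tick(b"ServerFrame::Delta bytes", 7_250, &mut compressions);
    assert_eq!(sends.len(), 7_250);
    assert_eq!(compressions, 1); // one compression per broadcast, not per subscriber
}
```

The same shape is why touch point 5 below matters: if compression instead happened inside each per-connection send, the call count would equal the subscriber count and the CPU win would evaporate.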

Why this is "free" specifically for Arcane

PMD has been around since 2015 and most projects don't bother because the benefit-to-cost ratio is mediocre when you compress per-message-per-subscriber: the CPU cost scales with subscriber count, and the bandwidth saved isn't usually the bottleneck. Arcane's broadcast model inverts both: (a) encode-once-fan-out means compression cost is per-broadcast, not per-subscriber, and (b) NIC bandwidth IS the empirical bottleneck. The thing that makes it usually-not-worth-it makes it specifically worth it for us.

Implementation sketch

tokio-tungstenite (the WS library used by ws_server.rs and the swarm clients) supports per-message-deflate via the deflate feature flag. The negotiation is handshake-time: the cluster advertises support; clients accept it; both sides run DEFLATE on outgoing messages and INFLATE on incoming.
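Schematically, the RFC 7692 negotiation rides on the opening handshake headers (parameters like `client_max_window_bits` are optional refinements):

```http
GET /ws HTTP/1.1
Upgrade: websocket
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Sec-WebSocket-Extensions: permessage-deflate
```

If the server omits the extension from its 101 response, the connection proceeds uncompressed, which is what makes the config toggle below a safe escape hatch.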

Specific touch points:

  1. arcane/crates/arcane-infra/Cargo.toml — enable tokio-tungstenite with the deflate feature.
  2. arcane/crates/arcane-infra/src/ws_server.rs — configure the server-side WebSocketConfig to advertise PMD on the handshake. Test that the negotiated extension shows up in the response headers.
  3. Swarm client (arcane_swarm/crates/arcane-swarm/src/bin/arcane_swarm/backends_arcane.rs) — same feature flag in the client connect path so PMD gets accepted and decompression happens automatically.
  4. Real-game clients — confirm UE5 and Unity native WebSocket bindings support PMD (browser WebSocket supports it natively). For the initial rollout both sides are controlled by us; broader interop to be validated when the UE5 plugin work catches up.
  5. Per-broadcast encode cache — verify the existing Shape B encode path still benefits: compression should happen after the per-broadcast encode and the compressed bytes should be the unit shared across subscriber sends. Otherwise we'd be compressing N times per broadcast and lose the win.
  6. Configuration — add a cluster_ws_deflate_enabled field (default true) so we can A/B at benchmark time and so studios can disable it if their custom clients don't speak PMD.
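Touch point 1 is a one-line dependency change; a sketch, assuming the dependency is declared in arcane-infra's Cargo.toml (the version number here is illustrative — pin whichever release ships the `deflate` feature):

```toml
# arcane/crates/arcane-infra/Cargo.toml (sketch; version illustrative)
[dependencies]
tokio-tungstenite = { version = "0.26", features = ["deflate"] }
```

The swarm client crate (touch point 3) needs the same feature on its own dependency declaration, since Cargo features are per-crate.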

Out of scope for this issue

  • Pluggable transport layer (QUIC, raw UDP). Tracked in arcane#43. PMD is a WebSocket-specific optimization that stands on its own whether or not we eventually move to QUIC.
  • Wire-format-level compression (e.g. quantizing position+velocity from f32 to fixed-point). Independent optimization, would compose with PMD additively. Worth a separate issue.
  • Delta-only broadcasts (arcane#30). Independent optimization, also composes additively.

Acceptance criteria

  • PMD negotiated on every cluster WebSocket connection (verified via handshake response headers in an integration test).
  • Cluster runs the existing benchmark (full mesh, 5,750 → 7,500 player sweep on c7i.2xlarge clusters_4 fleet) with PMD on; benchmark journal entry compares ceiling and bandwidth against the 7,250-player baseline from 20260426_060905.
  • No change to the wire schema in arcane-wire — PMD is a transport-layer concern, the encoded ServerFrame::Delta bytes inside are unchanged.
  • Cluster CPU last_tick_us regression checked: we still have meaningful tick-budget headroom at the new ceiling, i.e. the compression cost didn't push us into a different bottleneck.
  • Configuration toggle (cluster_ws_deflate_enabled, default true) lets us A/B test and lets studios disable for clients that can't negotiate PMD.
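For the first acceptance criterion, the integration test reduces to a header check on the server's 101 response. A minimal std-only sketch (the function name and the idea of asserting on the raw handshake text are assumptions; the header name and extension token are per RFC 7692):

```rust
// Returns true if the handshake response accepted permessage-deflate.
// Header names are case-insensitive per HTTP; values may carry extra
// parameters (e.g. "; server_max_window_bits=12"), so we substring-match.
fn pmd_negotiated(handshake_response: &str) -> bool {
    handshake_response
        .lines()
        .filter_map(|line| line.split_once(':'))
        .any(|(name, value)| {
            name.trim().eq_ignore_ascii_case("sec-websocket-extensions")
                && value.to_ascii_lowercase().contains("permessage-deflate")
        })
}

fn main() {
    let ok = "HTTP/1.1 101 Switching Protocols\r\n\
              Upgrade: websocket\r\n\
              Sec-WebSocket-Extensions: permessage-deflate\r\n";
    assert!(pmd_negotiated(ok));

    // No extension header: the connection would fall back to uncompressed.
    assert!(!pmd_negotiated("HTTP/1.1 101 Switching Protocols\r\n"));
}
```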

Quick win next to it (out of this issue, but worth filing alongside)

Quantize position and velocity from Vec3<f32> (12 B each) to fixed-point or f16 (~6 B each). That takes the two vectors from 24 B to ~12 B, dropping per-entity state from 56 B to ~44 B (~21% reduction) at the cluster end without changing the wire-schema fields, only their representation. Composes with PMD and is independent of it.
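The fixed-point variant is a straightforward linear mapping onto i16 over a known world-coordinate bound. A sketch, assuming a ±2048.0 world range (the bound, function names, and the choice of i16 over f16 are all assumptions for illustration; values outside the range saturate):

```rust
// Assumed world-coordinate bound; real code would take this from the map.
const RANGE: f32 = 2048.0;

// f32 -> i16 fixed-point: 2 B per component, so a Vec3 shrinks 12 B -> 6 B.
fn quantize(v: f32) -> i16 {
    let clamped = v.clamp(-RANGE, RANGE);
    (clamped / RANGE * i16::MAX as f32).round() as i16
}

fn dequantize(q: i16) -> f32 {
    q as f32 / i16::MAX as f32 * RANGE
}

fn main() {
    for &p in &[123.456_f32, -987.25, 2047.0] {
        let err = (dequantize(quantize(p)) - p).abs();
        // Quantization step is RANGE / i16::MAX ~ 0.0625 world units;
        // round-trip error is at most one step.
        assert!(err <= RANGE / i16::MAX as f32);
    }
    // Out-of-range values saturate rather than wrap.
    assert_eq!(quantize(5_000.0), i16::MAX);
}
```

Whether ~0.06 world units of positional error is acceptable depends on the game's interpolation; f16 trades a different error profile (relative rather than absolute) for the same 6 B footprint.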
