
Epic: Cross-cluster physics interaction — kinematic proxies, imperative-op routing, atomic authority transfer #127

@martinjms

Description


Type: epic / interface and implementation plan. Closes the architectural gap flagged in #122 ("Cross-cluster physics interaction: 🚫 Blocked on cross-cluster physics epic (not yet filed)") and unblocks the cross-cluster behavior in brainy-bots/arcane-demos#6.

Quick Summary

  • 🟡 Status: design captured, not yet implemented. Builds on #117 / #118 / #120 / #121 — the local Rapier integration is complete; this epic is about what happens at cluster boundaries.
  • Headline framing: cross-cluster physics is the graceful-degradation fallback for when affinity clustering can't co-locate interacting entities (cluster at capacity, prediction failure, hard pinning conflicts). The primary mechanism for keeping interacting entities together is — and remains — affinity clustering itself.
  • Three core layers: (1) kinematic proxies for neighbor entities, (2) cross-cluster imperative-op routing via Redis, (3) atomic authority transfer via the wire-format cluster_id.
  • ⚠️ Authority is ephemeral. It lives in the wire format, never in SpacetimeDB. Dropped on cluster release. Atomic switch — no in-flight state.
  • ⚠️ Cross-cluster joints are out of scope. Joints are inherently continuous-feedback interactions; affinity clustering is the right answer (the participants must be co-located). create_joint returns None cross-cluster; affinity-side improvements to predict joint candidates are tracked separately.
  • ⚠️ Genre-specific optimizations (e.g. deterministic-projectile fast path for shooters) are out of scope. Those layer on top of this core via forthcoming genre-primitive libraries, when concrete customer needs surface.

Why This Matters

Today, when entities owned by different clusters are simultaneously visible to each other, each cluster receives the other's pose via Redis replication — but the receiving cluster does nothing physics-aware with it. As a result:

  • Projectiles silently pass through neighbor entities. A bullet fired by player A (cluster 1) has no collider to hit for player B (cluster 2): B's body doesn't exist in cluster 1's Rapier world, so the bullet visibly passes through B.
  • Explosions stop at cluster boundaries. An AoE blast in cluster 1 can't push entities owned by cluster 2 — there's no mechanism to deliver the impulse.
  • Contact events are silently dropped at boundaries. Documented in rapier_cluster.rs — when a body is removed (e.g. authority changed), contacts terminate silently.

Affinity clustering is supposed to handle the common case by co-locating interacting entities into the same cluster. But:

  • Capacity edges: when a cluster is at fleet-defined capacity, new interacting entities can't be added — they have to live elsewhere.
  • Prediction failures: affinity is predictive; it can be wrong. Two players who suddenly start interacting may need a tick or two of cross-cluster behavior before migration completes.
  • Long-range sparse interactions: two players who are normally separate but occasionally interact (long-range deterministic projectiles, AoE radius events) shouldn't trigger affinity migration — that would create absurd sprawling clusters. Cross-cluster physics is the correct answer for these cases. Specific genres (shooters with sniper rifles, RTS with artillery) will layer their own optimized primitives on top of the core mechanism this epic delivers.

Without this epic, Arcane is single-cluster-equivalent for any physics-driven interaction. The whole multi-cluster value prop has a hole.

Two regimes — when each applies

| Interaction shape | Affinity's job | Cross-cluster physics' job |
|---|---|---|
| Continuous: joints, ongoing contacts, vehicles, ragdolls (tight per-tick coupling) | Primary: co-locate participants. Joints prohibited cross-cluster. | None. |
| Discrete: projectiles, AoE, hitscan, area triggers (one-shot or low-frequency) | Hands off: don't migrate participants just because of these. | Primary: handle via kinematic proxies + impulse routing. |
| Overflow: capacity exceeded, prediction failed | Tries to co-locate but can't. | Fallback: graceful degradation via the cross-cluster path until affinity catches up. |

Tuning affinity clustering for the second row (so it doesn't migrate participants because of discrete interactions) is a sibling design concern tracked in a separate epic. This epic provides the mechanism; the clustering side decides when to trigger it.

Architectural pillars (must respect)

This epic is constrained by existing Arcane architecture. The design has been shaped to fit, not to fight:

  • Per-cluster simulation authority — no shared physics worlds across clusters; each cluster ticks its own. (physics-backends-and-unreal.md §1; ADR-001.)
  • Redis-vs-SpacetimeDB split — real-time replication on Redis (buckets 1–2); durable game outcomes via SpacetimeDB reducers (bucket 4). Cross-cluster physics events are real-time intent → Redis. Game outcomes (damage, death, status) → SpacetimeDB. (entity-model.md §7; project_redis_vs_spacetimedb_split.md in memory.)
  • Per-engine API discipline — this epic delivers the Rapier-Rust shape. Unreal-Chaos cross-cluster physics is parallel work in #124. Same conceptual contract, different language. (entity-model.md §8.)
  • Entity-keyed invariant — every body is tied to an entity_id. No off-spine bodies, no raw Rapier handles in user code. (ADR-001 §"Consequences"; entity-model.md §8.)
  • Observation radius + spread_radius for neighbor discovery — already wired through IReplicationChannel + arcane-spatial. Cross-cluster physics inherits this — no new neighbor-discovery mechanism. (interface-ireplicationchannel.md §5.)
  • Four-bucket entity state model — pose-only proxies use buckets 1 (spine: position, velocity, cluster_id) and optionally 2 (replicated user_data). No bucket-3 (local_data) or bucket-4 (durable) leakage across clusters. (four-bucket-state-model.md.)

The three layers

Layer 1 — Kinematic proxies (read-only world view)

When IReplicationChannel reports a neighbor entity for the first time, the receiving cluster auto-spawns a kinematic body in its Rapier world at the entity's pose. Subsequent Redis pose deltas update the body's translation (and linvel, if proxy mode is KinematicVelocityBased). Despawn when the entity drops out of subscription.
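The spawn / update / despawn lifecycle above can be sketched as a per-tick reconcile against the current neighbor set. This is a minimal sketch under stated assumptions: u64 stands in for the real Uuid-keyed entity and cluster ids, a plain map stands in for the Rapier world, and `NeighborPose`, `ProxySet` and `ProxyChange` are hypothetical names rather than existing arcane_infra types.

```rust
use std::collections::{BTreeSet, HashMap};

// Assumption: u64 stands in for the Uuid-keyed ids used by the real system.
type EntityId = u64;
type ClusterId = u64;

/// One replicated neighbor as seen this tick (hypothetical shape).
#[derive(Clone, Copy, Debug, PartialEq)]
struct NeighborPose {
    cluster_id: ClusterId,
    position: [f32; 3],
}

#[derive(Debug, PartialEq, Eq)]
enum ProxyChange {
    Spawned(EntityId),
    Moved(EntityId),
    Despawned(EntityId),
}

/// Hypothetical proxy registry: entity -> pose last written to its kinematic body.
#[derive(Default)]
struct ProxySet {
    bodies: HashMap<EntityId, [f32; 3]>,
}

impl ProxySet {
    /// Spawn a proxy on first sight, move it when its replicated pose changes,
    /// and despawn it once the entity is no longer a remote neighbor.
    fn reconcile(
        &mut self,
        self_cluster: ClusterId,
        neighbors: &HashMap<EntityId, NeighborPose>,
    ) -> Vec<ProxyChange> {
        // Only entities owned by another cluster get proxies; BTreeSet keeps output ordered.
        let remote: BTreeSet<EntityId> = neighbors
            .iter()
            .filter(|(_, n)| n.cluster_id != self_cluster)
            .map(|(id, _)| *id)
            .collect();

        let mut changes = Vec::new();
        for id in &remote {
            let pos = neighbors[id].position;
            match self.bodies.insert(*id, pos) {
                None => changes.push(ProxyChange::Spawned(*id)),
                Some(old) if old != pos => changes.push(ProxyChange::Moved(*id)),
                Some(_) => {}
            }
        }

        // Proxies whose entity left the subscription (or became local) are removed.
        let mut stale: Vec<EntityId> = self
            .bodies
            .keys()
            .filter(|id| !remote.contains(*id))
            .copied()
            .collect();
        stale.sort_unstable();
        for id in stale {
            self.bodies.remove(&id);
            changes.push(ProxyChange::Despawned(id));
        }
        changes
    }
}
```

In a real integration each `ProxyChange` would map to a Rapier body insert, pose write, or removal. This sketch simply drops a proxy whose entity becomes locally owned; Layer 3 replaces that case with promotion.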

Per-game configurable proxy mode — progressive-API ladder:

| Mode | Cost per proxy per tick | Use when |
|---|---|---|
| Fixed (default) | Lowest (solver-skipped, AABB only) | Proxies only need to exist for raycast/intersection queries; small visual stutter at low replication rates is acceptable. |
| KinematicVelocityBased | Slightly higher (interpolates) | Proxies need to be smooth between Redis updates, e.g. when raycast precision matters mid-tick. |
| Custom | User-defined | Genre primitive layered on top. |
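The behavioral difference between the two built-in modes can be sketched as a pure function of the last replicated pose and the time elapsed since it arrived. This assumes the proxy's velocity is replicated in bucket 1 alongside its position (as the four-bucket model states); `ProxyMode` mirrors the proposed config enum with `Custom` omitted, and `ReplicatedPose` / `proxy_position` are hypothetical.

```rust
/// Hypothetical mirror of the proposed `ProxyMode` (Custom omitted).
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
enum ProxyMode {
    #[default]
    Fixed,
    KinematicVelocityBased,
}

/// Last replicated bucket-1 state for one proxy.
#[derive(Clone, Copy, Debug)]
struct ReplicatedPose {
    position: [f32; 3],
    linvel: [f32; 3],
}

/// Where the proxy's body sits `dt` seconds after the last Redis update.
fn proxy_position(mode: ProxyMode, last: ReplicatedPose, dt: f32) -> [f32; 3] {
    match mode {
        // Fixed: the body only moves when a new pose arrives, so it holds still in between.
        ProxyMode::Fixed => last.position,
        // Velocity-based kinematic: the engine advances the body by its linvel each step,
        // so between updates it keeps travelling along the last replicated velocity.
        ProxyMode::KinematicVelocityBased => {
            std::array::from_fn(|i| last.position[i] + last.linvel[i] * dt)
        }
    }
}
```

The extrapolating mode is what makes KinematicVelocityBased look smoother at low replication rates, at the cost of overshooting when the remote entity changes direction between updates.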

Proxies participate naturally in PhysicsHandle::raycast and intersections_with_shape — the existing #121 API surface keeps working unchanged.

Layer 2 — Cross-cluster imperative-op routing (real-time write-back)

When PhysicsHandle::apply_impulse(id, vec) (or any other write op: apply_force, apply_torque_impulse, set_translation, set_linvel, set_angvel, wake, sleep) is called on a non-authoritative entity, the operation is queued for routing. At end-of-tick, the cluster batches all queued cross-cluster ops by destination cluster and publishes them on a new Redis channel physics_events:<authority_cluster_id>.
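A sketch of the local-vs-route decision and the end-of-tick batching by destination cluster. It assumes u64 ids in place of Uuid and shows only three of the eight routed ops; `OutboundPhysicsOp` follows the enum name proposed in the task list, while `Route`, `OutboundQueue` and its methods are hypothetical. Postcard encoding and the actual Redis publish are left out.

```rust
use std::collections::{BTreeMap, HashMap};

// Assumption: u64 stands in for Uuid entity and cluster ids.
type EntityId = u64;
type ClusterId = u64;

/// Routed write ops (three of the eight listed in the epic).
#[derive(Clone, Debug, PartialEq)]
enum OutboundPhysicsOp {
    Impulse { entity: EntityId, impulse: [f32; 3] },
    SetLinvel { entity: EntityId, linvel: [f32; 3] },
    Wake { entity: EntityId },
}

/// Outcome of a write op issued through the physics handle (hypothetical).
#[derive(Debug, PartialEq, Eq)]
enum Route {
    Local,
    Remote(ClusterId),
}

#[derive(Default)]
struct OutboundQueue {
    pending: BTreeMap<ClusterId, Vec<OutboundPhysicsOp>>,
}

impl OutboundQueue {
    fn push(&mut self, dest: ClusterId, op: OutboundPhysicsOp) {
        self.pending.entry(dest).or_default().push(op);
    }

    /// Decide local vs. routed by looking up the entity's authority cluster.
    fn apply_impulse(
        &mut self,
        self_cluster: ClusterId,
        owners: &HashMap<EntityId, ClusterId>,
        entity: EntityId,
        impulse: [f32; 3],
    ) -> Option<Route> {
        // Unknown entity (not in this tick's view): nothing to apply or route.
        let owner = *owners.get(&entity)?;
        if owner == self_cluster {
            // Caller applies the impulse to its own Rapier body directly.
            Some(Route::Local)
        } else {
            self.push(owner, OutboundPhysicsOp::Impulse { entity, impulse });
            Some(Route::Remote(owner))
        }
    }

    /// End of tick: one batch per destination, paired with its channel name.
    fn drain_batches(&mut self) -> Vec<(String, Vec<OutboundPhysicsOp>)> {
        std::mem::take(&mut self.pending)
            .into_iter()
            .map(|(dest, ops)| (format!("physics_events:{dest}"), ops))
            .collect()
    }
}
```

The sketch uses a BTreeMap so batches come out in a deterministic order; the task list proposes a HashMap, which works equally well because ordering across destinations doesn't matter.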

The authority cluster receives the messages on its inbound channel, applies them at the start of its next tick (just before on_tick runs), and the local Rapier step incorporates them naturally.
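On the receiving side, one way to realize "apply at the start of the next tick" is to have the Redis subscriber hand decoded batches to the tick loop over a channel, which the loop drains without blocking once per tick. This sketch assumes a std mpsc channel between a subscriber thread and the tick loop, unit-mass bodies represented only by their linear velocity, and hypothetical `InboundOp`, `LocalBodies` and `Inbound` types.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver};

type EntityId = u64; // assumption: stands in for Uuid

/// Decoded op from physics_events:<self> (hypothetical; only impulses shown).
#[derive(Clone, Debug, PartialEq)]
enum InboundOp {
    Impulse { entity: EntityId, impulse: [f32; 3] },
}

/// Stand-in for the authority's dynamic bodies: entity -> linear velocity, unit mass.
struct LocalBodies {
    linvel: HashMap<EntityId, [f32; 3]>,
}

/// Receiving end of the subscriber -> tick-loop hand-off.
struct Inbound {
    rx: Receiver<Vec<InboundOp>>,
}

impl Inbound {
    /// Call once per tick before user code runs. Drains every batch that arrived since
    /// the last tick without blocking and applies ops in arrival order. Ops for entities
    /// this cluster doesn't own are dropped (best-effort, like V1 delivery).
    fn apply_pending(&self, bodies: &mut LocalBodies) -> usize {
        let mut applied = 0;
        for batch in self.rx.try_iter() {
            for op in batch {
                match op {
                    InboundOp::Impulse { entity, impulse } => {
                        if let Some(v) = bodies.linvel.get_mut(&entity) {
                            // With unit mass, an impulse changes velocity by exactly the impulse.
                            for (vi, ji) in v.iter_mut().zip(impulse) {
                                *vi += ji;
                            }
                            applied += 1;
                        }
                    }
                }
            }
        }
        applied
    }
}
```

Draining with `try_iter` keeps the tick loop from ever waiting on Redis; anything that arrives mid-tick is simply picked up on the following tick.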

Contact events flow back the same way. When the local physics step generates a ContactEvent involving a kinematic proxy entity (entity_a is local, entity_b is a proxy of a remote-owned entity), the cluster publishes the event on the remote authority's physics_events channel. The remote cluster surfaces it via its normal ctx.contact_events, so gameplay code on both sides observes the contact.
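A sketch of choosing where a locally generated contact event should be published. It assumes u64 ids and an `owner_of` lookup standing in for `ctx.entities.get(&id)?.cluster_id`; `ContactEvent` here is a simplified hypothetical struct rather than the engine's own event type, and `contact_destinations` is a hypothetical helper.

```rust
type EntityId = u64;
type ClusterId = u64;

/// Simplified contact event from the local step (hypothetical shape).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ContactEvent {
    entity_a: EntityId,
    entity_b: EntityId,
    started: bool,
}

/// Remote authority clusters that should receive a copy of `ev`.
/// Local-local contacts stay local; each distinct remote owner gets one copy.
/// Proxy-proxy pairs should rarely show up: with Rapier's default active collision
/// types, contacts between two non-dynamic (fixed/kinematic) bodies aren't reported.
fn contact_destinations(
    self_cluster: ClusterId,
    ev: ContactEvent,
    owner_of: impl Fn(EntityId) -> Option<ClusterId>,
) -> Vec<ClusterId> {
    let mut dests = Vec::new();
    for id in [ev.entity_a, ev.entity_b] {
        match owner_of(id) {
            Some(owner) if owner != self_cluster && !dests.contains(&owner) => dests.push(owner),
            // Local entity, unknown entity, or owner already listed.
            _ => {}
        }
    }
    dests
}
```

Each returned cluster id would become one message on that cluster's `physics_events:<id>` channel, batched with the routed ops from the same tick.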

Reliability: best-effort, fire-and-forget (V1). Same model as existing Redis replication. Cross-cluster physics tolerates occasional dropped events the same way networked physics generally does. At-least-once delivery with sequence numbers is a future step on the reliability ladder, listed under Out below.

Layer 3 — Atomic authority transfer (entity migration)

When PGP affinity decides an entity should change authority from cluster A to cluster B:

  1. The wire-format cluster_id field on the entity flips to B's id (one tick).
  2. A's next tick observes the change → despawns its authoritative body.
  3. B's next tick observes the change → its existing kinematic proxy is promoted to a full authoritative body (same Rapier handle reused if possible, otherwise spawn fresh from the replicated pose + velocity).
  4. Other neighbor clusters observing the entity flip their internal "this is a proxy of A" → "this is a proxy of B" state.

No in-flight state needed. All neighbor clusters were already replicating the entity's state, so they don't need a handoff payload. The cluster_id flip is a single-field change, atomic on the wire.
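The decision each cluster makes when it notices a `cluster_id` change between ticks fits in one small function, covering steps 2 to 4 above. A sketch, assuming u64 cluster ids and a hypothetical `AuthorityTransition` enum; the actual body promotion and demotion are left to the caller.

```rust
type ClusterId = u64; // assumption: stands in for Uuid

#[derive(Debug, PartialEq, Eq)]
enum AuthorityTransition {
    /// Owner did not change.
    Unchanged,
    /// Entity became ours: promote the kinematic proxy to an authoritative dynamic body.
    PromoteToAuthority,
    /// Entity left us: demote the body to a proxy, or despawn it if no longer observed.
    DemoteToProxy,
    /// Remote before and after, different owner: only the routing target changes.
    RetargetProxy { new_owner: ClusterId },
}

/// Compare the wire-format cluster_id seen last tick with this tick's value.
fn classify(self_cluster: ClusterId, previous: ClusterId, current: ClusterId) -> AuthorityTransition {
    if previous == current {
        AuthorityTransition::Unchanged
    } else if current == self_cluster {
        AuthorityTransition::PromoteToAuthority
    } else if previous == self_cluster {
        AuthorityTransition::DemoteToProxy
    } else {
        AuthorityTransition::RetargetProxy { new_owner: current }
    }
}
```

Because the only input is the previous and current owner, every observer reaches a consistent conclusion from the same replicated field, which is what lets the design skip a handoff payload.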

Hysteresis is handled upstream by PGP scoring. No new anti-thrash mechanism in this layer. (interface-iclusteringmodel.md defines PGP scoring; affinity-side improvements track joint candidates and other co-location predictors separately.)

Optional future enhancement: a second wire-format field next_authority_cluster_id for pre-announcing upcoming transfers. Lets receivers prepare predictively. Out of scope for V1.

Scope

In:

  • The three layers above, V1 specs.
  • Configurable proxy body kind (Fixed default, KinematicVelocityBased opt-in).
  • All PhysicsHandle write ops route cross-cluster (impulse / force / torque / set_translation / set_linvel / set_angvel / wake / sleep). Read ops (linvel, angvel) hit the local proxy directly — no routing.
  • physics_events:<cluster_id> Redis channel for queued ops + cross-cluster contact events.
  • Wire-format authority field — verify EntityStateEntry::cluster_id is sufficient; document semantics; ensure flip-on-migration is atomic.
  • create_joint returns None when the two entities have different authority clusters; document that affinity should pre-cluster joint participants.
  • Acceptance tests covering: raycast against proxy, impulse routed to remote authority, contact event flowed back, atomic authority transfer, joint refusal, configurable proxy mode.
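A sketch of the joint-refusal rule, written against the stricter condition in the task list (both entities must be owned by this cluster). It uses u64 ids, a plain ownership map, and a hypothetical `Joints` registry with a `JointHandle` counter standing in for Rapier's joint set.

```rust
use std::collections::HashMap;

type EntityId = u64;
type ClusterId = u64;

#[derive(Debug, PartialEq, Eq)]
struct JointHandle(u32);

/// Hypothetical joint registry standing in for Rapier's joint set.
struct Joints {
    self_cluster: ClusterId,
    next: u32,
}

impl Joints {
    /// Mirrors the proposed rule: a joint is created only when both entities are
    /// authoritative in this cluster; any remote (or unknown) participant yields None.
    fn create_joint(
        &mut self,
        owners: &HashMap<EntityId, ClusterId>,
        a: EntityId,
        b: EntityId,
    ) -> Option<JointHandle> {
        let owner_a = *owners.get(&a)?;
        let owner_b = *owners.get(&b)?;
        if owner_a != self.self_cluster || owner_b != self.self_cluster {
            return None;
        }
        let handle = JointHandle(self.next);
        self.next += 1;
        Some(handle)
    }
}
```

Requiring both participants to be local (rather than merely "same cluster as each other") also covers the case where two remote entities share an owner: this cluster has no authority to join them either.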

Out (separate issues / epics):

  • Genre-specific optimizations (deterministic-projectile fast path for shooters, predicted-trajectory replication, etc.). Layered on top of this core via forthcoming genre-primitive libraries. Will be filed when concrete customer needs surface.
  • Cross-cluster joints / multibody articulations. Affinity clustering must co-locate joint participants; not in this epic's surface.
  • At-least-once delivery with sequence numbers for cross-cluster ops. V1 is best-effort; reliability ladder added when first customer hits the edge case.
  • Distributed solver (multiple clusters running iterative solver passes against shared bodies). Not on Arcane's roadmap; explicit non-goal.
  • Cross-cluster terrain consistency (raycasts across chunk boundaries owned by different clusters). Tracked in #119 (Terrain epic).
  • Predictive next-authority field (next_authority_cluster_id). Optional V1.5 enhancement.
  • Unreal-Chaos parallel implementation. Tracked in #124 (Unreal Cluster Node).
  • Affinity-clustering improvements to predict joint candidates / discrete-interaction exclusions. Sibling design concern; separate epic.

Implementation tasks (tracked here as a checklist; not filed as separate sub-issues until this epic is approved)

  • Wire-format cluster_id semantics. Verify it carries the right authority semantics today; document that it is the source of truth for authority; pin the atomic-flip contract.
  • Kinematic proxy spawning. In RapierClusterSim::on_tick, when an entity in ctx.entities is owned by a different cluster (entry.cluster_id != self.cluster_id), spawn a Fixed (default) or KinematicVelocityBased (configurable) body at its pose instead of a Dynamic body.
  • Proxy pose updates. Per-tick: for each proxy, set its body's translation to entry.position (and linvel for KinematicVelocityBased).
  • Proxy despawn lifecycle. When an entity disappears from ctx.entities (drops out of subscription), despawn its proxy body — same as the existing despawn_missing flow.
  • Authority-change detection. When an entity's cluster_id flips between ticks: if it became local, promote proxy → authoritative body; if it became remote, demote authoritative body → proxy.
  • Cross-cluster routing target resolution. PhysicsHandle looks up ctx.entities.get(&id)?.cluster_id to decide local vs route.
  • Routing queue. New field on RapierState (or on PhysicsHandle): pending_outbound_ops: HashMap<Uuid /* destination cluster */, Vec<OutboundPhysicsOp>>. Queued during on_tick, drained at end-of-tick.
  • Outbound publish. End-of-tick, batch by destination cluster and publish on physics_events:<dest> Redis channel.
  • Inbound subscribe. Cluster subscribes to physics_events:<self_cluster_id> on startup. Inbound ops are queued and applied just before on_tick runs (or at the start of run_physics_phase, whichever is cleaner).
  • OutboundPhysicsOp enum. Variants for each routed op (Impulse, Force, TorqueImpulse, SetTranslation, SetLinvel, SetAngvel, Wake, Sleep). Postcard-encoded for the Redis channel.
  • Cross-cluster contact event publish. When pending_contact_events contains an event with one local + one proxy entity, publish to the proxy's authority cluster.
  • Joint cross-cluster prohibition. PhysicsHandle::create_joint returns None when a.cluster_id != self.cluster_id || b.cluster_id != self.cluster_id.
  • Configurable proxy mode. Add field to RapierConfig (e.g. proxy_mode: ProxyMode { Fixed, KinematicVelocityBased }). Default Fixed.
  • Acceptance tests:
    • Two-cluster setup: entity in cluster A; cluster B's PhysicsHandle::raycast hits the proxy.
    • Impulse on remote entity → routed via Redis → applied on authority cluster's next tick → linvel changed.
    • Contact between local body and remote-proxy in cluster A → contact event observed on cluster B's next tick.
    • Atomic authority transfer: cluster_id flip → A despawns body, B promotes proxy to body.
    • create_joint cross-cluster returns None.
    • KinematicVelocityBased proxy mode produces smoother interpolation than Fixed between Redis updates.
  • Module docs updated with the cross-cluster architecture overview, proxy lifecycle, routing semantics.
  • Re-export new public types from arcane_infra crate root (e.g. ProxyMode).
  • Clippy clean both feature configurations.

Decisions to make before sub-issues are filed

  • Routing message encoding. Postcard (matches arcane-wire)? Custom binary? JSON for ease of debug? Lean: postcard.
  • Inbound op application timing. At the start of run_physics_phase (after on_tick), or just before on_tick so user code can react in the same tick? Lean: start of run_physics_phase (mirrors how spawn-loop applies state).
  • Redis channel naming + sharding. Per-cluster channel physics_events:<cluster_uuid> works for V1. Consider grouping into a smaller channel set if Redis pub/sub pattern matching becomes a bottleneck — defer.
  • Proxy promotion (Fixed/Kinematic → Dynamic) on authority change. Reuse Rapier handle (mutate body kind in place) if Rapier 0.32 supports it cleanly, else despawn + respawn. Investigate during implementation.
  • Tick-ordering hazard on the cluster_id flip. The single-field atomic flip is the design (no in-flight state — closed). What's still open: the implementation question of whether there's a tick-ordering hazard (e.g. cluster sees the flip on tick N but doesn't have the body's tick-N pose yet because it arrived in a later replication batch). If so, the fix is a small adjustment (buffer one tick), not a design change.
  • Per-cluster cluster_id discovery. RapierClusterSim needs to know its own cluster id to compare against entry.cluster_id. Currently lives in ClusterTickContext::cluster_id per tick — confirm it's stable across ticks and accessible at lock-acquisition time.
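If the tick-ordering hazard turns out to be real, the "buffer one tick" adjustment could take the shape of a small gate: hold a flip observed at tick N until a replicated pose for tick N or later has arrived. This assumes replicated poses carry a tick number and uses hypothetical `FlipGate`, `observe_flip` and `poll` names; it is one possible shape for the fix, not a settled design.

```rust
type ClusterId = u64; // assumption: stands in for Uuid
type Tick = u64;

/// Per-entity gate: a flip observed at `flip_tick` is only acted on once the
/// replicated pose is at least that fresh, so the promoting cluster never builds
/// its authoritative body from a pose older than the hand-off tick.
#[derive(Debug, Default)]
struct FlipGate {
    pending: Option<(ClusterId, Tick)>,
}

impl FlipGate {
    fn observe_flip(&mut self, new_owner: ClusterId, flip_tick: Tick) {
        self.pending = Some((new_owner, flip_tick));
    }

    /// Returns the new owner once the pose has caught up; otherwise keeps waiting.
    fn poll(&mut self, latest_pose_tick: Tick) -> Option<ClusterId> {
        match self.pending {
            Some((owner, flip_tick)) if latest_pose_tick >= flip_tick => {
                self.pending = None;
                Some(owner)
            }
            _ => None,
        }
    }
}
```

In the common case the pose and the flip arrive in the same batch and the gate releases immediately; it only delays when they straddle replication batches.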

Reference


Implementation order (suggested):

  1. Wire-format authority semantics + atomic flip (foundation for everything else)
  2. Kinematic proxy spawning + lifecycle (Layer 1 — gets raycast against proxies working)
  3. Cross-cluster imperative-op routing + contact event flow-back (Layer 2 — enables explosions / hitscan)
  4. Authority transfer (Layer 3 — enables PGP-driven migrations to actually work physics-side)
  5. Configurable proxy mode + tests
  6. Documentation

Each step is independently testable and useful. Demo arcane-demos#6 becomes meaningful as soon as steps 1–3 land.
