Sync Algorithm

How Frostbyte keeps every client's <video> element at the same playback position, reacting to controller events in lockstep.

The design is textbook lockstep netcode adapted for video: authoritative server clock, client-side clock offset estimation, scheduled broadcasts with a fixed lead time, and continuous drift correction.

Requirements

  • When a controller pauses or seeks, every client should apply the change within roughly one video frame of each other (~16ms at 60fps, ~40ms at 24fps). Good enough that nobody sees a reaction shot before the play happens.
  • The system must tolerate WebSocket latency that varies widely between clients (one peer at 20ms RTT to the server, another at 200ms).
  • Clients that join late must catch up seamlessly without seeing the video jump or stutter.
  • Clients that fall behind or get ahead due to browser hiccups must converge back to the authoritative timeline without visible seeking.

The authoritative session clock

The server holds one source of truth per room:

Session {
  paused          : bool
  position_ms     : integer    # where the video head was at updated_at
  rate            : float      # 1.0 normal, 0.5 half speed, etc.
  updated_at      : integer    # server wall-clock ms at which the above was true
}

Given the session and the current server time now, you can project the expected video position:

if paused:
  expected = position_ms
else:
  expected = position_ms + (now - updated_at) * rate

This projection is pure. Both server and clients use the same function.
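A minimal TypeScript sketch of that shared projection (the `Session` interface and `projectSession` name are illustrative, not the actual codebase):

```typescript
interface Session {
  paused: boolean;
  position_ms: number; // video head position at updated_at
  rate: number;        // 1.0 normal, 0.5 half speed, etc.
  updated_at: number;  // server wall-clock ms when the above was true
}

// Pure projection: given a session snapshot and the current server time,
// return where the video head should be right now.
function projectSession(s: Session, nowServerMs: number): number {
  if (s.paused) return s.position_ms;
  return s.position_ms + (nowServerMs - s.updated_at) * s.rate;
}
```

Because the function is pure, server and clients can run byte-identical logic against the same session snapshot.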

Clock offset estimation (client ↔ server)

The client doesn't know the server's wall clock, so it can't convert execute_at_server_ms into a local time directly. It has to learn the offset.

The client runs an NTP-style handshake over the time_sync channel event:

t0 = local wall-clock ms before send
send { client_time_ms: t0 }
receive { client_time_ms: t0, server_time_ms: s }
t1 = local wall-clock ms after receive

From these three numbers:

rtt    = t1 - t0
offset = s - (t0 + rtt / 2)    # assumes symmetric latency

offset is the signed difference such that server_time ≈ local_time + offset.

One sample is noisy. The client keeps a rolling window of the last N samples (say N = 8), discards outliers (samples with RTT significantly above the median), and takes the median offset. Standard SNTP-ish smoothing.
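The sampling and smoothing steps can be sketched in TypeScript. The 1.5× RTT outlier cutoff is an assumed value standing in for "significantly above the median"; the names are illustrative:

```typescript
interface SyncSample { rtt: number; offset: number; }

// One handshake sample: t0 = local ms before send, s = server ms from the
// reply, t1 = local ms after receive. Assumes symmetric latency.
function offsetSample(t0: number, s: number, t1: number): SyncSample {
  const rtt = t1 - t0;
  return { rtt, offset: s - (t0 + rtt / 2) };
}

function median(xs: number[]): number {
  const sorted = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Discard samples whose RTT is well above the median RTT (assumed cutoff:
// 1.5x), then take the median offset of the survivors.
function smoothedOffset(samples: SyncSample[]): number {
  const rttMedian = median(samples.map((s) => s.rtt));
  const kept = samples.filter((s) => s.rtt <= rttMedian * 1.5);
  return median(kept.map((s) => s.offset));
}
```

Taking the median rather than the mean makes a single congested round trip harmless instead of corrupting the estimate.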

Initial sync runs five rapid time_sync events in the first second after connect so the client has a stable offset before the first playback event arrives. After that it drops to one ping every 30 seconds for maintenance.

Scheduled broadcasts with lead time

When a controller pauses, the server receives:

state_change { action: "pause", position_ms: 48230, client_time_ms: 1744329650000 }

The server:

  1. Knows the controller's clock offset (measured during their own time_sync).
  2. Uses that offset to reconstruct the controller's intended server time for the action: intended_server_time = client_time_ms + controller_offset.
  3. Updates the session: paused=true, position_ms=48230, updated_at=intended_server_time.
  4. Computes execute_at_server_ms = now_server + LEAD_TIME, where LEAD_TIME is a constant buffer (default 200ms) chosen to exceed typical WS RTT.
  5. Broadcasts state_broadcast to every peer in the room.
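Steps 2-4 can be expressed as a pure function. This is a TypeScript illustration of the logic (the real server is not TypeScript; `applyPause` and its parameter names are hypothetical):

```typescript
const LEAD_TIME_MS = 200; // broadcast → execution buffer

interface SessionState {
  paused: boolean;
  position_ms: number;
  rate: number;
  updated_at: number;
}

// controllerOffset is the clock offset the server measured for the
// controller during their time_sync handshakes.
function applyPause(
  session: SessionState,
  positionMs: number,
  clientTimeMs: number,
  controllerOffset: number,
  nowServerMs: number,
): { session: SessionState; executeAtServerMs: number } {
  // Reconstruct when, in server time, the controller actually hit pause.
  const intendedServerTime = clientTimeMs + controllerOffset;
  return {
    session: { ...session, paused: true, position_ms: positionMs, updated_at: intendedServerTime },
    executeAtServerMs: nowServerMs + LEAD_TIME_MS,
  };
}
```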

Each client receives the broadcast, converts execute_at_server_ms to local time using its own offset:

execute_at_local = execute_at_server_ms - offset

Then it schedules a setTimeout(apply_state, execute_at_local - Date.now()). When the timer fires, apply_state does the actual video.pause() or video.currentTime = seek_target.

Because every client is scheduling against the same absolute server time and they've all estimated that time to within ~5ms, the actions fire within one video frame of each other.
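The client-side scheduling step sketched in TypeScript, with the delay computation split out as a pure function (`applyState` stands in for the code that does the pause/seek):

```typescript
// Convert the server-time deadline into a local setTimeout delay using the
// client's estimated clock offset (server_time ≈ local_time + offset).
function executeDelayMs(executeAtServerMs: number, offset: number, nowLocalMs: number): number {
  const executeAtLocal = executeAtServerMs - offset;
  return Math.max(0, executeAtLocal - nowLocalMs); // late broadcasts fire immediately
}

function scheduleBroadcast(
  executeAtServerMs: number,
  offset: number,
  applyState: () => void,
): void {
  setTimeout(applyState, executeDelayMs(executeAtServerMs, offset, Date.now()));
}
```

Clamping the delay at zero covers the rare case where a broadcast arrives after its own deadline; applying late is still better than never applying.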

Why a fixed lead time and not variable

The lead time could in principle be computed per-broadcast as max(RTT) + margin. In practice, a fixed 200ms is simpler and works as long as:

  • 95th percentile WS RTT stays under 180ms (true for well-provisioned nodes)
  • We're willing to trade 200ms of responsiveness for guaranteed simultaneity

v1 uses 200ms. If measurements show we can shave it, we will.

Drift correction between broadcasts

Between state changes, clients play the video on their own and drift naturally due to:

  • Clock rate differences between browser timers and the server clock
  • Video element scheduling jitter
  • Browser throttling on inactive tabs
  • GPU / audio stack variations

Every client runs a drift check once per second:

expected_position = project_session(local_session_view, now_server)
actual_position   = video.currentTime * 1000
drift             = actual_position - expected_position

  • If |drift| < 50ms: do nothing, within noise floor.
  • If 50ms ≤ |drift| < 300ms: rate nudge. Set video.playbackRate to 1 + (−drift / 3000) for a few seconds, capped at [0.95, 1.05]. This gently pulls the client back without a visible seek. Restore to 1.0 (or the session rate) once drift is back under 50ms.
  • If |drift| ≥ 300ms: hard seek. Set video.currentTime = expected_position / 1000. User will see a tiny jump but the alternative is staying noticeably out of sync.

Drift correction is suspended while the controller is actively seeking or if the server has recently broadcast a state change (last 500ms).
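The per-tick decision can be sketched as a pure function over the thresholds above (the `DriftAction` type and names are illustrative; applying the action to the `<video>` element is left out):

```typescript
type DriftAction =
  | { kind: "none" }
  | { kind: "nudge"; playbackRate: number }
  | { kind: "seek"; seekToMs: number };

// driftMs > 0 means the client is ahead of the authoritative timeline.
function driftAction(driftMs: number, expectedMs: number): DriftAction {
  const abs = Math.abs(driftMs);
  if (abs < 50) return { kind: "none" };
  if (abs < 300) {
    // Gentle rate nudge: ahead → slow down, behind → speed up,
    // capped at [0.95, 1.05] so it stays imperceptible.
    const rate = Math.min(1.05, Math.max(0.95, 1 + -driftMs / 3000));
    return { kind: "nudge", playbackRate: rate };
  }
  return { kind: "seek", seekToMs: expectedMs };
}
```

Keeping the decision pure makes the thresholds trivially testable and leaves all the messy `<video>` interaction in one place.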

Late join catch-up

A client that joins mid-stream receives the current session as part of the phx_join reply. It:

  1. Loads the specified content on the platform.
  2. Waits for a loadedmetadata event.
  3. Projects the session forward: position_target = project_session(session, now_server).
  4. Sets video.currentTime = position_target / 1000.
  5. If paused=false, calls video.play().

From that point on it participates in drift correction and broadcast handling normally. There's a brief window, while the new client buffers, where it sits slightly behind the room; drift correction closes that gap.
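Steps 3-5 reduce to one projection plus a seek target. A self-contained TypeScript sketch (names are illustrative, and the real flow also waits for `loadedmetadata` first):

```typescript
interface SessionSnapshot {
  paused: boolean;
  position_ms: number;
  rate: number;
  updated_at: number;
}

// Given the session from the phx_join reply and the current server time,
// return where to seek (in seconds, for video.currentTime) and whether
// to call video.play().
function lateJoinTarget(s: SessionSnapshot, nowServerMs: number): { seekToSec: number; play: boolean } {
  const positionMs = s.paused
    ? s.position_ms
    : s.position_ms + (nowServerMs - s.updated_at) * s.rate;
  return { seekToSec: positionMs / 1000, play: !s.paused };
}
```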

Correctness invariants

  • Only the controller can mutate the session. Server validates role on every state_change.
  • The session updated_at is monotonic. If the server receives an out-of-order state_change whose reconstructed intended_server_time is older than the current updated_at, the server rejects it with rate_limited or a new stale_action code.
  • execute_at_server_ms is always strictly greater than now_server when broadcast.
  • Rate changes apply before position projection: if rate changes, the projection formula still holds because we only project forward from updated_at.

Live content caveat

All of the above assumes the content is VOD where currentTime is freely seekable and playbackRate is respected. For live streams (Twitch, YouTube Live), the provider's CDN introduces 2-15s of per-viewer latency variance that no amount of client-side sync can correct. For live content in v1, the algorithm still runs but the "sync" is best-effort and users should expect one or two seconds of cross-viewer drift that comes from the platform, not from us.

Constants

Gathered here for tuning:

name                    value       notes
LEAD_TIME_MS            200 ms      broadcast → execution buffer
DRIFT_CHECK_INTERVAL    1000 ms     how often clients measure drift
DRIFT_NUDGE_THRESHOLD   50 ms       below this, no correction
DRIFT_SEEK_THRESHOLD    300 ms      above this, hard seek
RATE_NUDGE_MIN          0.95        playbackRate floor
RATE_NUDGE_MAX          1.05        playbackRate ceiling
TIME_SYNC_WINDOW        8 samples   rolling window for offset median
TIME_SYNC_INITIAL       5 samples   rapid pings on connect
TIME_SYNC_INTERVAL      30 s        maintenance ping rate