How Frostbyte keeps every client's <video> element at the same playback
position, reacting to controller events in lockstep.
The design is textbook lockstep netcode adapted for video: authoritative server clock, client-side clock offset estimation, scheduled broadcasts with a fixed lead time, and continuous drift correction.
- When a controller pauses or seeks, every client should apply the change within roughly one video frame of each other (~16ms at 60fps, ~40ms at 24fps). Good enough that nobody sees a reaction shot before the play happens.
- The system must tolerate widely asymmetric WebSocket latency across clients (e.g. 20ms RTT for one client, 200ms for another).
- Clients that join late must catch up seamlessly without seeing the video jump or stutter.
- Clients that fall behind or get ahead due to browser hiccups must converge back to the authoritative timeline without visible seeking.
The server holds one source of truth per room:
Session {
paused : bool
position_ms : integer # where the video head was at updated_at
rate : float # 1.0 normal, 0.5 half speed, etc.
updated_at : integer # server wall-clock ms at which the above was true
}
Given the session and the current server time now, you can project
the expected video position:
if paused:
expected = position_ms
else:
expected = position_ms + (now - updated_at) * rate
This projection is pure. Both server and clients use the same function.
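As a concrete sketch, the projection can be written as a pure function (TypeScript here, matching the browser clients; field names follow the Session struct above):

```typescript
// Mirrors the Session struct above; all timestamps are server wall-clock ms.
interface Session {
  paused: boolean;
  position_ms: number;
  rate: number;       // 1.0 normal, 0.5 half speed, etc.
  updated_at: number; // server wall-clock ms at which the above was true
}

// Project where the video head should be at server time `nowServerMs`.
function projectSession(session: Session, nowServerMs: number): number {
  if (session.paused) return session.position_ms;
  return session.position_ms + (nowServerMs - session.updated_at) * session.rate;
}
```

The same function runs on the server (against its own clock) and on clients (against their estimate of the server clock).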
The client doesn't know the server's wall clock, so it can't convert
execute_at_server_ms into a local time directly. It has to learn the
offset.
The client runs an NTP-style handshake over the time_sync channel event:
t0 = local wall-clock ms before send
send { client_time_ms: t0 }
receive { client_time_ms: t0, server_time_ms: s }
t1 = local wall-clock ms after receive
From these three numbers:
rtt = t1 - t0
offset = s - (t0 + rtt / 2) # assumes symmetric latency
offset is the signed difference such that server_time ≈ local_time + offset.
One sample is noisy. The client keeps a rolling window of the last N samples (say N = 8), discards outliers (samples with RTT significantly above the median), and takes the median offset. Standard SNTP-ish smoothing.
Initial sync runs 5 rapid time_sync events (TIME_SYNC_INITIAL) in the first second after connect so the client has a stable offset before the first playback event arrives. After that it drops to one ping every 30 seconds for maintenance.
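A sketch of the offset estimator, assuming samples have already been collected over the time_sync channel. The 1.5x-median-RTT cutoff for discarding outliers is an illustrative choice, not something specified above:

```typescript
interface TimeSyncSample {
  rtt: number;    // t1 - t0, ms
  offset: number; // s - (t0 + rtt / 2), ms
}

function median(xs: number[]): number {
  const sorted = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Median offset over the last `window` samples, discarding samples whose RTT
// is significantly above the median RTT (cutoff 1.5x is a tuning assumption).
function estimateOffset(samples: TimeSyncSample[], window = 8): number {
  const recent = samples.slice(-window);
  const rttMedian = median(recent.map((s) => s.rtt));
  const kept = recent.filter((s) => s.rtt <= rttMedian * 1.5);
  return median(kept.map((s) => s.offset));
}
```

A sample taken during a latency spike carries a skewed offset (the symmetric-latency assumption breaks), which is why high-RTT samples are dropped before the median is taken.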
When a controller pauses, the server receives:
state_change { action: "pause", position_ms: 48230, client_time_ms: 1744329650000 }
The server:
- Knows the controller's clock offset (measured during their own time_sync).
- Uses that offset to reconstruct the controller's intended server time for
  the action: intended_server_time = client_time_ms + controller_offset.
- Updates the session: paused=true, position_ms=48230,
  updated_at=intended_server_time.
- Computes execute_at_server_ms = now_server + LEAD_TIME, where LEAD_TIME is a
  constant buffer (default 200ms) chosen to exceed typical WS RTT.
- Broadcasts state_broadcast to every peer in the room.
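The server-side steps can be sketched as a pure function. `handleStateChange` and the `StateChange` shape are hypothetical names for illustration; the actual channel/broadcast plumbing is omitted:

```typescript
const LEAD_TIME_MS = 200; // broadcast → execution buffer from the constants table

interface Session { paused: boolean; position_ms: number; rate: number; updated_at: number; }

// Shape of the incoming state_change payload (illustrative).
interface StateChange { action: "pause" | "play" | "seek"; position_ms: number; client_time_ms: number; }

// `controllerOffset` is the offset measured during the controller's own time_sync.
function handleStateChange(
  session: Session,
  msg: StateChange,
  controllerOffset: number,
  nowServerMs: number,
): { session: Session; execute_at_server_ms: number } {
  // Reconstruct when the controller actually performed the action, in server time.
  const intendedServerTime = msg.client_time_ms + controllerOffset;
  const updated: Session = {
    ...session,
    paused: msg.action === "pause",
    position_ms: msg.position_ms,
    updated_at: intendedServerTime,
  };
  // Schedule execution a fixed lead time ahead so every peer receives the
  // broadcast before the deadline.
  return { session: updated, execute_at_server_ms: nowServerMs + LEAD_TIME_MS };
}
```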
Each client receives the broadcast, converts execute_at_server_ms to local
time using its own offset:
execute_at_local = execute_at_server_ms - offset
Then it schedules a setTimeout(apply_state, execute_at_local - Date.now()).
When the timer fires, apply_state does the actual video.pause() or
video.currentTime = seek_target.
Because every client is scheduling against the same absolute server time and they've all estimated that time to within ~5ms, the actions fire within one video frame of each other.
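The client-side conversion and scheduling might look like the following sketch, where `apply` stands in for the real video.pause() / seek logic:

```typescript
// Convert the server deadline to local time and arm a timer for it.
// `offset` is the client's estimate such that server_time ≈ local_time + offset.
// Returns the computed delay in ms so the caller can inspect or log it.
function scheduleApply(
  executeAtServerMs: number,
  offset: number,
  nowLocalMs: number,
  apply: () => void,
): number {
  const executeAtLocal = executeAtServerMs - offset;
  // Clamp to 0: a late-arriving broadcast should fire immediately, not throw.
  const delay = Math.max(0, executeAtLocal - nowLocalMs);
  setTimeout(apply, delay);
  return delay;
}
```

In the browser `nowLocalMs` is just `Date.now()`; it is a parameter here only to keep the function testable.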
The lead time could in principle be computed per-broadcast as max(RTT) + margin.
In practice, a fixed 200ms is simpler and works as long as:
- 95th percentile WS RTT stays under 180ms (true for well-provisioned nodes)
- We're willing to trade 200ms of responsiveness for guaranteed simultaneity
v1 uses 200ms. If measurements show we can shave it, we will.
Between state changes, clients play the video on their own and drift naturally due to:
- Clock rate differences between browser timers and the server clock
- Video element scheduling jitter
- Browser throttling on inactive tabs
- GPU / audio stack variations
Every client runs a drift check once per second:
expected_position = project_session(local_session_view, now_server)
actual_position = video.currentTime * 1000
drift = actual_position - expected_position
- If |drift| < 50ms: do nothing, within the noise floor.
- If 50ms ≤ |drift| < 300ms: rate nudge. Set video.playbackRate to
  1 + (−drift / 3000) for a few seconds, capped at [0.95, 1.05]. This gently
  pulls the client back without a visible seek. Restore to 1.0 (or the session
  rate) once drift is back under 50ms.
- If |drift| ≥ 300ms: hard seek. Set video.currentTime =
  expected_position / 1000. The user will see a tiny jump, but the alternative
  is staying noticeably out of sync.
Drift correction is suspended while the controller is actively seeking or if the server has recently broadcast a state change (last 500ms).
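The per-second decision reduces to a small pure function over the thresholds from the constants table (the returned action type is illustrative; a real client would apply it to the video element):

```typescript
const DRIFT_NUDGE_THRESHOLD = 50;  // ms; below this, do nothing
const DRIFT_SEEK_THRESHOLD = 300;  // ms; above this, hard seek
const RATE_NUDGE_MIN = 0.95;       // playbackRate floor
const RATE_NUDGE_MAX = 1.05;       // playbackRate ceiling

type DriftAction =
  | { kind: "none" }
  | { kind: "nudge"; playbackRate: number }
  | { kind: "seek"; seekToMs: number };

function driftAction(actualMs: number, expectedMs: number): DriftAction {
  const drift = actualMs - expectedMs; // positive: client is ahead
  const abs = Math.abs(drift);
  if (abs < DRIFT_NUDGE_THRESHOLD) return { kind: "none" };
  if (abs < DRIFT_SEEK_THRESHOLD) {
    // 1 + (-drift / 3000): a client that is behind speeds up, one that is
    // ahead slows down, clamped to the nudge band.
    const rate = Math.min(RATE_NUDGE_MAX, Math.max(RATE_NUDGE_MIN, 1 + -drift / 3000));
    return { kind: "nudge", playbackRate: rate };
  }
  return { kind: "seek", seekToMs: expectedMs };
}
```

The divisor 3000 means a 150ms drift produces the full 5% rate change, so mid-band drifts correct within a few seconds of nudged playback.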
A client that joins mid-stream receives the current session as part of the phx_join reply. It:
- Loads the specified content on the platform.
- Waits for a loadedmetadata event.
- Projects the session forward: position_target = project_session(session,
  now_server).
- Sets video.currentTime = position_target / 1000.
- If paused=false, calls video.play().
From that point on it participates in drift correction and broadcast handling normally. There's a brief window where the joining client is slightly behind (while it buffers) which drift correction handles.
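A sketch of the late-join positioning step, with the video element stubbed behind a minimal interface so the logic is testable outside a browser (`applyJoinState` is a hypothetical name; in a real client it would run inside the loadedmetadata handler):

```typescript
interface Session { paused: boolean; position_ms: number; rate: number; updated_at: number; }

// The subset of HTMLMediaElement this step touches. currentTime is in seconds.
interface VideoLike {
  currentTime: number;
  play(): void;
}

function projectSession(s: Session, nowServerMs: number): number {
  return s.paused ? s.position_ms : s.position_ms + (nowServerMs - s.updated_at) * s.rate;
}

// Position the joining client at the projected playhead, then start playback
// if the room is playing.
function applyJoinState(video: VideoLike, session: Session, nowServerMs: number): void {
  video.currentTime = projectSession(session, nowServerMs) / 1000;
  if (!session.paused) video.play();
}
```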
- Only the controller can mutate the session. The server validates role on
  every state_change.
- The session updated_at is monotonic. If the server receives an out-of-order
  state_change whose reconstructed intended_server_time is older than the
  current updated_at, the server rejects it with rate_limited or a new
  stale_action code.
- execute_at_server_ms is always strictly greater than now_server when
  broadcast.
- Rate changes apply before position projection: if rate changes, the
  projection formula still holds because we only project forward from
  updated_at.
All of the above assumes the content is VOD where currentTime is freely
seekable and playbackRate is respected. For live streams (Twitch, YouTube
Live), the provider's CDN introduces 2-15s of per-viewer latency variance
that no amount of client-side sync can correct. For live content in v1, the
algorithm still runs but the "sync" is best-effort and users should expect
one or two seconds of cross-viewer drift that comes from the platform, not
from us.
Gathered here for tuning:
| name | value | notes |
|---|---|---|
| LEAD_TIME_MS | 200 | broadcast → execution buffer |
| DRIFT_CHECK_INTERVAL | 1000 ms | how often clients measure drift |
| DRIFT_NUDGE_THRESHOLD | 50 ms | below this, no correction |
| DRIFT_SEEK_THRESHOLD | 300 ms | above this, hard seek |
| RATE_NUDGE_MIN | 0.95 | playbackRate floor |
| RATE_NUDGE_MAX | 1.05 | playbackRate ceiling |
| TIME_SYNC_WINDOW | 8 samples | rolling window for offset median |
| TIME_SYNC_INITIAL | 5 samples | rapid pings on connect |
| TIME_SYNC_INTERVAL | 30 s | maintenance ping rate |