A low-level multithreaded game engine built from scratch in C++20, targeting nanosecond-precision judgment timing and frame-accurate audiovisual synchronization for a rhythm game.
The engine implements a closed-loop feedback system that observes kernel-level VBlank interrupts, predicts the true display refresh period via Kalman filtering, and continuously modulates audio playback to maintain phase lock with the predicted VBlank phase — eliminating audio-video drift without audible artifacts.
USPTO Non-Provisional Patent Application #19/641,687 — Method and System for Audiovisual Synchronization and Render Latency Minimization via Hardware Clock Domain Bridging in Asynchronous Operating System Environments. Licensed under Apache 2.0, which includes an explicit patent grant to users of this software.
Rhythm games are uniquely sensitive to audio-video phase alignment. At 60 Hz the frame interval is about 16.7 ms, so even a one-frame drift is perceptually noticeable. Mainstream engines have developed various strategies to handle this: fixed-rate resampling, swap chain flip models, and VSync-based scheduling all work well in most contexts.
Delta Engine explores a different approach: treating the display refresh period as an unknown physical parameter to be estimated in real time, rather than a nominal constant to be assumed.
Two observations motivated this direction:
- OS-reported VBlank timing carries scheduling jitter (~5–10µs), which is small per-frame but can accumulate into audible drift over long play sessions.
- The audio DAC crystal and display crystal drift relative to each other, so even a perfectly-timed audio stream can fall out of phase with the display over time.
The engine addresses both by continuously estimating the true display refresh period via Kalman filtering and steering the audio stream to track it — a closed-loop approach that bounds drift rather than periodically correcting it.
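The estimation step can be illustrated with a scalar Kalman filter. This is a minimal sketch, not the engine's code: `PeriodKalman` and `update` are hypothetical names, the period is modelled as a constant, and Q and R reuse the values quoted in the feature list. The source does not state their units, so treating them as variances on a period in seconds is an assumption.

```cpp
#include <cassert>

// Scalar (1-D) Kalman filter tracking a slowly varying refresh period.
// Hypothetical illustration; units of q and r are assumed (seconds^2).
struct PeriodKalman {
    double x;  // estimated refresh period
    double p;  // variance of the estimate
    double q;  // process noise (Q = 1e-9 in the feature list)
    double r;  // measurement noise (R = 5e-6 in the feature list)

    // Start from the nominal period with a large variance so the first
    // measurements dominate the estimate.
    explicit PeriodKalman(double nominal_period,
                          double q_ = 1e-9, double r_ = 5e-6)
        : x(nominal_period), p(1.0), q(q_), r(r_) {
        assert(nominal_period > 0.0);
    }

    // Feed one raw VBlank-to-VBlank interval; returns the filtered period.
    double update(double measured_interval) {
        p += q;                            // predict: constant-period model
        const double k = p / (p + r);      // Kalman gain
        x += k * (measured_interval - x);  // correct with the innovation
        p *= (1.0 - k);                    // shrink the variance
        return x;
    }
};
```

Each call costs a handful of floating-point operations, so a filter of this shape can run once per VBlank without affecting the frame budget.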
The entire engine runs across three threads with zero mutexes on the hot path.
| Thread | Rate | Role |
|---|---|---|
| Logic Thread | 16,000 Hz | Input capture, judgment, scene state update |
| Render Thread | VBlank-gated | D3D11 command submission and Present |
| VBlank Observer | VBlank | D3DKMT kernel interrupt hook → Kalman filter update |
Data flows one-way between threads via lock-free atomic triple buffers. No lock, mutex, or blocking primitive exists on the per-frame path.
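One common way to build a single-producer, single-consumer triple buffer is sketched below. It is an illustration under stated assumptions rather than the engine's implementation: `TripleBuffer`, `publish`, and `acquire_latest` are hypothetical names, and the dirty-bit encoding is one of several possible designs.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>

// SPSC triple buffer: the producer owns back_, the consumer owns front_,
// and the third slot index travels through the atomic middle_.
template <typename T>
class TripleBuffer {
public:
    // Producer: slot that is currently safe to fill.
    T& write_slot() { return slots_[back_]; }

    // Producer: hand the filled slot over and take the old middle slot.
    void publish() {
        const std::uint8_t prev = middle_.exchange(
            static_cast<std::uint8_t>(back_ | kDirty),
            std::memory_order_acq_rel);
        assert((prev & kIndexMask) < 3);
        back_ = prev & kIndexMask;
    }

    // Consumer: returns true if a newer snapshot was swapped in.
    bool acquire_latest() {
        if ((middle_.load(std::memory_order_acquire) & kDirty) == 0) {
            return false;  // nothing new since the last call
        }
        const std::uint8_t prev =
            middle_.exchange(front_, std::memory_order_acq_rel);
        front_ = prev & kIndexMask;
        return true;
    }

    // Consumer: latest snapshot obtained by acquire_latest().
    const T& read_slot() const { return slots_[front_]; }

private:
    static constexpr std::uint8_t kDirty = 0x4;      // "new data" flag
    static constexpr std::uint8_t kIndexMask = 0x3;  // slot index bits
    std::array<T, 3> slots_{};
    std::uint8_t back_ = 0;   // touched by the producer only
    std::uint8_t front_ = 1;  // touched by the consumer only
    std::atomic<std::uint8_t> middle_{2};
};
```

In this sketch the producer never waits for the consumer: if two snapshots are published before the consumer looks, the older one is overwritten and the consumer sees only the newest.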
- **Kernel-Level VBlank Observer**: Dedicated high-priority thread blocks on `D3DKMTWaitForVerticalBlankEvent` (accessed via `gdi32.dll` function pointer resolution), capturing the hardware timestamp of each VBlank with sub-nanosecond resolution.
- **1-D Kalman Filter**: Raw VBlank intervals carry OS scheduling jitter (~5–10µs). A recursive state estimator (Q = 1e-9, R = 5e-6) converges to the true display crystal oscillator period within ~20–30 samples, yielding a clean predicted refresh phase.
- **Dynamic Audio Phase Lock**: Each VBlank cycle computes a `sync_ratio = current_hz / target_hz`, atomically published to the audio callback. A 4-point cubic interpolation resampler modulates playback speed by this ratio, continuously realigning audio phase with the predicted VBlank, so drift accumulation is bounded, not integrated.
- **Beam Racing Scheduler**: Render thread dispatch delayed until `clean_dt × 0.827` after VBlank, using two-phase wait (OS sleep above 2ms, `_mm_pause()` spin below). Frame presentation completes just before the next scanline pass, minimizing input-to-photon latency.
- **Lock-Free Triple Buffering**: Three `RenderSnapshot` slots managed via atomic index swaps with acquire-release memory ordering. Logic thread never blocks on render; render thread always receives the latest completed snapshot. Total sync overhead: 3 atomic operations per frame exchange.
- **Lock-Free SPSC Input Queue**: Raw Input (`WM_INPUT`) events enqueued from the input capture thread, dequeued by the logic thread. No kernel-level synchronization on the input-to-logic path.
- **CAS Voice Allocation**: 64-slot audio voice mixer with three atomic states (FREE / WRITING / ACTIVE) transitioned via compare-and-swap. No mutex in the audio callback.
- **RDTSC-Based Nanosecond Timer**: `__rdtsc()` serialized via `_mm_lfence()`, cross-calibrated against `QueryPerformanceCounter`. All subsequent timing uses TSC directly, independent of OS timer services and invariant across P-states (SpeedStep / Turbo Boost).
- **2GB Pre-Reserved Memory Arena**: `VirtualAlloc` reservation at startup. `NotePool` binds directly into the arena as a flat SoA array, with zero heap allocation during gameplay.
- **SIMD Parallel Judgment**: `NotePool` laid out as Structure-of-Arrays. `_mm256_cmp_pd` evaluates 4 notes per instruction for timing window checks. `[[likely]]` / `[[unlikely]]` attributes guide branch prediction.
- **Lua Skin System**: `.luaskin` files executed twice: first pass injects engine config, second pass does full parse. Load-time static OP culling and pointer pre-linking eliminate runtime map lookups on the render path.
- **BGA Video Decoding**: Media Foundation decode thread hands frames to the render thread via a lock-free triple buffer. Render thread performs only `Map`/`Unmap` for GPU upload.
- **BMS Format Parser**: Zero-allocation parser using `std::from_chars` and `std::string_view`. MD5 hashing via Windows CryptoAPI for per-song caching in SQLite (WAL mode, prepared statements).
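As an illustration of the Dynamic Audio Phase Lock item in the list above, the sketch below computes `sync_ratio` and applies it as the read step of a 4-point cubic resampler. The source does not say which cubic kernel the engine uses, so Catmull-Rom is an assumption here, as are the names `cubic4` and `resample`, and the choice to use the ratio directly as the step in input samples per output sample. The version shown is an offline form that allocates a `std::vector` for clarity; a real audio callback would write into a preallocated buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Ratio from the feature list: measured refresh rate over the target rate.
inline double sync_ratio(double current_hz, double target_hz) {
    assert(target_hz > 0.0);
    return current_hz / target_hz;
}

// 4-point Catmull-Rom interpolation between y1 and y2, t in [0, 1).
// Assumed kernel; the engine's actual cubic is not specified.
inline float cubic4(float y0, float y1, float y2, float y3, float t) {
    const float a = -0.5f * y0 + 1.5f * y1 - 1.5f * y2 + 0.5f * y3;
    const float b = y0 - 2.5f * y1 + 2.0f * y2 - 0.5f * y3;
    const float c = -0.5f * y0 + 0.5f * y2;
    return ((a * t + b) * t + c) * t + y1;
}

// Reads `in` with a step of `ratio` input samples per output sample.
// A ratio above 1 plays faster, below 1 slower.
inline std::vector<float> resample(const std::vector<float>& in,
                                   double ratio) {
    assert(ratio > 0.0);
    std::vector<float> out;
    // Start at 1 so in[i - 1] exists; stop while in[i + 2] still exists.
    for (double pos = 1.0;
         pos + 2.0 < static_cast<double>(in.size());
         pos += ratio) {
        const std::size_t i = static_cast<std::size_t>(pos);
        const float t = static_cast<float>(pos - static_cast<double>(i));
        out.push_back(cubic4(in[i - 1], in[i], in[i + 1], in[i + 2], t));
    }
    return out;
}
```

Because the ratio stays very close to 1 (for example 59.94 / 60 is about 0.999), the resulting pitch shift is small, which is consistent with the README's claim of avoiding audible artifacts.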
- Kernel-level interaction with the Windows graphics stack (D3DKMT, DXGI)
- Recursive state estimation (Kalman filtering) applied to a real hardware timing problem
- Lock-free concurrency with correct acquire-release memory ordering across multiple threads
- Closed-loop control system design in a soft real-time environment
- SIMD vectorization, cache-aware data layout, and zero-allocation hot paths
- End-to-end engine architecture: from kernel interrupts to GPU presentation
Delta_Engine/
├── 01_core/ — Precision clock, memory arena, thread primitives
├── 02_audio/ — Audio mixer, resampler, dynamic limiter, sync controller
├── 03_graphics/ — D3D11 render pipeline, VBlank observer, beam racing
├── 04_input/ — Raw Input capture, SPSC queue, input manager
├── 05_scene/ — Scene state, NotePool (SoA), snapshot write
├── 06_skin/ — Lua skin loader, blueprint parser
├── 07_bms/ — BMS file parser, chart data structures
├── 08_db/ — SQLite-backed song cache and record storage
├── 09_bga/ — BGA video decoder (Media Foundation)
└── 99_utils/ — Kalman filter, timing utilities, hash helpers
Requirements:
- Windows 10/11 x64
- Visual Studio 2022 (C++20)
- DirectX 11 SDK (included in Windows 10 SDK)
Dependencies (bundled in extern/):
- SQLite amalgamation
- Lua / sol2
- ASIO SDK headers
Build:
Open Delta_Engine.sln in Visual Studio, select the Release x64 configuration, and build.
Build output: Delta_Engine.exe. delta_record.db is created on first launch.
This software implements methods described in USPTO Non-Provisional Patent Application #19/641,687, filed 2026-04-08. The patent application covers the closed-loop audiovisual synchronization architecture, including VBlank prediction via Kalman filtering, dynamic audio phase synchronization, and beam racing render scheduling.
The software is released under the Apache License 2.0, which includes an explicit patent grant to users of this software (Section 3 of the license). This means anyone using Delta Engine under the terms of Apache 2.0 automatically receives a royalty-free license to the patented methods as implemented in this codebase.
See LICENSE and NOTICE for full terms.
- Delta Cast — Kernel-level virtual ASIO driver (MIT)
- Delta HFT — Low-latency trading system reference (MIT)
Seungmin Lee — Systems & Engine Programmer 📧 preez@studiodelta.works 🔗 GitHub · Portfolio
Open to remote positions and relocation. Visa sponsorship required for positions outside South Korea.