Summary
The Rive Flutter runtime on Windows crashes with an unrecoverable abort() when the GPU device is lost during fence synchronization. This happens in common user scenarios such as sleep/wake, GPU TDR (Timeout Detection and Recovery), and driver resets. The crash originates from VERIFY_OK macros wrapping ID3D11Fence::SetEventOnCompletion() and ID3D11DeviceContext4::Signal() in `rive_native_windows.cpp`. VERIFY_OK calls abort() on any HRESULT other than S_OK, including DXGI_ERROR_DEVICE_REMOVED, which is a legitimate runtime condition.
Environment
- Rive Flutter Runtime: rive_native (custom fork, D3D11 backend)
- Platform: Windows 10/11, D3D11 with the ID3D11Device5 fence path
- Flutter: 3.38+
- GPU: Reproduced on both NVIDIA and Intel adapters
Reproduction Steps
1. Run a Flutter app that uses the Rive renderer on Windows.
2. Start rendering animations (one or more RiveWidget instances).
3. Put the machine to sleep (Win+X → Shut down or sign out → Sleep) or trigger a GPU TDR (e.g., a heavy GPU load that causes a driver reset).
4. Wake the machine.
5. Result: the app crashes immediately.
Crash Log
```
........\platform\windows\rive_native_windows.cpp:220:
D3D error unknown error: m_lastFrameFence->SetEventOnCompletion(m_activeFenceWaitIndex, nullptr)
Lost connection to device.
```
Root Cause
The crash occurs in WindowsContextPLS::fenceWaitThread():
```cpp
// rive_native_windows.cpp — fenceWaitThread()
VERIFY_OK(m_lastFrameFence->SetEventOnCompletion(m_activeFenceWaitIndex, nullptr));
```
And in WindowsContextPLS::end():
```cpp
VERIFY_OK(m_gpuContext4->Signal(m_lastFrameFence.Get(), ++m_lastFrameIndex));
```
VERIFY_OK calls abort() on failure. When the GPU device is removed, these D3D calls return DXGI_ERROR_DEVICE_REMOVED (0x887A0005), which per the DXGI documentation is a non-fatal, expected runtime condition. The D3D contract is that applications detect this error and then either re-create the device or degrade gracefully; calling abort() rules out both.
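As a minimal sketch of what a non-aborting check could look like, the helper below separates device-loss HRESULTs from other failures. It is written to compile off-Windows: HRESULT is modeled as int32_t and the DXGI_ERROR_* codes are declared locally with their documented values instead of coming from the Windows headers. The names `sketch::ClassifyHr`, `IsDeviceLost`, and `HrOutcome` are hypothetical and not part of rive_native. In a real build, once device loss is detected, ID3D11Device::GetDeviceRemovedReason() reports the specific reason.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Portable stand-in for the Win32 HRESULT type (assumption for this sketch).
using HRESULT = std::int32_t;

// Documented DXGI_ERROR_* values, declared locally so no Windows headers are needed.
constexpr HRESULT kOk             = 0;
constexpr HRESULT kDeviceRemoved  = static_cast<HRESULT>(0x887A0005u); // DXGI_ERROR_DEVICE_REMOVED
constexpr HRESULT kDeviceHung     = static_cast<HRESULT>(0x887A0006u); // DXGI_ERROR_DEVICE_HUNG
constexpr HRESULT kDeviceReset    = static_cast<HRESULT>(0x887A0007u); // DXGI_ERROR_DEVICE_RESET
constexpr HRESULT kDriverInternal = static_cast<HRESULT>(0x887A0020u); // DXGI_ERROR_DRIVER_INTERNAL_ERROR

// Same rule as the Win32 FAILED() macro: any negative HRESULT is a failure.
constexpr bool Failed(HRESULT hr) { return hr < 0; }

// True for the HRESULTs that mean "the GPU device is gone",
// as opposed to a programming error by the caller.
constexpr bool IsDeviceLost(HRESULT hr) {
    return hr == kDeviceRemoved || hr == kDeviceHung ||
           hr == kDeviceReset || hr == kDriverInternal;
}

enum class HrOutcome { Ok, DeviceLost, Fatal };

// Non-aborting replacement for VERIFY_OK: device loss is handed back to the
// caller; only unexpected failures are classified as fatal.
constexpr HrOutcome ClassifyHr(HRESULT hr) {
    if (!Failed(hr)) return HrOutcome::Ok;
    if (IsDeviceLost(hr)) return HrOutcome::DeviceLost;
    return HrOutcome::Fatal;
}

}  // namespace sketch
```

With this split, the fence thread can set m_deviceLost and exit on `HrOutcome::DeviceLost`, while `HrOutcome::Fatal` can keep the existing abort() behavior for genuine bugs.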
Why This Happens
| Trigger | Frequency | Result |
|---|---|---|
| Laptop sleep/wake | Very common | App crashes on wake; user loses session |
| GPU driver update/reset | Occasional | App crashes during driver install |
| GPU TDR (long shader/compute work) | Rare | App crashes under heavy GPU load |
| Remote Desktop attach/detach | Occasional | App crashes when the GPU context changes |
Impact
- Severity: Critical. The crash is unrecoverable and users have no workaround.
- User experience: Users lose all unsaved work when their laptop sleeps. On production broadcast systems (our use case), this can interrupt live broadcasts.
- Affected users: All Windows users of the Rive Flutter runtime on the D3D11 fence path, which requires Windows 10 Creators Update (version 1703) or later and therefore covers essentially all supported Windows machines.
Our Local Patch (Workaround)
We applied the following changes to our fork to prevent the crash:
1. Replace VERIFY_OK with HRESULT checks
```diff
 // fenceWaitThread()
- VERIFY_OK(m_lastFrameFence->SetEventOnCompletion(m_activeFenceWaitIndex, nullptr));
+ HRESULT hr = m_lastFrameFence->SetEventOnCompletion(m_activeFenceWaitIndex, nullptr);
+ if (FAILED(hr)) {
+     m_deviceLost = true;
+     // Notify app, unblock main thread, exit fence thread
+     break;
+ }
```
```diff
 // end()
- VERIFY_OK(m_gpuContext4->Signal(m_lastFrameFence.Get(), ++m_lastFrameIndex));
+ HRESULT hrSignal = m_gpuContext4->Signal(m_lastFrameFence.Get(), ++m_lastFrameIndex);
+ if (FAILED(hrSignal)) {
+     m_deviceLost = true;
+     return;
+ }
```
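The fence-thread change can be modeled portably to show what "unblock main thread, exit fence thread" involves. This is a sketch under stated assumptions: `FenceWaiter`, `enqueue`, `waitFor`, and the injected `blockingWait` callback are hypothetical names standing in for fenceWaitThread() and SetEventOnCompletion(index, nullptr) (which blocks until the fence reaches `index`), and HRESULT is modeled as int32_t. The key point is that the main thread's wait predicate also checks the device-lost flag, so it cannot hang on a fence that will never signal.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

using HRESULT = std::int32_t;  // portable stand-in (assumption)

// Hypothetical model of fenceWaitThread(). `blockingWait(index)` stands in for
// m_lastFrameFence->SetEventOnCompletion(index, nullptr).
class FenceWaiter {
public:
    explicit FenceWaiter(std::function<HRESULT(std::uint64_t)> blockingWait)
        : m_blockingWait(std::move(blockingWait)), m_thread([this] { run(); }) {}

    ~FenceWaiter() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_shutdown = true;
        }
        m_cv.notify_all();
        m_thread.join();  // fine even if run() already exited on device loss
    }

    // Main thread: ask the fence thread to wait for `index`.
    void enqueue(std::uint64_t index) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_requested = index;
        }
        m_cv.notify_all();
    }

    // Main thread: block until `index` completes. Returns false on device loss.
    bool waitFor(std::uint64_t index) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [&] { return m_deviceLost || m_completed >= index; });
        return !m_deviceLost;
    }

    bool deviceLost() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_deviceLost;
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_mutex);
        for (;;) {
            m_cv.wait(lock, [&] { return m_shutdown || m_requested > m_completed; });
            if (m_shutdown) return;
            const std::uint64_t index = m_requested;
            lock.unlock();
            const HRESULT hr = m_blockingWait(index);  // may block on the GPU
            lock.lock();
            if (hr < 0) {             // FAILED(hr), e.g. DXGI_ERROR_DEVICE_REMOVED
                m_deviceLost = true;  // sticky: never cleared in this sketch
                m_cv.notify_all();    // unblock waitFor() on the main thread
                return;               // exit the fence thread instead of abort()
            }
            m_completed = index;
            m_cv.notify_all();
        }
    }

    std::function<HRESULT(std::uint64_t)> m_blockingWait;
    mutable std::mutex m_mutex;
    std::condition_variable m_cv;
    std::uint64_t m_requested = 0;
    std::uint64_t m_completed = 0;
    bool m_deviceLost = false;
    bool m_shutdown = false;
    std::thread m_thread;  // declared last so it starts after the members above
};
```

Declaring `m_thread` last matters: members are initialized in declaration order, so the worker cannot observe an unconstructed mutex or condition variable.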
2. Guard all render entry points
```cpp
void begin(bool clear, uint32_t color) {
    if (m_deviceLost) return;
    // ...
}

void end(float devicePixelRatio) {
    if (m_deviceLost) return;
    // ...
}
```
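A portable sketch of the end() change combined with the entry-point guards, assuming a Signal() call that starts failing mid-session. `GuardedContext`, the injected `signal` callback, and the frame counters are hypothetical (the real class is WindowsContextPLS). Here `m_deviceLost` is a `std::atomic<bool>` because the fence thread writes it while the render thread reads it; the issue does not say how the fork actually declares it.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

using HRESULT = std::int32_t;  // portable stand-in (assumption)

// Hypothetical stand-in for WindowsContextPLS; `signal(value)` models
// m_gpuContext4->Signal(m_lastFrameFence.Get(), value).
class GuardedContext {
public:
    explicit GuardedContext(std::function<HRESULT(std::uint64_t)> signal)
        : m_signal(std::move(signal)) {}

    void begin(bool clear, std::uint32_t color) {
        if (m_deviceLost.load()) return;  // no D3D work after device loss
        (void)clear;
        (void)color;                      // real code would clear/bind targets here
        ++m_framesBegun;
    }

    void end(float devicePixelRatio) {
        if (m_deviceLost.load()) return;
        (void)devicePixelRatio;
        const HRESULT hr = m_signal(++m_lastFrameIndex);
        if (hr < 0) {                     // FAILED(hr)
            m_deviceLost.store(true);     // later begin()/end() become no-ops
            return;
        }
        ++m_framesSignaled;
    }

    bool deviceLost() const { return m_deviceLost.load(); }
    int framesBegun() const { return m_framesBegun; }
    int framesSignaled() const { return m_framesSignaled; }

private:
    std::function<HRESULT(std::uint64_t)> m_signal;
    std::atomic<bool> m_deviceLost{false};  // may also be set by the fence thread
    std::uint64_t m_lastFrameIndex = 0;
    int m_framesBegun = 0;
    int m_framesSignaled = 0;
};
```

If Signal() fails on the second frame, the third frame issues no D3D calls at all: begin() and end() both return early, so the process stays alive until the app decides whether to re-create the device.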
3. Event-based notification to Dart
The fence thread posts WM_APP + 0x52 with PostMessage(); a Win32 WndProc subclass receives it and calls MethodChannel.InvokeMethod("onGpuDeviceLost") to notify the Dart layer, which shows a user-facing toast.
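The PostMessage hop is there because Flutter platform channels must be used on the platform (UI) thread, not on a worker such as the fence thread. Since both the fence thread and end() can detect the loss, one reasonable design is to post the message exactly once. The sketch below assumes that design; `DeviceLostNotifier` and the injected `post` callback (standing in for PostMessage to the Flutter window) are hypothetical, and only the WM_APP + 0x52 message ID comes from the issue.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <utility>

// WM_APP is 0x8000 in <winuser.h>; declared locally so the sketch is portable.
constexpr unsigned kWmApp = 0x8000;
constexpr unsigned kMsgGpuDeviceLost = kWmApp + 0x52;  // message ID from the issue

// Hypothetical helper: `post(msg)` stands in for PostMessage(hwnd, msg, 0, 0),
// which is safe to call from a non-UI thread. The WndProc subclass on the UI
// thread would then forward it to Dart via the method channel.
class DeviceLostNotifier {
public:
    explicit DeviceLostNotifier(std::function<void(unsigned)> post)
        : m_post(std::move(post)) {}

    // May be called from any thread that sees a failed HRESULT.
    // exchange() makes check-and-set atomic, so only the first caller posts.
    void report() {
        if (!m_reported.exchange(true)) m_post(kMsgGpuDeviceLost);
    }

private:
    std::function<void(unsigned)> m_post;
    std::atomic<bool> m_reported{false};
};
```

Repeated reports (for example, the fence thread and end() both failing after the same sleep/wake) then produce a single toast rather than one per failed call.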