D3D11 VERIFY_OK aborts on GPU device-lost (DXGI_ERROR_DEVICE_REMOVED) — Flutter Windows #86

@unspokenlanguage

Summary
The Rive Flutter runtime on Windows crashes with an unrecoverable abort() when the GPU device is lost during fence synchronization. This happens during common user scenarios like sleep/wake, GPU TDR (Timeout Detection & Recovery), and driver resets. The crash originates from VERIFY_OK macros wrapping ID3D11Fence::SetEventOnCompletion() and ID3D11DeviceContext4::Signal() in rive_native_windows.cpp, which call abort() on any non-S_OK HRESULT, including DXGI_ERROR_DEVICE_REMOVED, which is a legitimate runtime condition.

Environment
Rive Flutter Runtime: rive_native (custom fork, D3D11 backend)
Platform: Windows 10/11, D3D11 with ID3D11Device5 fence path
Flutter: 3.38+
GPU: Reproduced on both NVIDIA and Intel adapters

Reproduction Steps
1. Run a Flutter app using the Rive renderer on Windows
2. Start rendering animations (one or more RiveWidget instances)
3. Put the machine to sleep (Win+X → Sleep) or trigger a GPU TDR (e.g., heavy GPU load causing driver reset)
4. Wake the machine

Result: App crashes immediately
Crash Log
........\platform\windows\rive_native_windows.cpp:220:
D3D error unknown error: m_lastFrameFence->SetEventOnCompletion(m_activeFenceWaitIndex, nullptr)
Lost connection to device.

Root Cause
The crash occurs in WindowsContextPLS::fenceWaitThread():

```cpp
// rive_native_windows.cpp, fenceWaitThread()
VERIFY_OK(m_lastFrameFence->SetEventOnCompletion(m_activeFenceWaitIndex, nullptr));
```

And in WindowsContextPLS::end():

```cpp
VERIFY_OK(m_gpuContext4->Signal(m_lastFrameFence.Get(), ++m_lastFrameIndex));
```
VERIFY_OK calls abort() on any failure. When the GPU device is removed, these D3D calls return DXGI_ERROR_DEVICE_REMOVED (0x887A0005), which per the DXGI documentation is a non-fatal, expected runtime condition rather than a programming error. The Win32/D3D contract is that applications detect this error and either re-create the device or degrade gracefully.
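To make the distinction concrete, here is a minimal sketch of a check that classifies the result instead of aborting. It is not the rive_native code: `HResult`, `GpuStatus`, `classify` and `checkGpuCall` are hypothetical names, and the HRESULT type and constants are re-declared locally (with the values from winerror.h) only so the sketch builds without the Windows SDK. In real code you would use `HRESULT`, `FAILED()` and the `DXGI_ERROR_*` macros directly.

```cpp
#include <cstdint>
#include <cstdio>

// Stand-in for the Win32 HRESULT type (assumption: lets this build anywhere).
using HResult = std::int32_t;

// Values match the DXGI_ERROR_* definitions in winerror.h.
constexpr HResult kDxgiDeviceRemoved = static_cast<HResult>(0x887A0005u);
constexpr HResult kDxgiDeviceHung    = static_cast<HResult>(0x887A0006u);
constexpr HResult kDxgiDeviceReset   = static_cast<HResult>(0x887A0007u);

enum class GpuStatus { Ok, DeviceLost, OtherFailure };

// Same test as the FAILED() macro: a negative HRESULT signals failure.
constexpr bool failed(HResult hr) { return hr < 0; }

// Separates "the device went away" from every other failure.
constexpr GpuStatus classify(HResult hr) {
    if (!failed(hr)) return GpuStatus::Ok;
    switch (hr) {
        case kDxgiDeviceRemoved:
        case kDxgiDeviceHung:
        case kDxgiDeviceReset:
            return GpuStatus::DeviceLost;
        default:
            return GpuStatus::OtherFailure;
    }
}

// Hypothetical replacement for VERIFY_OK: logs and reports, never aborts.
inline GpuStatus checkGpuCall(HResult hr, const char* expr) {
    const GpuStatus status = classify(hr);
    if (status != GpuStatus::Ok) {
        std::fprintf(stderr, "D3D call failed (0x%08X): %s\n",
                     static_cast<unsigned>(hr), expr);
    }
    return status;
}
```

A caller could keep an abort() for `OtherFailure` in debug builds while routing `DeviceLost` to a recovery path. Whether that split is right for rive_native is a design decision for the maintainers.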

Why This Happens

| Scenario | Frequency | Impact |
| --- | --- | --- |
| Laptop sleep/wake | Very common | App crashes on wake, user loses session |
| GPU driver update/reset | Occasional | App crashes during driver install |
| GPU TDR (long shader/compute) | Rare | App crashes under heavy GPU load |
| Remote Desktop attach/detach | Occasional | App crashes when GPU context changes |

Impact
Severity: Critical — unrecoverable crash, no user workaround
User experience: Users lose all unsaved work when their laptop sleeps. On production broadcast systems (our use case), this can cause live broadcast interruptions.
Affected users: All Windows users of the Rive Flutter runtime who use the D3D11 fence path (Windows 10 Creators Update 1703+, which is essentially all supported Windows machines)

Our Local Patch (Workaround)
We applied the following changes to our fork to prevent the crash:

1. Replace VERIFY_OK with HRESULT checks

```diff
// fenceWaitThread()
- VERIFY_OK(m_lastFrameFence->SetEventOnCompletion(m_activeFenceWaitIndex, nullptr));
+ HRESULT hr = m_lastFrameFence->SetEventOnCompletion(m_activeFenceWaitIndex, nullptr);
+ if (FAILED(hr)) {
+     m_deviceLost = true;
+     // Notify app, unblock main thread, exit fence thread
+     break;
+ }
```

```diff
// end()
- VERIFY_OK(m_gpuContext4->Signal(m_lastFrameFence.Get(), ++m_lastFrameIndex));
+ HRESULT hrSignal = m_gpuContext4->Signal(m_lastFrameFence.Get(), ++m_lastFrameIndex);
+ if (FAILED(hrSignal)) {
+     m_deviceLost = true;
+     return;
+ }
```
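The "unblock main thread, exit fence thread" step is only a comment in the diff, so here is one way it might look. This is a portable model under stated assumptions, not the actual WindowsContextPLS code: `FenceWaiter` and its members are hypothetical, and the `waitOnFence` callback stands in for the blocking `SetEventOnCompletion()` call, returning false when the device is lost.

```cpp
#include <algorithm>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

class FenceWaiter {  // hypothetical model of the fence-wait handshake
public:
    using WaitFn = std::function<bool(std::uint64_t)>;

    explicit FenceWaiter(WaitFn waitOnFence)
        : m_waitOnFence(std::move(waitOnFence)), m_thread([this] { run(); }) {}

    ~FenceWaiter() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_shutdown = true;
        }
        m_cv.notify_all();
        m_thread.join();
    }

    // Render thread: blocks until frame `index` completes.
    // Returns false if the device was lost first.
    bool waitForFrame(std::uint64_t index) {
        std::unique_lock<std::mutex> lock(m_mutex);
        if (m_deviceLost) return false;
        m_requested = std::max(m_requested, index);
        m_cv.notify_all();
        m_cv.wait(lock, [&] {
            return m_deviceLost || m_shutdown || m_completed >= index;
        });
        return !m_deviceLost && m_completed >= index;
    }

    bool deviceLost() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_deviceLost;
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_mutex);
        for (;;) {
            m_cv.wait(lock, [&] { return m_shutdown || m_requested > m_completed; });
            if (m_shutdown) break;
            const std::uint64_t target = m_requested;
            lock.unlock();
            const bool ok = m_waitOnFence(target);  // blocking GPU wait
            lock.lock();
            if (!ok) {
                m_deviceLost = true;  // record the failure instead of abort()
                m_cv.notify_all();    // unblock any thread in waitForFrame()
                break;                // exit the fence thread
            }
            m_completed = target;
            m_cv.notify_all();
        }
    }

    WaitFn m_waitOnFence;
    mutable std::mutex m_mutex;
    std::condition_variable m_cv;
    std::uint64_t m_requested = 0;
    std::uint64_t m_completed = 0;
    bool m_deviceLost = false;
    bool m_shutdown = false;
    std::thread m_thread;  // declared last so it starts after the other members
};
```

The point of the model is that the device-lost flag is set and the waiters are notified before the fence thread exits. Otherwise a render thread already blocked on the frame would wait forever.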


2. Guard all render entry points

```cpp
void begin(bool clear, uint32_t color) {
    if (m_deviceLost) return;
    // ...
}
void end(float devicePixelRatio) {
    if (m_deviceLost) return;
    // ...
}
```
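One detail the snippet does not show is how m_deviceLost is declared. Since it is written from the fence thread and read from the render thread, a plain bool would be a data race. The sketch below assumes a std::atomic<bool> and also handles the device going away between begin() and end(). `GuardedContext` and its members are hypothetical and model only the guard logic, not the real renderer.

```cpp
#include <atomic>
#include <cstdint>

class GuardedContext {  // hypothetical stand-in for the render context
public:
    // Called from the fence thread when a D3D call fails.
    void markDeviceLost() { m_deviceLost.store(true, std::memory_order_release); }
    bool deviceLost() const { return m_deviceLost.load(std::memory_order_acquire); }

    // Returns false (and does no work) once the device is gone.
    bool begin() {
        if (deviceLost()) return false;
        m_frameOpen = true;
        return true;
    }

    // Also rejects an end() whose begin() was skipped or never issued.
    bool end() {
        if (!m_frameOpen) return false;
        m_frameOpen = false;
        if (deviceLost()) return false;  // lost mid-frame: drop the frame
        ++m_framesSubmitted;
        return true;
    }

    std::uint64_t framesSubmitted() const { return m_framesSubmitted; }

private:
    std::atomic<bool> m_deviceLost{false};  // shared between threads
    bool m_frameOpen = false;               // render thread only
    std::uint64_t m_framesSubmitted = 0;    // render thread only
};
```

Returning a bool rather than void is a choice made for this sketch, so callers can tell that a frame was skipped. The patch above keeps the original void signatures.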

3. Event-based notification to Dart
We post a custom message with PostMessage(WM_APP + 0x52) from the fence thread. A Win32 WndProc subclass receives it and calls MethodChannel.InvokeMethod("onGpuDeviceLost") to notify the Dart layer, which shows a user-facing toast.
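The reason for going through the window's message queue is that Flutter platform channels must be invoked on the platform thread, not on the fence thread. The hand-off can be modeled without any Win32 or Flutter headers. In this sketch `MessageQueue`, `runExample` and the `dartCalls` vector are hypothetical stand-ins for the window message queue, the WndProc subclass and the MethodChannel call. Only the message id (WM_APP is 0x8000 in winuser.h) and the method name come from the patch.

```cpp
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

constexpr std::uint32_t kWmApp = 0x8000;                    // value of WM_APP
constexpr std::uint32_t kMsgGpuDeviceLost = kWmApp + 0x52;  // id used in the patch

class MessageQueue {  // hypothetical stand-in for the window's message queue
public:
    // Safe from any thread, like PostMessage().
    void post(std::uint32_t msg) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_pending.push(msg);
    }

    // Platform thread only: delivers queued messages to the handler,
    // the way the message loop calls the WndProc.
    void pump(const std::function<void(std::uint32_t)>& wndProc) {
        std::queue<std::uint32_t> batch;
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            std::swap(batch, m_pending);
        }
        for (; !batch.empty(); batch.pop()) wndProc(batch.front());
    }

private:
    std::mutex m_mutex;
    std::queue<std::uint32_t> m_pending;
};

// Usage: a worker posts, the platform thread pumps and forwards to Dart.
inline std::vector<std::string> runExample() {
    MessageQueue queue;
    std::vector<std::string> dartCalls;  // stands in for MethodChannel::InvokeMethod
    std::thread fenceThread([&] { queue.post(kMsgGpuDeviceLost); });
    fenceThread.join();
    queue.pump([&](std::uint32_t msg) {
        if (msg == kMsgGpuDeviceLost) dartCalls.push_back("onGpuDeviceLost");
    });
    return dartCalls;
}
```

In the real patch the queue and pump are provided by Windows. The only custom pieces would be the PostMessage call on the fence thread and the WndProc branch that forwards the message to Dart.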
