Fix gpu_graph fallback on old Nvidia GPU. by duburcqa · Pull Request #443 · Genesis-Embodied-AI/quadrants

duburcqa · 2026-03-31T22:59:22Z

This PR checks SM compatibility at runtime when native GPU graph is not supported for a given CUDA kernel, and falls back to a host-side do-while loop if the device is not compatible, instead of systematically raising an exception.

Moreover, the message about falling back to the host-side do-while loop when GPU graph is not supported has been demoted from warning to info. A warning is supposed to require user action to fix. Here there is nothing to fix; it is part of the expected workflow. Still, it is useful to inform the user that the runtime is not doing what one might intuitively expect.
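For readers unfamiliar with the fallback, here is a minimal sketch of what a host-side do-while loop could look like. It is an illustration only, not the code in this PR: `run_host_do_while`, `launch_body`, and `read_condition` are hypothetical names, and the callbacks stand in for enqueueing the loop body on the GPU and copying the condition flag back to the host.

```cpp
#include <functional>

// Hypothetical sketch, not the quadrants implementation.
// `launch_body` stands in for launching the loop-body kernels and waiting
// for them; `read_condition` stands in for reading the device-side
// condition flag back on the host.
inline int run_host_do_while(const std::function<void()> &launch_body,
                             const std::function<bool()> &read_condition) {
  int iterations = 0;
  do {
    launch_body();  // the body always runs at least once (do-while semantics)
    ++iterations;
  } while (read_condition());  // the host decides whether to iterate again
  return iterations;
}
```

Compared with a native graph conditional node, the loop decision is taken on the CPU after each iteration, which is why this path is expected to be slower on devices that could run the loop entirely on the GPU.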

Resolves Genesis-Embodied-AI/Genesis#2631

quadrants/runtime/cuda/gpu_graph_manager.cpp

hughperkins · 2026-03-31T23:02:56Z

quadrants/runtime/cuda/gpu_graph_manager.cpp

-    QD_ERROR_IF(!cond_kernel_func_,
-                "Condition kernel not available; cannot build graph_do_while");
+    if (!cond_kernel_func_) {
+      // Device does not support graph_do_while (requires SM 9.0+).


What I want to happen is:

if we are on SM90+, and the conditional kernel is not available => throw an exception; the user must install the prerequisites so they get the full performance out of their SM90. We don't want people using an SM90, assuming they are getting full performance, and then thinking Genesis is slow, I feel.

if < SM90, then fall back to host side do while loop.
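The two constraints above can be sketched as a small pure function. This is a hedged illustration, not the actual change in `gpu_graph_manager.cpp`: `DoWhileStrategy`, `choose_do_while_strategy`, and its parameters are hypothetical names, and the SM 9.0 threshold is taken from the comment in the diff.

```cpp
#include <stdexcept>

// Hypothetical names throughout; not the quadrants API.
enum class DoWhileStrategy { kNativeGraph, kHostLoop };

// Decide how to run graph_do_while from the device's SM major version and
// whether the condition kernel could be loaded.
inline DoWhileStrategy choose_do_while_strategy(int sm_major,
                                                bool cond_kernel_available) {
  constexpr int kMinGraphSmMajor = 9;  // SM 9.0+, per the diff comment
  if (sm_major < kMinGraphSmMajor) {
    // Older device: falling back is the expected workflow, not an error.
    return DoWhileStrategy::kHostLoop;
  }
  if (!cond_kernel_available) {
    // Capable device but missing prerequisites: fail loudly so the user
    // does not silently lose performance.
    throw std::runtime_error(
        "Condition kernel not available; cannot build graph_do_while");
  }
  return DoWhileStrategy::kNativeGraph;
}
```

One reason to isolate the decision this way is that both branches, including the exception on SM90+, could be exercised in a unit test by passing the inputs directly, without needing a device in that state.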

Questions:

To what extent does this change align with these two constraints?

To what extent are we testing both conditions? (or at least, testing the first constraint?)

I see. Updated accordingly.

To what extent are we testing both conditions? (or at least, testing the first constraint?)

Just skimmed through the unit tests and it seems we are. I will double check tomorrow.

if we are on SM90+, and the conditional kernel is not available => throw an exception;

I have zero idea about how to test this. You are asking me to test something that requires a faulty device.

if < SM90, then fall back to host side do while loop.

This is already unit tested.