Skip to content

Fix gpu_graph fallback on old Nvidia GPU.#443

Open
duburcqa wants to merge 2 commits intomainfrom
duburcqa/fix_gpu_graph_fallback_old_nvidia
Open

Fix gpu_graph fallback on old Nvidia GPU.#443
duburcqa wants to merge 2 commits intomainfrom
duburcqa/fix_gpu_graph_fallback_old_nvidia

Conversation

@duburcqa
Copy link
Copy Markdown
Contributor

@duburcqa duburcqa commented Mar 31, 2026

This PR checks SM compatibility if native GPU graph is not supported by a given CUDA kernel at runtime, and fallback to host-side do-while loop if not supported, instead of systematically raising an exception.

Moreover, message about fallbacking to host-side do-while loop if GPU Graph is not supported has been demoted from warning to information. Warning is supposed to require user action to be fixed. Here, there is nothing to fix, it is part of the expected workflow. Still, it is nice to inform the user that it is not doing what one may expect intuitively.

Resolves Genesis-Embodied-AI/Genesis#2631

QD_ERROR_IF(!cond_kernel_func_,
"Condition kernel not available; cannot build graph_do_while");
if (!cond_kernel_func_) {
// Device does not support graph_do_while (requires SM 9.0+).
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I want to happen is:

  • if we are on SM90+, and the conditional kernel is not available => throw an exception; ths user must install the pre-requiaites so they get the full performance out of their SM90. We don't want people using an SM90, assuming they are getting full performance, and then thinking Genesis is slow, I feel
  • if < SM90, then fall back to host side do while loop.

Quetions:

  • To what extent does this change align with these two constraints?
  • To what extent are we testing both conditions? (or at least, testing the first constraint?)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Updated accordingly.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To what extent are we testing both conditions? (or at least, testing the first constraint?)

Just skimmed through the unit tests and it seems we are. I will double check tomorrow.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we are on SM90+, and the conditional kernel is not available => throw an exception;

I have zero idea about how to test this. You are asking me to test something that requires a faulty device.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if < SM90, then fall back to host side do while loop.

This is already unit tested.



@test_utils.test()
@test_utils.test(arch=qd.gpu)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should work on ALL bakcends, including cpu. Please remove the filter on gpu. Why do you feel this test should not run on cpu?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to be conservative. I can remove this guard.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like, we only need gpu in Genesis.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done



@test_utils.test()
@test_utils.test(arch=qd.gpu)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, gpu graph is supported on cpu. Perhaps I should rename it 🤔 (I had a similar challenge with Opus, when it was named 'cuda_graph': it refused to run it on anthing other than cuda. So I renamed it. perhaps I should just call it graph?)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I renamed it. perhaps I should just call it graph?)

I think you should yes.



@test_utils.test()
@test_utils.test(arch=qd.gpu)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cpu too. no filter

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done



@test_utils.test()
@test_utils.test(arch=qd.gpu)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove filter

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done



@test_utils.test()
@test_utils.test(arch=qd.gpu)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove filter

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done



@test_utils.test()
@test_utils.test(arch=qd.gpu)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove filter

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done



@test_utils.test()
@test_utils.test(arch=qd.gpu)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove filter

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done



@test_utils.test()
@test_utils.test(arch=qd.gpu)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove filter

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@duburcqa duburcqa force-pushed the duburcqa/fix_gpu_graph_fallback_old_nvidia branch from b9c7936 to a879710 Compare April 1, 2026 07:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Condition kernel not available; cannot build graph_do_while on RTX 3060 (SM 8.6)

2 participants