SM120 / NVFP4: add device guard and runtime SM dispatch to cutlass_scaled_fp4_mm #29711

hholtmann · 2025-11-29T02:07:02Z

fix(cuda): add device guard and runtime SM dispatch to cutlass_scaled_fp4_mm

Purpose

Currently, the fp4 scaled_mm function doesn't work on the 5090 GPU and fails with a RuntimeError: Internal Error. See #21274 and #22783 for more information.

Test Plan

pytest -v -s tests/kernels/quantization/test_nvfp4_scaled_mm.py

Test Result

All passed.

chatgpt-codex-connector · 2025-11-29T02:07:10Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

gemini-code-assist

Code Review

This pull request introduces runtime SM dispatch for cutlass_scaled_fp4_mm, which is a solid improvement to support multiple GPU architectures and fixes issues on newer hardware like the 5090 series. The change from compile-time to runtime dispatch also corrects a latent critical bug where the previous implementation would attempt to return a value from a void function. The new logic is more robust and the improved error message is a good addition. I have one suggestion to refactor the dispatch logic to improve its long-term maintainability and make it less error-prone when adding support for future architectures.
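To illustrate the runtime dispatch idea described in the review, here is a minimal sketch in plain C++. It is not the code from this PR: `Fp4Kernel`, `BuildConfig`, and `select_fp4_kernel` are hypothetical names, and the SM version ranges are assumptions chosen for illustration. The point is that a multi-arch binary picks a kernel from the SM version of the device it is actually running on, and reports a clear error when no compiled kernel matches.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical labels for the per-architecture kernel families.
enum class Fp4Kernel { Sm100, Sm120 };

// Hypothetical model of which kernel families were compiled into the binary.
struct BuildConfig {
  bool has_sm100;
  bool has_sm120;
};

// sm_version encodes the compute capability as major * 10 + minor
// (for example, 12.1 becomes 121). The ranges below are assumptions.
inline Fp4Kernel select_fp4_kernel(int sm_version, const BuildConfig& build) {
  if (sm_version >= 120 && sm_version < 130 && build.has_sm120) {
    return Fp4Kernel::Sm120;
  }
  if (sm_version >= 100 && sm_version < 120 && build.has_sm100) {
    return Fp4Kernel::Sm100;
  }
  // No matching kernel was compiled: fail with a message that names the SM.
  throw std::runtime_error("No compiled nvfp4 scaled_mm kernel for SM " +
                           std::to_string(sm_version));
}
```

In a real CUDA extension the SM version would be queried from the current device at call time rather than passed in, and each enum value would correspond to a call into the matching kernel.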

csrc/quantization/fp4/nvfp4_scaled_mm_entry.cu

ApostaC · 2025-12-01T20:17:19Z

Hey @hholtmann, should this PR be merged to the main branch instead of releases/v0.11.2?

In the meantime, cc @mgoin @tlrmchlsmth

mergify · 2025-12-01T20:28:50Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @hholtmann.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

ProExpertProg

The diff seems polluted, can you rebase/merge from main?

Signed-off-by: mgoin <mgoin64@gmail.com>

mgoin · 2025-12-01T20:38:18Z

@ProExpertProg I'm working on it

bbrowning · 2025-12-01T23:58:12Z

I tested this on my DGX Spark (sm121), was able to reproduce the original issue, and confirmed that this fixes the failing tests in test_nvfp4_scaled_mm.py. I normally compile vLLM only for my specific architecture, so I first had to recompile locally with a multi-arch CUDA binary to trigger the original issue.

Reproduction Steps

# Enable builds for multiple architectures
export TORCH_CUDA_ARCH_LIST="10.0f;11.0f;12.0f;12.1a"

# Clear my cmake bits and set cmake up again, since I adjusted architectures
rm -rf cmake-build-release/

# Setup cmake for new architectures
cmake --preset release

# Rebuild vLLM
cmake --build --preset release --target install

pytest -v -s tests/kernels/quantization/test_nvfp4_scaled_mm.py

After taking the steps above, all of the tests in that file failed.

Verification Steps

# Apply the patch from this PR
cmake --build --preset release --target install

pytest -v -s tests/kernels/quantization/test_nvfp4_scaled_mm.py

After applying the fix from this PR and rebuilding, all tests passed.

mgoin · 2025-12-02T01:22:45Z

Excellent work @bbrowning, thank you for validating 🙏

mergify bot added the nvidia label Nov 29, 2025

github-project-automation bot added this to NVIDIA Nov 29, 2025

gemini-code-assist bot reviewed Nov 29, 2025

mgoin changed the base branch from releases/v0.11.2 to main December 1, 2025 20:28

mgoin requested review from LucasWilkinson, ProExpertProg, WoosukKwon, hmellor, houseroad, mgoin, pavanimajety, robertgshaw2-redhat, tlrmchlsmth, yewentao256 and youkaichao as code owners December 1, 2025 20:28

mergify bot added ci/build v1 labels Dec 1, 2025

mergify bot added the needs-rebase label Dec 1, 2025

mgoin added this to the v0.12.0 milestone Dec 1, 2025

ProExpertProg requested changes Dec 1, 2025

Rebase on main

0edf9ef

Signed-off-by: mgoin <mgoin64@gmail.com>

mgoin force-pushed the releases/v0.11.2 branch from 51bcb3b to 0edf9ef Compare December 1, 2025 20:38

mgoin requested a review from ProExpertProg December 1, 2025 20:39

mergify bot removed the needs-rebase label Dec 1, 2025

mgoin mentioned this pull request Dec 1, 2025

[Bug]: Qwen3-VL fails during multimodal encoder profiling (expected 3 dims, got 2) on Blackwell + NVFP4 (FlashInfer) even after CUDA header fix #29715

Open

1 task

mgoin added the ready label Dec 1, 2025

mgoin added the bug label Dec 1, 2025

mgoin approved these changes Dec 2, 2025

vllm-bot merged commit c0dfc89 into vllm-project:main Dec 2, 2025
90 of 94 checks passed

github-project-automation bot moved this to Done in NVIDIA Dec 2, 2025
