
I encountered this problem when running the fp8 model, using a modified 4080, please help #182

Open
YangShenglong-ai opened this issue May 7, 2025 · 1 comment

@YangShenglong-ai

Image

@KannManMachen

Same problem here:

.../LTX-Video-Q8-Kernels\csrc\gemm\mma_sm89_fp16.hpp:80: block: [12,5,0], thread: [255,0,0] Assertion 0 && "Attempting to use SM89_16x8x32_F32E4M3E4M3F32_TN without CUTE_ARCH_MMA_F16_SM89_ENABLED" failed.

Running a 4090. ChatGPT said:

„That error message means that the kernel code you're using tries to use a specific CUDA tensor core instruction (SM89 for FP8/FP16) — but the corresponding architecture macro (CUTE_ARCH_MMA_F16_SM89_ENABLED) is not defined, so it fails intentionally.

You're likely running on an Ada Lovelace GPU (e.g. RTX 40-series) with compute capability 8.9 (SM89), and the CUTLASS-based kernel you compiled expects this feature to be explicitly enabled.“
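For anyone reproducing this, a quick sanity check is to confirm that the GPU really reports compute capability 8.9 before digging into build flags. The sketch below is only an illustration: the helper names `is_ada_sm89`, `arch_name` and `detect_capability` are made up for this example, and it assumes PyTorch is installed with CUDA support (otherwise `detect_capability` just returns `None`).

```python
def is_ada_sm89(capability):
    """True when a (major, minor) compute capability tuple is exactly 8.9."""
    return tuple(capability) == (8, 9)


def arch_name(capability):
    """Format a (major, minor) tuple the way nvcc names targets, e.g. sm_89."""
    major, minor = capability
    return f"sm_{major}{minor}"


def detect_capability():
    """Return the current GPU's (major, minor), or None if it cannot be queried.

    Assumes PyTorch; torch is imported lazily so the helpers above
    still work on a machine without it.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()


if __name__ == "__main__":
    cap = detect_capability()
    if cap is None:
        print("Could not query a CUDA device from this Python environment.")
    else:
        print(f"Detected {arch_name(cap)}; Ada SM89: {is_ada_sm89(cap)}")
```

If the device does report 8.9, the hardware side is likely fine and the remaining suspects are how the kernels were compiled (target architecture, toolkit version, host compiler).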

ChatGPT suggested adding the following compiler flags to setup.py (in the LTX-Video-Q8-Kernels folder):
"-DCUTE_ARCH_MMA_SM89_ENABLED",
"-DCUTE_ARCH_MMA_F16_SM89_ENABLED",
"-DCUTE_USE_DEVICE_ATOMICS=1",
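To make clear where such flags would go, here is a minimal sketch of merging them into an nvcc flag list. It assumes the project's setup.py collects its nvcc options in a plain Python list (as is common with PyTorch's `CUDAExtension` and its `extra_compile_args={"nvcc": [...]}` argument). The helper `with_sm89_defines` and the `base_flags` values are hypothetical and not taken from the repository.

```python
# The three defines quoted above, exactly as suggested.
SM89_DEFINES = [
    "-DCUTE_ARCH_MMA_SM89_ENABLED",
    "-DCUTE_ARCH_MMA_F16_SM89_ENABLED",
    "-DCUTE_USE_DEVICE_ATOMICS=1",
]


def with_sm89_defines(nvcc_flags):
    """Return a copy of nvcc_flags with each SM89 define appended once."""
    merged = list(nvcc_flags)
    for flag in SM89_DEFINES:
        if flag not in merged:
            merged.append(flag)
    return merged


# Placeholder flags for illustration only; the real setup.py may differ.
base_flags = ["-O3", "-std=c++17"]
nvcc_flags = with_sm89_defines(base_flags)
print(nvcc_flags)
```

Note that forcing the macro on by hand only bypasses the guard that raised the assertion. The kernels would presumably still have to be compiled for an sm_89 target with a toolchain that accepts the FP8 instruction.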

Adding these flags didn't work either, so I got stuck with the following statement from ChatGPT:

Your issue is that:

- The GEMM-related CUDA code (in mma_sm89_fp16.hpp) uses inline PTX assembly or syntax not supported under Windows NVCC/MSVC.

- That means the Windows toolchain can’t compile this — even though the hardware (RTX 4090) fully supports it.

What You Can Do
Use a Linux or WSL2 Environment

Can anyone make sense of this statement?
