```
.../LTX-Video-Q8-Kernels\csrc\gemm\mma_sm89_fp16.hpp:80: block: [12,5,0], thread: [255,0,0] Assertion 0 && "Attempting to use SM89_16x8x32_F32E4M3E4M3F32_TN without CUTE_ARCH_MMA_F16_SM89_ENABLED" failed.
```
Running a 4090. ChatGPT said:
"That error message means the kernel code you're using tries to execute a specific CUDA tensor core instruction (the SM89 FP8/FP16 MMA), but the corresponding architecture macro (CUTE_ARCH_MMA_F16_SM89_ENABLED) was not defined when the kernel was compiled. Without the macro, CUTE compiles a fallback stub that trips this assertion at runtime instead of emitting the real instruction.
You're likely running on an Ada Lovelace GPU (e.g. an RTX 40-series card) with compute capability 8.9 (SM89), and the CUTLASS-based kernel you compiled expects this feature to be explicitly enabled."
ChatGPT suggested adding the following compiler defines to setup.py (in the LTX-Video-Q8-Kernels folder):
"-DCUTE_ARCH_MMA_SM89_ENABLED",
"-DCUTE_ARCH_MMA_F16_SM89_ENABLED",
"-DCUTE_USE_DEVICE_ATOMICS=1",
That didn't work either, so I got stuck with the following statement from Chatty:
Your issue is that:
- The GEMM-related CUDA code (in mma_sm89_fp16.hpp) uses inline PTX assembly or syntax that isn't supported by NVCC/MSVC on Windows.
- That means the Windows toolchain can't compile this, even though the hardware (RTX 4090) fully supports it.
What You Can Do
Use a Linux or WSL2 Environment
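Before rebuilding under WSL2, it's worth confirming that the GPU is actually visible there and reports compute capability 8.9. A quick check, assuming a CUDA-enabled PyTorch install inside the WSL2 environment:

```python
# Sanity check inside WSL2: the 4090 should be visible to CUDA and
# report compute capability (8, 9), i.e. SM89.
import torch

print(torch.cuda.is_available())            # expect True
print(torch.cuda.get_device_name(0))        # expect an RTX 4090
print(torch.cuda.get_device_capability(0))  # expect (8, 9)
```

If this prints (8, 9), the hardware side is fine and the remaining work is rebuilding the extension with a toolchain that accepts the inline PTX in mma_sm89_fp16.hpp.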