Conversation


@jiachengjason jiachengjason commented Nov 28, 2025

Enabled WMMA-MMQ INT kernels for the RDNA 3 architecture on AMD GPUs.

Following a similar approach to #17156.

The performance results below were collected with ./build/bin/llama-bench, on ggml/llama.cpp master at commit ab49f09.

Build command used for these results:
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
  cmake -S . -B build \
    -DGGML_HIP=ON -DGGML_CUDA_FORCE_MMQ=OFF -DGGML_HIP_UMA=OFF \
    -DGGML_HIP_ROCWMMA_FATTN=OFF -DGPU_TARGETS="gfx1100" \
    -DGGML_HIP_GRAPHS=OFF -DLLAMA_CURL=OFF -DGGML_CUDA_FORCE_CUBLAS=OFF \
    -DCMAKE_BUILD_TYPE=Release \
  && cmake --build build --config Release -- -j 32

[Performance result screenshots]

@github-actions bot added labels Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) on Nov 28, 2025
@jiachengjason marked this pull request as ready for review on December 1, 2025 21:36
static constexpr int ne = I * J / 32;
#elif defined(RDNA3)
static constexpr int ne = (I == 16 && J == 16) ? I * J / 32 : I * J / 16;
#endif
Collaborator
Suggested change
#endif
#endif // defined(RDNA4)

Please add comments indicating which #if/#ifdef each #endif is closing.

Comment on lines +310 to 312
if (GGML_CUDA_CC_IS_RDNA4(cc) || GGML_CUDA_CC_IS_RDNA3(cc)) {
return true;
}
Collaborator

Suggested change
if (GGML_CUDA_CC_IS_RDNA4(cc) || GGML_CUDA_CC_IS_RDNA3(cc)) {
return true;
}
return true;

Comment on lines 1545 to +1548
A1.x[0] = 0x01010101;
A1.x[1] = 0x01010101;
A1.x[2] = 0x01010101;
A1.x[3] = 0x01010101;
Collaborator

Suggested change
A1.x[0] = 0x01010101;
A1.x[1] = 0x01010101;
A1.x[2] = 0x01010101;
A1.x[3] = 0x01010101;
#pragma unroll
for (int l = 0; l < tile_A::ne; ++l) {
A1.x[l] = 0x01010101;
}

To my understanding tile_A has 4 elements for RDNA3 but for RDNA4 it only has 2 elements. So as it is this would result in out-of-bounds writes and potential memory trampling for RDNA4.
