vulkan: Fix mismatch in TOPK_MOE unit test #17541
Open
+10
−9
We are currently seeing a mismatch in all TOPK_MOE unit tests on our upcoming platform. This is due to an implicit assumption about how subgroups map onto a workgroup. For example, the current topk_moe shader assumes that the lanes of each subgroup are contiguous in the x dimension of the workgroup, which is not guaranteed.
There is no direct relationship between SubgroupLocalInvocationId and LocalInvocationId or LocalInvocationIndex. On the failing platform, we observed a single subgroup being mapped across two dimensions of the workgroup. For subgroup ID 0 we get gl_LocalInvocationID.x: 0-7 and gl_LocalInvocationID.y: 0-3, whereas on other platforms it was gl_LocalInvocationID.x: 0-31 and gl_LocalInvocationID.y: 0.
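To make the difference between the two layouts concrete, here is a small Python model of the mappings described above. It is only an illustrative sketch, not the shader code or the fix itself: the 32x4 workgroup shape, the subgroup size of 32, and the function names (linear_mapping, tiled_mapping, lanes_inferred_from_x) are all assumptions made for the example.

```python
SUBGROUP_SIZE = 32   # assumed subgroup size
WG_X, WG_Y = 32, 4   # assumed workgroup dimensions (hypothetical)

def linear_mapping(subgroup_id, lane):
    """Layout the shader implicitly assumed: lanes are contiguous along x."""
    return (lane, subgroup_id)

def tiled_mapping(subgroup_id, lane):
    """Layout seen on the failing platform: each subgroup covers an 8x4 tile."""
    return (subgroup_id * 8 + lane % 8, lane // 8)

def lanes_inferred_from_x(mapping, subgroup_id):
    """Lane indices a shader would compute if it derived them from
    gl_LocalInvocationID.x instead of the subgroup built-ins."""
    return [mapping(subgroup_id, lane)[0] % SUBGROUP_SIZE
            for lane in range(SUBGROUP_SIZE)]

if __name__ == "__main__":
    for name, mapping in (("linear", linear_mapping), ("tiled", tiled_mapping)):
        inferred = lanes_inferred_from_x(mapping, 0)
        # A correct lane index must be unique within the subgroup.
        print(name, "distinct lane indices:", len(set(inferred)))
```

Under the linear layout every lane of subgroup 0 gets a distinct index, while under the tiled layout four different lanes collapse onto each x value, so any per-lane indexing derived from x no longer matches what the subgroup operations actually see.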
We've tested 4404be4 on Intel/NVIDIA/AMD GPUs on Windows and Intel/NVIDIA GPUs on Linux, and confirmed that
test-backend-ops.exe -o TOPK_MOE passes.