vulkan: support larger argsort #17313

jeffbolznv · 2025-11-17T06:38:54Z

This extends the original bitonic sorting shader: the temporary values are kept in global memory, and when more than 1024 threads are needed, it runs multiple workgroups and synchronizes them through a pipeline barrier.

To improve the memory access pattern, a copy of the float value is kept with the index value. I've applied this same change to the original shared memory version of the shader, which is still used when ncols <= 1024.
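For readers unfamiliar with the approach, here is a minimal CPU-side sketch in Python of a bitonic argsort that keeps a copy of the value next to each index, as described above. It is not the shader from this PR: the function name `bitonic_argsort`, the `descending` flag, and the padding with infinities for non-power-of-two sizes are all assumptions made for illustration. Each iteration of the inner `while` loop corresponds to one pass in which every element can be handled independently; on a GPU those passes would need a barrier between them.

```python
import math


def bitonic_argsort(values, descending=False):
    """Return the indices that sort `values`, using a bitonic sorting network.

    Illustrative sketch only; names and padding scheme are hypothetical.
    """
    n = len(values)
    if n == 0:
        return []

    # A bitonic network needs a power-of-two size. Pad with sentinels that
    # sort to the end (assumption: the real shader may handle this differently).
    padded = 1 << (n - 1).bit_length()
    pad_value = -math.inf if descending else math.inf

    # Keep the value together with its index so each comparison reads the
    # value directly instead of going through the index into `values`.
    pairs = [(values[i], i) for i in range(n)]
    pairs += [(pad_value, -1)] * (padded - n)

    k = 2
    while k <= padded:          # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:           # one pass; passes must run one after another
            for i in range(padded):   # each i is independent within a pass
                partner = i ^ j
                if partner > i:
                    ascending_block = (i & k) == 0
                    a, b = pairs[i], pairs[partner]
                    if ascending_block != descending:
                        out_of_order = a[0] > b[0]
                    else:
                        out_of_order = a[0] < b[0]
                    if out_of_order:
                        pairs[i], pairs[partner] = b, a
            j //= 2
        k *= 2

    # Drop the padding entries and return only the indices.
    return [idx for _, idx in pairs if idx >= 0]
```

For example, `bitonic_argsort([3.0, 1.0, 2.0, 5.0, 4.0])` returns `[1, 2, 0, 4, 3]`. Note that a bitonic network is not a stable sort, so the relative order of equal values is not guaranteed.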

Performance seems pretty good relative to the CUDA backend, but somewhat worse than CUB for the largest sizes with multiple rows.


jeffbolznv requested review from 0cc4m and slaren as code owners November 17, 2025 06:38

github-actions bot added testing Everything test related Vulkan Issues specific to the Vulkan backend ggml changes relating to the ggml tensor library for machine learning labels Nov 17, 2025

DajanaV mentioned this pull request Nov 17, 2025

UPSTREAM PR #17313: vulkan: support larger argsort auroralabs-loci/llama.cpp#231

Open
