Conversation

@ggerganov (Member) commented Sep 28, 2025

target #16148

Gauging what it would take to remove the KQ mask padding along the batch dimension (ne31). Removing this padding would simplify the graph-building logic and reduce the amount of memory that we allocate and transfer for KQ masks.
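
For illustration, here is a minimal sketch (assumed helper name, padding granularity and mask type; not the actual llama.cpp code) of where the two paddings of the KQ mask apply: ne[0] is the context (KV) dimension, ne[1] is the batch dimension, and only the latter padding is being removed here.

```c
// Illustrative only: the helper name, padding granularity and mask type are assumptions.
#include "ggml.h"

static struct ggml_tensor * build_kq_mask(struct ggml_context * ctx,
                                          int64_t n_kv, int64_t n_tokens) {
    // ne[0] (context dimension): stays padded so the mask shape does not change
    // while the KV cache fills up. The granularity 256 is an assumption.
    const int64_t n_kv_pad = GGML_PAD(n_kv, 256);

    // ne[1] (batch dimension): previously padded, e.g. GGML_PAD(n_tokens, GGML_KQ_MASK_PAD);
    // with this change it is simply n_tokens.
    return ggml_new_tensor_2d(ctx, GGML_TYPE_F16, n_kv_pad, n_tokens);
}
```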

@github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Sep 28, 2025
@jeffbolznv (Collaborator)

This will require some more changes to the Vulkan backend.

@jeffbolznv (Collaborator)

#16316 makes Vulkan handle this.

@slaren (Member) commented Sep 28, 2025

Wouldn't this cause the tensor shape to change in every evaluation, and break graph reuse and CUDA graphs?

@ggerganov (Member, Author)
> Wouldn't this cause the tensor shape to change in every evaluation, and break graph reuse and CUDA graphs?

It shouldn't - this is the padding along the batch dimension (src[3]->ne[1]). The padding along the context dimension (src[3]->ne[0]) is what keeps the graph shapes constant, and it will remain.
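
To make the dimension indices concrete, a hedged sketch (assumed variable names, not the actual graph-build code): in ggml_flash_attn_ext the mask ends up as src[3], so src[3]->ne[0] is the padded context dimension and src[3]->ne[1] is the batch dimension that loses its padding.

```c
// Sketch with assumed names; shows which mask dimension carries which padding.
struct ggml_tensor * kq_mask = ggml_new_tensor_2d(ctx, GGML_TYPE_F16,
        n_kv_pad,    // ne[0]: context dimension, padding kept
        n_tokens);   // ne[1]: batch dimension, padding removed by this PR

struct ggml_tensor * cur = ggml_flash_attn_ext(ctx, q, k, v, kq_mask,
        kq_scale, /*max_bias=*/0.0f, /*logit_softcap=*/0.0f);

// cur->src[3] == kq_mask, hence the src[3]->ne[0] / src[3]->ne[1] references above.
```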
