Phylliida commented Nov 4, 2025

This adds two functions:

ggml_pad_circular
ggml_pad_ext_circular

They have the same signatures as their non-circular counterparts (I considered modifying the existing functions, but didn't want to break existing code). Instead of padding with zeros, they act "on a torus": coordinates that run past an edge in x or y wrap around to the opposite edge.
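To make the "torus" behaviour concrete, here is a minimal standalone sketch of circular padding on a row-major 2D image. It does not use ggml; the names `wrap` and `pad_circular_2d` and the parameter layout are assumptions made for illustration, not the functions added by this PR.

```cpp
#include <cstdint>
#include <vector>

// Map any (possibly negative) coordinate into [0, n) by wrapping around.
static int64_t wrap(int64_t i, int64_t n) {
    return ((i % n) + n) % n;
}

// Illustrative only: pad a row-major w x h image with l/r extra columns and
// t/b extra rows. Padded cells are read from the opposite edge of the source
// instead of being filled with zeros.
static std::vector<float> pad_circular_2d(const std::vector<float> & src,
                                          int64_t w, int64_t h,
                                          int64_t l, int64_t r,
                                          int64_t t, int64_t b) {
    const int64_t dw = w + l + r;  // padded width
    const int64_t dh = h + t + b;  // padded height
    std::vector<float> dst(dw * dh);
    for (int64_t y = 0; y < dh; ++y) {
        for (int64_t x = 0; x < dw; ++x) {
            // Shift back by the left/top padding, then wrap into the source.
            dst[y * dw + x] = src[wrap(y - t, h) * w + wrap(x - l, w)];
        }
    }
    return dst;
}
```

For a 3x2 image with rows `1 2 3` and `4 5 6`, padding one column on each side gives rows `3 1 2 3 1` and `6 4 5 6 4`: each new column is a copy of the column at the opposite edge.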

I implemented this for CUDA, CPU, and Vulkan, as those are the primary backends people use in KoboldCpp/Stable Diffusion Cpp to generate images. On other backends, it falls back to non-circular padding.

This can be used to make seamless textures; see leejet/stable-diffusion.cpp#914 for an example and for the changes needed on the image generation side. For some models (Stable Diffusion), simply calling the circular functions is sufficient. For others (Qwen Image), you also need to modify the RoPE embeddings slightly so that they loop cleanly.

I ran the CI tests and added tests for these functions, and I'm happy to answer any questions or modify things as needed.

(Edit note: a previous version of this PR also added circular variants of conv, but we've decided that only circular pad is needed.)

int d1); // dilation dimension 1


GGML_API struct ggml_tensor * ggml_conv_2d_circular(
Collaborator

I'd personally prefer the wrapping to be an option on existing commands: either add an optional parameter to existing functions, or do something like ggml_mul_mat_set_prec to modify the op after it's created. But the core maintainers should decide. I just don't want to end up with 2^N different convolution functions as these additional options keep getting added.

Author

Phylliida Nov 4, 2025

I don't think an optional parameter is possible (iiuc), since this is a C API. (It could be hacked in with macros, but that's not ideal.) I'm open to some state-modifying approach if that's what the maintainers want.
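For readers unfamiliar with the "modify it after it's created" pattern, the sketch below shows the general shape of such a setter in a C-style API that has no optional parameters. Everything prefixed `toy_` is hypothetical and invented for this illustration; it is only in the spirit of a setter like ggml_mul_mat_set_prec and does not reproduce any ggml type or function.

```cpp
#include <cstdint>

// Hypothetical padding modes; not part of ggml.
enum toy_pad_mode : int32_t {
    TOY_PAD_ZERO     = 0,
    TOY_PAD_CIRCULAR = 1,
};

// Hypothetical stand-in for an op node that carries a few integer parameters.
struct toy_tensor {
    int32_t op_params[4] = {0, 0, 0, 0};
};

// Create a pad op with left/right padding. The mode slot (index 2) is left
// at its default, so existing callers keep zero padding.
static toy_tensor toy_pad(int32_t left, int32_t right) {
    toy_tensor t;
    t.op_params[0] = left;
    t.op_params[1] = right;
    return t;
}

// Opt in to a different mode after the op has been created.
static void toy_pad_set_mode(toy_tensor & t, toy_pad_mode mode) {
    t.op_params[2] = mode;
}

static toy_pad_mode toy_pad_get_mode(const toy_tensor & t) {
    return static_cast<toy_pad_mode>(t.op_params[2]);
}
```

The appeal of this design is that the creation function keeps its signature, so existing callers are unaffected, and each new option costs one setter rather than doubling the number of op variants.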

Author

Okay now that we are only doing pad, this is less relevant

Collaborator

Acly commented Nov 4, 2025

I am wondering, is it possible to add only a variant of ggml_pad with circular padding, use that as separate operation before the convolutions, then do the convolution without padding? How much slower is that?

Adding circular padding natively to all convolutions on all/most backends is a lot of investment. I'm not sure how common it is, so it would be interesting to know the trade-off.

Author

Phylliida commented Nov 15, 2025

> I am wondering, is it possible to add only a variant of ggml_pad with circular padding, use that as separate operation before the convolutions, then do the convolution without padding? How much slower is that?
>
> Adding circular padding natively to all convolutions on all/most backends is a lot of investment. I'm not sure how common it is, so it would be interesting to know the trade-off.

Huh, yes that's a very good suggestion and seems to work well.

For Qwen Image, using Vulkan on a 3090, I get 1.28s/it when padding ahead of time vs. 1.27s/it with circular convs. That difference is within rounding error, so there is very little performance penalty. I'll update the PR to only do circular padding, since that's all we need.
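The reason the two approaches are interchangeable can be seen in a small 1D sketch: circularly padding by half the kernel width and then running an unpadded ("valid") convolution reads exactly the same source elements as a convolution that wraps its taps. The function names below are illustrative assumptions, not ggml API, and the sketch assumes an odd kernel size with stride 1 and no dilation.

```cpp
#include <cstdint>
#include <vector>

static int64_t wrap_index(int64_t i, int64_t n) {
    return ((i % n) + n) % n;
}

// Convolution (cross-correlation form) whose taps wrap around the ends.
static std::vector<float> conv1d_circular(const std::vector<float> & x,
                                          const std::vector<float> & k) {
    const int64_t n    = (int64_t) x.size();
    const int64_t ks   = (int64_t) k.size();
    const int64_t half = ks / 2;
    std::vector<float> y(n, 0.0f);
    for (int64_t i = 0; i < n; ++i) {
        for (int64_t j = 0; j < ks; ++j) {
            y[i] += k[j] * x[wrap_index(i + j - half, n)];
        }
    }
    return y;
}

// Circular padding by p elements on each side.
static std::vector<float> pad1d_circular(const std::vector<float> & x, int64_t p) {
    const int64_t n = (int64_t) x.size();
    std::vector<float> out(n + 2 * p);
    for (int64_t i = 0; i < n + 2 * p; ++i) {
        out[i] = x[wrap_index(i - p, n)];
    }
    return out;
}

// Plain convolution with no padding: output length is n - ks + 1.
static std::vector<float> conv1d_valid(const std::vector<float> & x,
                                       const std::vector<float> & k) {
    const int64_t n  = (int64_t) x.size();
    const int64_t ks = (int64_t) k.size();
    std::vector<float> y(n - ks + 1, 0.0f);
    for (int64_t i = 0; i < n - ks + 1; ++i) {
        for (int64_t j = 0; j < ks; ++j) {
            y[i] += k[j] * x[i + j];
        }
    }
    return y;
}
```

With a 3-tap kernel, `conv1d_valid(pad1d_circular(x, 1), k)` produces the same values as `conv1d_circular(x, k)`. The only extra work in the pad-first route is materialising the slightly larger padded tensor.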

Phylliida changed the title from "Add circular tiling support to conv2d and pad, for Vulkan, CUDA, and CPU (used for making seamless textures)" to "Add circular tiling support to pad, for Vulkan, CUDA, and CPU (used for making seamless textures)" on Nov 15, 2025
const int64_t src_i2 = wrap_coord_circular(i2 - pads_l[2], ne_src[2]);
const int64_t src_i3 = wrap_coord_circular(i3 - pads_l[3], ne_src[3]);
exp_data[offset4d(ne_dst, i0, i1, i2, i3)] =
src_data[offset4d(ne_src.data(), src_i0, src_i1, src_i2, src_i3)];
Collaborator

wrap_coord_circular and offset4d are missing.
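For context, helpers with these names would plausibly look something like the sketch below. The signatures are inferred only from the call sites quoted above (a coordinate plus a dimension size, and a pointer to four extents plus four indices), so the bodies are assumptions and the definitions in the PR may differ. The offset assumes a contiguous tensor with dimension 0 varying fastest, which matches ggml's `ne[0]`-innermost convention.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed behaviour: wrap a possibly negative coordinate into [0, size).
static inline int64_t wrap_coord_circular(int64_t coord, int64_t size) {
    return (coord % size + size) % size;
}

// Assumed behaviour: flat element index into a contiguous 4D tensor with
// extents ne[0..3], where i0 is the fastest-varying index.
static inline size_t offset4d(const int64_t * ne,
                              int64_t i0, int64_t i1, int64_t i2, int64_t i3) {
    return (size_t) (((i3 * ne[2] + i2) * ne[1] + i1) * ne[0] + i0);
}
```

The double modulo in `wrap_coord_circular` matters because `%` in C and C++ keeps the sign of the dividend, so a plain `coord % size` would return a negative index for coordinates left of the origin.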


Labels: ggml, Nvidia GPU, testing, Vulkan


4 participants