Fix too relaxed check on CUDA "fast copy" (can_be_transposed) condition #17332

pwilkin · 2025-11-17T21:59:02Z

CISC · 2025-11-17T22:06:45Z

Add testcase or it didn't happen. :)

pwilkin · 2025-11-17T22:27:41Z

> Add testcase or it didn't happen. :)

Look, children, that's what an evil maintainer looks like :P Will never let you off the hook with any PR, ever!

pwilkin · 2025-11-17T22:29:30Z

update_cuda_graph_executable: CUDA graph update failed
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to too many consecutive updates
update_cuda_graph_executable: CUDA graph update failed
  CONT(type=f32,ne=[10,10,10,1]): OK
  CONT(type=f32,ne=[2,1,1,1]): OK
  CONT(type=f32,ne=[2,1,3,5]): OK
  CONT(type=f32,ne=[2,3,5,7]): OK
  CONT(type=f16,ne=[2,1,1,1]): OK
  CONT(type=f16,ne=[2,1,3,5]): OK
  CONT(type=f16,ne=[2,3,5,7]): OK
  CONT(type=bf16,ne=[2,1,1,1]): OK
  CONT(type=bf16,ne=[2,1,3,5]): OK
  CONT(type=bf16,ne=[2,3,5,7]): OK
[CONT] NMSE = 0.447623183 > 0.000000100   CONT(type=f32,ne=[1,4,2,1]): FAIL
[CONT] NMSE = 2.241813873 > 0.000000100   CONT(type=f32,ne=[1,8,17,1]): FAIL
[CONT] NMSE = 0.058848433 > 0.000000100   CONT(type=bf16,ne=[1,4,2,1]): FAIL
[CONT] NMSE = 1.181509486 > 0.000000100   CONT(type=bf16,ne=[1,8,17,1]): FAIL
  10/14 tests passed

vs

update_cuda_graph_executable: CUDA graph update failed
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to too many consecutive updates
update_cuda_graph_executable: CUDA graph update failed
  CONT(type=f32,ne=[10,10,10,1]): OK
  CONT(type=f32,ne=[2,1,1,1]): OK
  CONT(type=f32,ne=[2,1,3,5]): OK
  CONT(type=f32,ne=[2,3,5,7]): OK
  CONT(type=f16,ne=[2,1,1,1]): OK
  CONT(type=f16,ne=[2,1,3,5]): OK
  CONT(type=f16,ne=[2,3,5,7]): OK
  CONT(type=bf16,ne=[2,1,1,1]): OK
  CONT(type=bf16,ne=[2,1,3,5]): OK
  CONT(type=bf16,ne=[2,3,5,7]): OK
  CONT(type=f32,ne=[1,4,2,1]): OK
  CONT(type=f32,ne=[1,8,17,1]): OK
  CONT(type=bf16,ne=[1,4,2,1]): OK
  CONT(type=bf16,ne=[1,8,17,1]): OK
  14/14 tests passed
  Backend CUDA0: OK

@CISC There :)
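For readers unfamiliar with the metric in the failing lines: the log reports a normalized mean squared error (NMSE) per test and fails any case above 0.000000100. The following is a minimal sketch of that comparison, assuming the usual definition (squared error divided by the squared magnitude of the reference). The function names `nmse` and `passes` are illustrative and are not claimed to match the test harness.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Normalized mean squared error of `b` against the reference `a`:
//   sum((a_i - b_i)^2) / sum(a_i^2)
// 0 means an exact match. The result is undefined if the reference is all zeros.
static double nmse(const std::vector<float> & a, const std::vector<float> & b) {
    assert(a.size() == b.size());
    double err = 0.0;
    double ref = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = double(a[i]) - double(b[i]);
        err += d * d;
        ref += double(a[i]) * double(a[i]);
    }
    return err / ref;
}

// Hypothetical helper: a case passes when the error does not exceed the
// threshold. The default of 1e-7 mirrors the value printed in the log.
static bool passes(const std::vector<float> & a, const std::vector<float> & b,
                   double max_nmse = 1e-7) {
    return nmse(a, b) <= max_nmse;
}
```

Values such as 0.45 or 2.24 are many orders of magnitude above the threshold, which is what one would expect from elements landing in the wrong positions rather than from rounding differences.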

JohannesGaessler

Thank you for the fix and sorry for not catching the bug during review; some functionality that would have covered this was removed and the logic was not adjusted.

Preferably add an argument for the existing tests for GGML_OP_CONT rather than adding a new test case.
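To illustrate how a stride-based "is this a transpose?" check can be too relaxed, here is a minimal sketch. It is not the diff from this PR: the `layout` struct, both predicates, and the example layouts are hypothetical and only follow the ggml convention that `ne` holds element counts and `nb` holds byte strides.

```cpp
#include <cstdint>

// Hypothetical stand-in for a 4D tensor layout (not a ggml type):
// ne[i] is the number of elements in dimension i,
// nb[i] is the stride of dimension i in bytes.
struct layout {
    int64_t ne[4];
    int64_t nb[4];
    int64_t elem_size;
};

// Relaxed check: only looks at the stride of dimension 1.
static bool relaxed_transposed(const layout & t) {
    return t.nb[1] == t.elem_size && t.ne[3] == 1;
}

// Stricter check: also requires the strides of dimensions 0 and 2 to match
// an exact transpose of a contiguous tensor.
static bool strict_transposed(const layout & t) {
    return t.nb[1] == t.elem_size &&
           t.nb[0] == t.ne[1] * t.elem_size &&
           t.nb[2] == t.ne[0] * t.ne[1] * t.elem_size &&
           t.ne[3] == 1;
}

// Transpose of a contiguous f32 tensor with ne = [4, 3, 2, 1].
static layout make_transposed() {
    return {{3, 4, 2, 1}, {16, 4, 48, 96}, 4};
}

// One row taken from each plane of a contiguous f32 tensor with
// ne = [1, 4, 2, 1]; the byte strides are inherited from the parent.
static layout make_row_slice() {
    return {{1, 1, 2, 1}, {4, 4, 16, 32}, 4};
}
```

In this sketch both layouts satisfy `relaxed_transposed`, because a leading dimension of size 1 makes `nb[1]` equal to the element size even without a transpose. Only the genuine transpose satisfies `strict_transposed`; the row slice is rejected because its planes are 16 bytes apart rather than densely packed.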

pwilkin added 2 commits November 17, 2025 22:57

Fix too relaxed check on CUDA "fast copy" (can_be_transposed) condition

c2bfbbf

Argh.

5e7c26f

pwilkin mentioned this pull request Nov 17, 2025

Model: Qwen3 Next #16095

Open

pwilkin requested review from JohannesGaessler and am17an November 17, 2025 22:00

Making CISC happy ;)

d51f719

pwilkin requested a review from slaren as a code owner November 17, 2025 22:29

DajanaV mentioned this pull request Nov 17, 2025

UPSTREAM PR #17332: Fix too relaxed check on CUDA "fast copy" (can_be_transposed) condition auroralabs-loci/llama.cpp#244

Open

github-actions bot added the testing, Nvidia GPU, and ggml labels Nov 18, 2025

JohannesGaessler approved these changes Nov 18, 2025

3 participants
