Fix too relaxed check on CUDA "fast copy" (can_be_transposed) condition #17332
Add a test case or it didn't happen. :)
Look, children, that's what an evil maintainer looks like :P Will never let you off the hook with any PR, ever!
update_cuda_graph_executable: CUDA graph update failed
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to too many consecutive updates
update_cuda_graph_executable: CUDA graph update failed
CONT(type=f32,ne=[10,10,10,1]): OK
CONT(type=f32,ne=[2,1,1,1]): OK
CONT(type=f32,ne=[2,1,3,5]): OK
CONT(type=f32,ne=[2,3,5,7]): OK
CONT(type=f16,ne=[2,1,1,1]): OK
CONT(type=f16,ne=[2,1,3,5]): OK
CONT(type=f16,ne=[2,3,5,7]): OK
CONT(type=bf16,ne=[2,1,1,1]): OK
CONT(type=bf16,ne=[2,1,3,5]): OK
CONT(type=bf16,ne=[2,3,5,7]): OK
[CONT] NMSE = 0.447623183 > 0.000000100 CONT(type=f32,ne=[1,4,2,1]): FAIL
[CONT] NMSE = 2.241813873 > 0.000000100 CONT(type=f32,ne=[1,8,17,1]): FAIL
[CONT] NMSE = 0.058848433 > 0.000000100 CONT(type=bf16,ne=[1,4,2,1]): FAIL
[CONT] NMSE = 1.181509486 > 0.000000100 CONT(type=bf16,ne=[1,8,17,1]): FAIL
10/14 tests passed

vs

update_cuda_graph_executable: CUDA graph update failed
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to too many consecutive updates
update_cuda_graph_executable: CUDA graph update failed
CONT(type=f32,ne=[10,10,10,1]): OK
CONT(type=f32,ne=[2,1,1,1]): OK
CONT(type=f32,ne=[2,1,3,5]): OK
CONT(type=f32,ne=[2,3,5,7]): OK
CONT(type=f16,ne=[2,1,1,1]): OK
CONT(type=f16,ne=[2,1,3,5]): OK
CONT(type=f16,ne=[2,3,5,7]): OK
CONT(type=bf16,ne=[2,1,1,1]): OK
CONT(type=bf16,ne=[2,1,3,5]): OK
CONT(type=bf16,ne=[2,3,5,7]): OK
CONT(type=f32,ne=[1,4,2,1]): OK
CONT(type=f32,ne=[1,8,17,1]): OK
CONT(type=bf16,ne=[1,4,2,1]): OK
CONT(type=bf16,ne=[1,8,17,1]): OK
14/14 tests passed
Backend CUDA0: OK

@CISC There :)
JohannesGaessler
left a comment
Thank you for the fix and sorry for not catching the bug during review; some functionality that would have covered this was removed and the logic was not adjusted.
Preferably add an argument for the existing tests for GGML_OP_CONT rather than adding a new test case.
See #16095 (comment) for case.