-
Notifications
You must be signed in to change notification settings - Fork 12.2k
SYCL: disable faulty fp16 exp kernel #14395
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
ggml/src/ggml-sycl/ggml-sycl.cpp
Outdated
@@ -4201,6 +4201,8 @@ static bool ggml_backend_sycl_device_supports_op(ggml_backend_dev_t dev, const g | |||
case GGML_UNARY_OP_GELU_ERF: | |||
case GGML_UNARY_OP_TANH: | |||
case GGML_UNARY_OP_EXP: | |||
// Disable FP16 until we find out the root cause of failing fp16 sycl::exp | |||
return ggml_is_contiguous(op->src[0]) && (op->type == op->src[0]->type) && op->src[0]->type == GGML_TYPE_F32; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having a quick look at this issue but I think the issue comes from the CPU backend. I don't think we should merge this change as part of this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it confirmed? I will revert this change ASAP
I wish there was a way to print the tensor:
[EXP] inf mismatch: SYCL0=inf CPU=1.442383 FAIL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rbiessy continued the conversation in #14317 (comment). Let's see what the maintainers say
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd still prefer to disable fp16 exp in a separate PR but yeah it seems ok to me. I'm hoping we can re-enable it once we switch to oneAPI 2025.2.
On a side note I'll be away tomorrow and early July so don't expect answers :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Please take your time. :)
@qnixsynapse I guess this PR just got simplified. :) |
8b5ea7a
to
ed0aab1
Compare
This reverts commit ed0aab1.
@qnixsynapse Bad merge? |
@CISC Yeah, I mistakenly added local changes. Sorry about that. Can I get an approval so that the SYCL CI can be fixed for the time being please? |
Took the improvements from the GLU branch to reduce the number of changes there and disabled for the time being the exp fp16 kernel which suddenly started failing (even locally).
I suspect that it is because of the latest Intel-oneapi-base-toolkit update.