
SYCL: disable faulty fp16 exp kernel #14395


Open · qnixsynapse wants to merge 4 commits into master

Conversation

qnixsynapse (Collaborator) commented Jun 26, 2025

Took the improvements from the GLU branch to reduce the number of changes there, and disabled the fp16 exp kernel for the time being, since it suddenly started failing (even locally).
I suspect the cause is the latest Intel oneAPI Base Toolkit update.

@github-actions bot added labels: ggml (changes relating to the ggml tensor library for machine learning), SYCL (GPU programming language, https://en.wikipedia.org/wiki/SYCL) — Jun 26, 2025
@qnixsynapse qnixsynapse requested review from Rbiessy and Alcpz June 26, 2025 14:24
@@ -4201,6 +4201,8 @@ static bool ggml_backend_sycl_device_supports_op(ggml_backend_dev_t dev, const g
case GGML_UNARY_OP_GELU_ERF:
case GGML_UNARY_OP_TANH:
case GGML_UNARY_OP_EXP:
// Disable FP16 until we find out the root cause of failing fp16 sycl::exp
return ggml_is_contiguous(op->src[0]) && (op->type == op->src[0]->type) && op->src[0]->type == GGML_TYPE_F32;
Collaborator

I'm having a quick look at this issue but I think the issue comes from the CPU backend. I don't think we should merge this change as part of this PR

qnixsynapse (Collaborator, Author) commented Jun 26, 2025

Is it confirmed? I will revert this change ASAP.

I wish there was a way to print the tensor:

[EXP] inf mismatch: SYCL0=inf CPU=1.442383 FAIL

Collaborator

Rbiessy continued the conversation in #14317 (comment). Let's see what the maintainers say.

Collaborator

I'd still prefer to disable fp16 exp in a separate PR, but yeah, it seems OK to me. I'm hoping we can re-enable it once we switch to oneAPI 2025.2.

On a side note, I'll be away tomorrow and in early July, so don't expect answers :)

qnixsynapse (Collaborator, Author)

I see. Please take your time. :)

CISC (Collaborator) commented Jun 29, 2025

@qnixsynapse I guess this PR just got simplified. :)

CISC (Collaborator) commented Jun 29, 2025

@qnixsynapse Bad merge?

qnixsynapse (Collaborator, Author)
@CISC Yeah, I mistakenly added local changes. Sorry about that. Can I get an approval so that the SYCL CI can be fixed for the time being, please?

@qnixsynapse qnixsynapse changed the title SYCL: Take improvements from GLU branch and disable faulty fp16 exp kernel SYCL: disable faulty fp16 exp kernel Jun 29, 2025
Labels
ggml — changes relating to the ggml tensor library for machine learning; SYCL — GPU programming language (https://en.wikipedia.org/wiki/SYCL)
4 participants