
Conversation

@IMbackK (Collaborator) commented Aug 13, 2025

Switch over to hip_bf16 from the legacy hip_bfloat16 (a version-gating sketch follows below)
Simplify the RDNA3 define
Lower the version at which we switch to the new hipBLAS API to ROCm 6.5, as this version is used by the ROCm 7.0 previews
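
For context on the first point, one common way to gate such an include on the HIP/ROCm version looks roughly like the sketch below. The header names come from ROCm, but the 6.3 cutoff and the exact macro layout are assumptions for illustration, not necessarily what this PR ends up doing.

```cpp
// Sketch only, not the actual ggml header. Assumes the cutoff discussed later in the
// thread (ROCm 6.3); HIP_VERSION is major*10000000 + minor*100000 + patch.
#include <hip/hip_version.h>

#if HIP_VERSION >= 60300000
#include <hip/hip_bf16.h>      // modern header: __hip_bfloat16 and the __hip_bfloat162 vector type
#else
#include <hip/hip_bfloat16.h>  // legacy header: hip_bfloat16 only, no native vector type
#endif
```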

@github-actions bot added the Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) labels Aug 13, 2025
@IMbackK (Collaborator, Author) commented Aug 13, 2025

Looks like this doesn't compile against older ROCm; don't commit.

@IMbackK IMbackK marked this pull request as draft August 13, 2025 09:03
@IMbackK IMbackK marked this pull request as ready for review August 14, 2025 09:48
@IMbackK (Collaborator, Author) commented Aug 14, 2025

@JohannesGaessler I'm not a huge fan of the changes in 97392a5, but I see no better option. The other options we have are:

  1. restrict the supported ROCm versions to 6.3+
     1. this is just one version behind the latest
     2. we lose Debian stable, and probably others like it
  2. keep using hipbfloat16.h for now, including the bfloat162 hack

@IMbackK (Collaborator, Author) commented Aug 14, 2025

To everyone: if you do merge this, please do not squash 97392a5 into 3f662cd, as I intend to revert 97392a5 once we are in a position to drop support for ROCm < 6.3.

@JohannesGaessler (Collaborator) commented:

I agree that having some extra function for type conversions is very annoying. Unfortunately with CUDA this is already necessary on master due to FP16 <-> BF16 conversions being ambiguous. I pushed a version that consolidates the other code that needs to handle these cases so that, going forward, we need to maintain only a single version. I also simplified the code a bit: I changed the name to ggml_cuda_cast to make the lines using it a bit shorter and I swapped the order of template arguments in order to make it possible to specify only the destination type with the source type being inferred from the argument.

One more question: is there a reason why you declared the function as __host__ __device__? I don't think it's ever used in host code. (There is, I think, technically also a difference in how the code is compiled due to the use of inline instead of __forceinline__, but the compiler really should be inlining these functions.)
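
For reference, here is a minimal sketch of a destination-first cast helper in the spirit of what is described above. The helper name cast_sketch is made up for illustration; the real ggml_cuda_cast signature, specializations, and file location differ.

```cpp
// Sketch (C++17): only the destination type is spelled out, the source type is deduced.
#include <type_traits>
#include <cuda_fp16.h>  // half
#include <cuda_bf16.h>  // nv_bfloat16

template <typename dst_t, typename src_t>
__host__ __device__ inline dst_t cast_sketch(src_t x) {
    if constexpr (std::is_same_v<dst_t, src_t>) {
        return x;
    } else if constexpr (std::is_same_v<src_t, half> && std::is_same_v<dst_t, nv_bfloat16>) {
        // FP16 -> BF16 has no unambiguous implicit conversion, so go through float explicitly.
        return __float2bfloat16(__half2float(x));
    } else if constexpr (std::is_same_v<src_t, nv_bfloat16> && std::is_same_v<dst_t, half>) {
        // Same in the other direction.
        return __float2half(__bfloat162float(x));
    } else {
        return dst_t(x);
    }
}

// Usage: the destination type is explicit, the source type comes from the argument.
//   float f = cast_sketch<float>(h);   // h may be half, nv_bfloat16, float, ...
```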

@IMbackK (Collaborator, Author) commented Aug 14, 2025

I had the template parameters in that order because that is what all other functions in ggml use, but this way is fine with me.

The rest of the changes are fine with me.

While nothing currently uses this on the host side, I see no reason to restrict it to device code. Indeed, if it's just __device__, someone might look at this and conclude that the casting function is only necessary on the device side, which is not true.

The inline is obviously just there because it's header-implemented. I have no objections to making it __forceinline__, but if the compiler is not inlining these functions automatically, I don't know what to tell you.

@JohannesGaessler (Collaborator) commented:

I think __forceinline__ is incompatible with __host__; that's why I was asking. Using it would be more consistent with the rest of the code, but I don't think it's going to matter much.

@IMbackK (Collaborator, Author) commented Aug 14, 2025

__forceinline__ is fine to apply to host functions in HIP; not sure about CUDA.
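
For reference, a sketch of the two qualifier combinations being discussed. The function names are illustrative and not from the PR; whether __forceinline__ on a __host__ function is accepted by every CUDA toolchain is exactly the question left open above, while HIP accepts it per the comment.

```cpp
// Illustrative only, not the final ggml declaration.
template <typename dst_t, typename src_t>
__host__ __device__ inline dst_t cast_host_device(src_t x) { return dst_t(x); }  // callable from host and device

template <typename dst_t, typename src_t>
__device__ __forceinline__ dst_t cast_device_only(src_t x) { return dst_t(x); }  // device-only, inlining forced
```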

@IMbackK merged commit 5ba36f6 into ggml-org:master Aug 14, 2025
46 of 47 checks passed