Skip to content

Conversation

Firestar99
Copy link
Member

@Firestar99 Firestar99 commented Sep 8, 2025

Old embark PR trying to do the same, though I made this PR is made from scratch due to the branches being too different: https://github.com/EmbarkStudios/rust-gpu/pull/1158/files

Why?

Currently, rust-gpu has special layout rules for glam types, which gives them a higher alignment than necessary. This causes structs using them to have a different layout (size, alignment, member offsets) in rust-gpu than they do on the CPU, causing a lot of UB when sending data between host and device. Worst of them is Vec3, which has a size of 12 on the CPU but 16 in rust-gpu, which is why we always recommended not to use them in structs shared between targets.

Another common workaround was to use glam's feature cuda, which would give all glam vectors a higher than necessary alignment. These alignments seem match those of rust-gpu, though they are defined in very different places and could drift out of sync.

This is a mess, and this PR tries to clean it up by:

  • have SpirvType::Vector size and alignment to match layout of the type
  • remove all spirv-specific layout hacks in glam
  • replacing #[repr(SIMD)] with our own #[spirv(vector)]
  • glam/cuda feature will continue to work, though must be turned on on both targets!
  • we (finally) get support for glam::BVecN (bool vecs)

Size and Alignment chart

(rust-gpu chooses chaotic evil)

type offset in
difftest
CPU current removing
SIMD hacks
size align size align size align
UVec2 0x400 8 4 8 *8* 8 4
IVec2 0x500 8 4 8 *8* 8 4
Vec2 0x600 8 4 8 *8* 8 4
UVec3 0x800 12 4 *16* *16* 12 4
IVec3 0x900 12 4 *16* *16* 12 4
Vec3 0xa00 12 4 *16* *16* 12 4
Vec3A 0xb00 16 16 16 16 *12* *4*
UVec4 0xc00 16 4 16 *16* 16 4
IVec4 0xd00 16 4 16 *16* 16 4
Vec4 0xe00 16 16 16 16 16 *4*

Notes:

  • removing SIMD hacks: misalignment caused by glam hiding alignment from spirv, see comment below
  • current implementation matches CPU exactly, but requires modifications in glam to remove these feature gates hiding alignment
  • if we have to update glam anyway, may as well remove our and glam's #[repr(SIMD)] hacks and replace them with a custom #[spirv(vector)] that we control (and rustc can't change on a whim)

@Firestar99 Firestar99 changed the base branch from main to difftest_refactor September 8, 2025 15:20
@Firestar99 Firestar99 force-pushed the vec3-12-bytes branch 3 times, most recently from d8b1b46 to 11c141a Compare September 11, 2025 17:25
@LegNeato
Copy link
Collaborator

Hard for me to grasp the why. What are the pros and cons here? Will this break projects like https://github.com/LegNeato/rust-gpu-chimera or will it align spirv with what glam w/ cuda feature and/or rust-cuda does?

#[repr(transparent)]
#[derive(Copy, Clone, Default, Eq, PartialEq)]
#[cfg_attr(feature = "bytemuck", derive(bytemuck::Zeroable, bytemuck::Pod))]
pub struct SubgroupMask(pub glam::UVec4);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of changing it from a separate struct to a typedef. The #[repr(transparent)] stopped working since Vec4 from rustc's perspective is just a regular struct, which causes us to emit a struct wrapping OpTypeVector f32 4 instead of the OpTypeVector directly. In glsl you explicitly use UVec4 for the bitmasks, which I'm not a big fan of since they represent something quite different than just a UVec4. So I've opted for a wrapper struct that you may manually construct or destruct if you need to.

Alternative to keep the struct: I have a branch where I've adjusted all the intrinsics to unwrap the struct, which works flawlessly. The problem are entry point params like these:

#[spirv(subgroup_eq_mask)] subgroup_eq_mask: SubgroupMask,
#[spirv(subgroup_ge_mask)] subgroup_ge_mask: SubgroupMask,
#[spirv(subgroup_gt_mask)] subgroup_gt_mask: SubgroupMask,
#[spirv(subgroup_le_mask)] subgroup_le_mask: SubgroupMask,
#[spirv(subgroup_lt_mask)] subgroup_lt_mask: SubgroupMask,

I'd have to refactor our entry point logic to allow us to not just pass a plain BuiltIn, but also modify it aka. wrap it with a SubgroupMask. But since these are rarely used (like even more rare than any of the subgroup intrinsics themselves), I could see us changing their type to UVec4 and having the end user manually convert them, for now. Whenever we decide to revisit BuiltIns we can fix that. (I really want to change most read-only builtins to global functions in spirv-std, so they're not just magic values you have to know / research in the code)

@nnethercote
Copy link
Contributor

nnethercote commented Sep 18, 2025

Vec3A 0xb00 16 16 16 16 *12* *4*
Vec4 0xe00 16 16 16 16 16 *4*

Why is Vec3A's size only 12 bytes?

Why are Vec3A and Vec4s alignment only 4 bytes?

@Firestar99
Copy link
Member Author

glam is explicitly hiding the alignment specifiers on our spirv target. The table is a bit outdated, I already removed the cfg's in the glam branch this repo links to.

I'm also on like the 3rd implementation, each time solving it slightly differently, but I'm happy with what I have right now. Though it'll block the release until glam has released a minor 0.30.6 patch for us fixing this on glam's side and we'll loose backwards compat.

#[cfg_attr(not(target_arch = "spirv"), repr(align(16)))]
#[cfg_attr(not(target_arch = "spirv"), repr(C))]
#[cfg_attr(target_arch = "spirv", repr(simd))]
pub struct Vec3A {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

https://github.com/bitshifter/glam-rs/blob/9a992a8bacba784053368bc0b7000c95aa5895b6/src/f32/scalar/vec3a.rs#L30-L37

#[cfg_attr(
    any(
        not(any(feature = "scalar-math", target_arch = "spirv")),
        feature = "cuda"
    ),
    repr(align(16))
)]
#[cfg_attr(not(target_arch = "spirv"), repr(C))]
#[cfg_attr(target_arch = "spirv", repr(simd))]
pub struct Vec4 {
    pub x: f32,
    pub y: f32,
    pub z: f32,
    pub w: f32,
}

https://github.com/bitshifter/glam-rs/blob/9a992a8bacba784053368bc0b7000c95aa5895b6/src/f32/scalar/vec4.rs#L27-L41

@FacelessTiger
Copy link

With the glam patch will it only match on default settings, or if you use the scalar-math feature will Vec4 be 4 byte alignement for example

@Firestar99
Copy link
Member Author

I assumed there's going to me more issues with glam's cuda and scalar-math features, since both of them modify alignments of structs. Currently modifying the abi::vector_layout difftest to also test with these features and verifying that feature unification didn't mess with the results.

@Firestar99 Firestar99 changed the base branch from difftest_refactor to main September 18, 2025 15:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants