-
Notifications
You must be signed in to change notification settings - Fork 72
abi: Vec3 12 bytes #380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
abi: Vec3 12 bytes #380
Conversation
a881c58
to
acf572d
Compare
acf572d
to
5c6932a
Compare
0c2df59
to
23fe024
Compare
d8b1b46
to
11c141a
Compare
Hard for me to grasp the why. What are the pros and cons here? Will this break projects like https://github.com/LegNeato/rust-gpu-chimera or will it align spirv with what glam w/ cuda feature and/or rust-cuda does? |
4669fa0
to
e77ed1c
Compare
f4af86b
to
1b4b2d9
Compare
e77ed1c
to
c8e6c60
Compare
#[repr(transparent)] | ||
#[derive(Copy, Clone, Default, Eq, PartialEq)] | ||
#[cfg_attr(feature = "bytemuck", derive(bytemuck::Zeroable, bytemuck::Pod))] | ||
pub struct SubgroupMask(pub glam::UVec4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of changing it from a separate struct to a typedef. The #[repr(transparent)]
stopped working since Vec4
from rustc's perspective is just a regular struct, which causes us to emit a struct wrapping OpTypeVector f32 4
instead of the OpTypeVector
directly. In glsl you explicitly use UVec4
for the bitmasks, which I'm not a big fan of since they represent something quite different than just a UVec4
. So I've opted for a wrapper struct that you may manually construct or destruct if you need to.
Alternative to keep the struct: I have a branch where I've adjusted all the intrinsics to unwrap the struct, which works flawlessly. The problem are entry point params like these:
rust-gpu/tests/compiletests/ui/arch/subgroup/subgroup_builtins.rs
Lines 11 to 15 in d9bb8aa
#[spirv(subgroup_eq_mask)] subgroup_eq_mask: SubgroupMask, | |
#[spirv(subgroup_ge_mask)] subgroup_ge_mask: SubgroupMask, | |
#[spirv(subgroup_gt_mask)] subgroup_gt_mask: SubgroupMask, | |
#[spirv(subgroup_le_mask)] subgroup_le_mask: SubgroupMask, | |
#[spirv(subgroup_lt_mask)] subgroup_lt_mask: SubgroupMask, |
I'd have to refactor our entry point logic to allow us to not just pass a plain BuiltIn
, but also modify it aka. wrap it with a SubgroupMask
. But since these are rarely used (like even more rare than any of the subgroup intrinsics themselves), I could see us changing their type to UVec4
and having the end user manually convert them, for now. Whenever we decide to revisit BuiltIns we can fix that. (I really want to change most read-only builtins to global functions in spirv-std, so they're not just magic values you have to know / research in the code)
aa8292c
to
f14db4e
Compare
Why is Why are |
glam is explicitly hiding the alignment specifiers on our spirv target. The table is a bit outdated, I already removed the cfg's in the glam branch this repo links to. I'm also on like the 3rd implementation, each time solving it slightly differently, but I'm happy with what I have right now. Though it'll block the release until glam has released a minor #[cfg_attr(not(target_arch = "spirv"), repr(align(16)))]
#[cfg_attr(not(target_arch = "spirv"), repr(C))]
#[cfg_attr(target_arch = "spirv", repr(simd))]
pub struct Vec3A {
pub x: f32,
pub y: f32,
pub z: f32,
} #[cfg_attr(
any(
not(any(feature = "scalar-math", target_arch = "spirv")),
feature = "cuda"
),
repr(align(16))
)]
#[cfg_attr(not(target_arch = "spirv"), repr(C))]
#[cfg_attr(target_arch = "spirv", repr(simd))]
pub struct Vec4 {
pub x: f32,
pub y: f32,
pub z: f32,
pub w: f32,
} |
With the glam patch will it only match on default settings, or if you use the |
I assumed there's going to me more issues with glam's |
f14db4e
to
a727c08
Compare
1b4b2d9
to
24a9bc9
Compare
a727c08
to
b17666f
Compare
b17666f
to
38649e4
Compare
(T, U) pairs in entrypoints and regular functions are now supported.
38649e4
to
b91eda9
Compare
Old embark PR trying to do the same, though I made this PR is made from scratch due to the branches being too different: https://github.com/EmbarkStudios/rust-gpu/pull/1158/files
Why?
Currently, rust-gpu has special layout rules for glam types, which gives them a higher alignment than necessary. This causes structs using them to have a different layout (size, alignment, member offsets) in rust-gpu than they do on the CPU, causing a lot of UB when sending data between host and device. Worst of them is
Vec3
, which has a size of 12 on the CPU but 16 in rust-gpu, which is why we always recommended not to use them in structs shared between targets.Another common workaround was to use
glam
's featurecuda
, which would give all glam vectors a higher than necessary alignment. These alignments seem match those of rust-gpu, though they are defined in very different places and could drift out of sync.This is a mess, and this PR tries to clean it up by:
SpirvType::Vector
size and alignment to match layout of the type#[repr(SIMD)]
with our own#[spirv(vector)]
glam/cuda
feature will continue to work, though must be turned on on both targets!glam::BVecN
(bool vecs)Size and Alignment chart
(rust-gpu chooses chaotic evil)
difftest
SIMD hacks
Notes:
#[repr(SIMD)]
hacks and replace them with a custom#[spirv(vector)]
that we control (and rustc can't change on a whim)