Found some interesting things in the pow function
echo "kernel void test(uint pos [[thread_position_in_grid]], device float* out, const device float2* in) { out[pos] = metal::pow(in[pos].x, in[pos].y); }" | python3 compiler_explorer.py -
0: 72091004 get_sr r2, sr80 (thread_position_in_grid.x)
4: 0501440e00c43200 device_load 0, i32, xy, r0_r1, u2_u3, r2, unsigned, lsl 1
c: 3800 wait 0
e: 8a0d80c6 log2 r3.cache, r0.cache.abs
12: 9a8dc6222800 fmul32 r3.cache, r3.discard, r1.cache
18: 8a0dc6d2 exp2 r3.cache, r3.discard
1c: 3a81c0222cc61200 fmadd32 r0, r0.discard, r1.discard, r3.discard
24: 4501400e00c01200 device_store 0, i32, x, r0, u0_u1, r2, unsigned, 0
2c: 8800 stop
The only differences from powr (which doesn't handle negative x) are the .abs on the log2 input and the fmadd32 at the end.
I don't think adding the product of the inputs (x * y) to the exp2 result will magically fix up the result for negative numbers (and I ran the function to make sure it actually does calculate pow).
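To make that concrete, here is a rough host-side sketch of what the listing would compute if the final fmadd32 were read literally as r0 * r1 + r3. The function names are made up for illustration, and it uses Python doubles rather than the GPU's float32, so it only shows the shape of the problem, not the hardware behaviour.

```python
import math

def pow_via_abs(x: float, y: float) -> float:
    """The log2 / fmul32 / exp2 part of the listing: exp2(y * log2(|x|))."""
    return 2.0 ** (math.log2(abs(x)) * y)

def literal_fmadd(x: float, y: float) -> float:
    """Naive reading of the trailing fmadd32: x * y + exp2(y * log2(|x|)).

    Assumption: fmadd32 dst, a, b, c computes a * b + c with no special
    behaviour. The point of the sketch is that this cannot be the whole story.
    """
    return x * y + pow_via_abs(x, y)

for x, y in [(2.0, 3.0), (-2.0, 3.0)]:
    print(x, y, math.pow(x, y), pow_via_abs(x, y), literal_fmadd(x, y))
```

For (2, 3) the literal reading gives 14 instead of 8, and for (-2, 3) it gives 2 instead of -8, so a plain multiply-add matches pow for neither sign of x.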
The fmadd32 has bit 52 set (which is currently unused in our decoder), so maybe that's what makes it special?
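A quick way to check a bit like this from the hex dump, as a sketch: it assumes the hex column in the listing is the instruction's bytes in memory order and that the instruction word is little-endian. `bit_is_set` is a hypothetical helper, not part of the decoder.

```python
def bit_is_set(hex_bytes: str, bit: int) -> bool:
    """Return True if `bit` is set in the instruction word.

    Assumption: `hex_bytes` lists the bytes in memory order and the
    word is little-endian, so the first byte holds bits 0..7.
    """
    word = int.from_bytes(bytes.fromhex(hex_bytes), "little")
    return (word >> bit) & 1 == 1

# Encoding of the fmadd32 at offset 0x1c in the listing above.
FMADD32 = "3a81c0222cc61200"
print(bit_is_set(FMADD32, 52))
```

Under that byte-order assumption, bit 52 is bit 4 of the seventh byte (0x12), which is set.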
I'm not currently running an OS supported by hwtestbed, so I'll leave actually testing this to someone who is