pow fixup instruction? #51

@TellowKrinkle

Found some interesting things in the pow function:

echo "kernel void test(uint pos [[thread_position_in_grid]], device float* out, const device float2* in) { out[pos] = metal::pow(in[pos].x, in[pos].y); }" | python3 compiler_explorer.py -
   0: 72091004             get_sr           r2, sr80 (thread_position_in_grid.x)
   4: 0501440e00c43200     device_load      0, i32, xy, r0_r1, u2_u3, r2, unsigned, lsl 1
   c: 3800                 wait             0
   e: 8a0d80c6             log2             r3.cache, r0.cache.abs
  12: 9a8dc6222800         fmul32           r3.cache, r3.discard, r1.cache
  18: 8a0dc6d2             exp2             r3.cache, r3.discard
  1c: 3a81c0222cc61200     fmadd32          r0, r0.discard, r1.discard, r3.discard
  24: 4501400e00c01200     device_store     0, i32, x, r0, u0_u1, r2, unsigned, 0
  2c: 8800                 stop             

The only differences from powr (which doesn't handle negative x) are the .abs on the log2 input and the fmadd32 at the end.
Read literally, that fmadd32 computes x * y + exp2(y * log2(|x|)). I don't think adding the product of the inputs will magically fix up the result for negative numbers, and I ran the function to confirm that it really does calculate pow.
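A quick way to see the mismatch is to model the listing on the CPU. The sketch below assumes the fmadd32 is an ordinary a * b + c with operands in the order shown in the disassembly; `literal_reading` is a hypothetical helper name, and Python doubles stand in for the GPU's 32-bit floats, so it only illustrates the arithmetic, not the hardware behaviour.

```python
import math

def literal_reading(x: float, y: float) -> float:
    """Model the listing, assuming fmadd32 is a plain a * b + c."""
    t = math.log2(abs(x))   # log2    r3, r0.abs
    t = t * y               # fmul32  r3, r3, r1
    t = 2.0 ** t            # exp2    r3, r3
    return x * y + t        # fmadd32 r0, r0, r1, r3

x, y = -2.0, 3.0
print(literal_reading(x, y))  # 2.0  (x * y + |x|^y = -6 + 8)
print(math.pow(x, y))         # -8.0 (what pow should return)
```

Since the real shader does return the pow result, the instruction presumably isn't doing a plain multiply-add here.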
The fmadd32 has bit 52 set (which is currently unused in our decoder), so maybe that's what makes it special?
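For anyone who wants to double-check the bit position, here is a small sketch. It assumes the hex column in the listing is in memory order and that instruction bits are numbered little-endian (bit 0 is the low bit of the first byte); `set_bits` is a hypothetical helper, not part of the decoder.

```python
def set_bits(encoding_hex: str) -> list[int]:
    """Return the indices of the set bits, reading the bytes as little-endian."""
    word = int.from_bytes(bytes.fromhex(encoding_hex), "little")
    return [i for i in range(word.bit_length()) if (word >> i) & 1]

fmadd = "3a81c0222cc61200"  # the fmadd32 at offset 0x1c in the listing
print(52 in set_bits(fmadd))                   # True
print([b for b in set_bits(fmadd) if b >= 48]) # [49, 52]
```

Under that numbering the seventh byte (0x12) carries bits 49 and 52, which matches the observation above.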

I'm not currently running an OS supported by hwtestbed, so I'll leave actually testing this to someone who is.
