Open
Labels
enhancement: New feature or request
Description
Implement the following optimized sigmoid kernels for float32 and float16 with vectorized versions and PyTorch bindings for improved performance.
- sigmoid_f32_kernel: Standard sigmoid function for the float32 data type.
- sigmoid_f32x4_kernel: Vectorized sigmoid for float32, processing 4 elements at a time (float4).
- sigmoid_f16_kernel: Standard sigmoid function for float16 (half precision).
- sigmoid_f16x2_kernel: Vectorized sigmoid for float16, processing 2 elements at a time (half2).
- sigmoid_f16x8_kernel: Unpacked float16 version, processing 8 elements in parallel.
- sigmoid_f16x8_pack_kernel: Packed version of sigmoid_f16x8_kernel for efficient memory access.
- PyTorch bindings: Expose the above kernels through PyTorch.
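As a starting point for discussion, here is a minimal sketch of three of the requested kernels (scalar float32, float4-vectorized float32, and half2-vectorized float16). The kernel names come from the list above; everything else is an assumption made for illustration: the signatures `(x, y, n)`, the helper `sigmoid_scalar`, the launch configuration, and the requirement that `n` be a multiple of the vector width with suitably aligned pointers (no tail handling). The small host check only exercises the two float32 kernels.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cmath>
#include <cstdio>
#include <vector>

// Hypothetical helper (not named in the issue): sigmoid(v) = 1 / (1 + exp(-v)).
__device__ __forceinline__ float sigmoid_scalar(float v) {
  return 1.0f / (1.0f + expf(-v));
}

// Scalar float32 version: one element per thread.
__global__ void sigmoid_f32_kernel(const float* x, float* y, int n) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < n) y[idx] = sigmoid_scalar(x[idx]);
}

// Vectorized float32 version: each thread loads and stores one float4.
// Assumption: n is a multiple of 4 and x, y are 16-byte aligned.
__global__ void sigmoid_f32x4_kernel(const float* x, float* y, int n) {
  int idx = 4 * (blockIdx.x * blockDim.x + threadIdx.x);
  if (idx + 3 < n) {
    float4 v = *reinterpret_cast<const float4*>(x + idx);
    v.x = sigmoid_scalar(v.x);
    v.y = sigmoid_scalar(v.y);
    v.z = sigmoid_scalar(v.z);
    v.w = sigmoid_scalar(v.w);
    *reinterpret_cast<float4*>(y + idx) = v;
  }
}

// Vectorized float16 version: each thread processes one half2.
// Assumption: n is even, pointers are 4-byte aligned, and the target GPU
// supports native half arithmetic (compute capability 5.3 or newer).
__global__ void sigmoid_f16x2_kernel(const half* x, half* y, int n) {
  int idx = 2 * (blockIdx.x * blockDim.x + threadIdx.x);
  if (idx + 1 < n) {
    half2 v = *reinterpret_cast<const half2*>(x + idx);
    const half2 one = __float2half2_rn(1.0f);
    half2 r = __h2div(one, __hadd2(one, h2exp(__hneg2(v))));
    *reinterpret_cast<half2*>(y + idx) = r;
  }
}

int main() {
  const int n = 1024;  // multiple of 4, so the float4 kernel needs no tail handling
  std::vector<float> h_x(n), h_y(n), h_y4(n);
  for (int i = 0; i < n; ++i) h_x[i] = -8.0f + 16.0f * i / n;  // inputs in [-8, 8)

  float *d_x = nullptr, *d_y = nullptr;
  cudaMalloc(&d_x, n * sizeof(float));
  cudaMalloc(&d_y, n * sizeof(float));
  cudaMemcpy(d_x, h_x.data(), n * sizeof(float), cudaMemcpyHostToDevice);

  const int threads = 256;
  sigmoid_f32_kernel<<<(n + threads - 1) / threads, threads>>>(d_x, d_y, n);
  cudaMemcpy(h_y.data(), d_y, n * sizeof(float), cudaMemcpyDeviceToHost);

  // The float4 kernel needs only n / 4 threads in total.
  sigmoid_f32x4_kernel<<<(n / 4 + threads - 1) / threads, threads>>>(d_x, d_y, n);
  cudaMemcpy(h_y4.data(), d_y, n * sizeof(float), cudaMemcpyDeviceToHost);

  // Compare both GPU results against a CPU reference.
  float max_err = 0.0f;
  for (int i = 0; i < n; ++i) {
    float ref = 1.0f / (1.0f + std::exp(-h_x[i]));
    max_err = std::fmax(max_err, std::fabs(h_y[i] - ref));
    max_err = std::fmax(max_err, std::fabs(h_y4[i] - ref));
  }
  std::printf("max abs error vs. CPU reference: %g\n", max_err);

  cudaFree(d_x);
  cudaFree(d_y);
  return max_err < 1e-5f ? 0 : 1;
}
```

The float4 and half2 variants issue one wide load and one wide store per thread instead of several narrow ones, which is where the expected memory-bandwidth benefit comes from. A production version would likely also need tail handling for sizes that are not a multiple of the vector width, and possibly input clamping in the float16 path, since `exp(-x)` overflows half precision for strongly negative `x` (the result still rounds to 0 here).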