@Jatkingmodern

Added a vectorized forward kernel that uses float4 loads/stores for improved performance. Implemented a launch wrapper that chooses between the vectorized and scalar kernels, depending on whether the conditions for vectorization hold.
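
For readers who want a picture of the pattern, here is a minimal sketch of a float4 kernel with a scalar fallback and a launch wrapper. It is not the code from this PR. The names `forward_scalar`, `forward_vec4`, `can_use_vec4`, and `launch_forward` are hypothetical, the elementwise op (a ReLU) is a placeholder, and the dispatch conditions shown (16-byte-aligned pointers and an element count divisible by 4) are assumed as typical requirements for float4 access, not taken from the PR.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Placeholder elementwise op (assumption, not the PR's actual forward op).
__device__ __forceinline__ float op(float x) { return fmaxf(x, 0.0f); }

// Scalar fallback: grid-stride loop, one float per iteration.
__global__ void forward_scalar(const float* __restrict__ in,
                               float* __restrict__ out, size_t n) {
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        out[i] = op(in[i]);
}

// Vectorized path: grid-stride loop, one float4 (16 bytes) per iteration.
__global__ void forward_vec4(const float4* __restrict__ in,
                             float4* __restrict__ out, size_t n4) {
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n4; i += stride) {
        float4 v = in[i];
        v.x = op(v.x); v.y = op(v.y); v.z = op(v.z); v.w = op(v.w);
        out[i] = v;
    }
}

// Assumed dispatch conditions: float4 access needs 16-byte-aligned pointers,
// and this sketch handles no tail, so n must be a multiple of 4.
inline bool can_use_vec4(const void* in, const void* out, size_t n) {
    auto aligned16 = [](const void* p) {
        return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
    };
    return n > 0 && n % 4 == 0 && aligned16(in) && aligned16(out);
}

// Launch wrapper: picks the vectorized kernel when possible, else the scalar one.
inline cudaError_t launch_forward(const float* in, float* out, size_t n,
                                  cudaStream_t stream = 0) {
    if (n == 0) return cudaSuccess;
    const size_t threads = 256;
    const size_t max_blocks = 65535;  // cap the grid; the stride loops cover the rest
    if (can_use_vec4(in, out, n)) {
        size_t n4 = n / 4;
        unsigned blocks = (unsigned)std::min(max_blocks, (n4 + threads - 1) / threads);
        forward_vec4<<<blocks, (unsigned)threads, 0, stream>>>(
            reinterpret_cast<const float4*>(in), reinterpret_cast<float4*>(out), n4);
    } else {
        unsigned blocks = (unsigned)std::min(max_blocks, (n + threads - 1) / threads);
        forward_scalar<<<blocks, (unsigned)threads, 0, stream>>>(in, out, n);
    }
    return cudaGetLastError();  // reports launch-configuration errors
}
```

Falling back to the scalar kernel whenever a condition fails keeps the sketch simple. An alternative is to run the vectorized kernel over the aligned bulk and handle the remaining tail elements separately.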

@meta-cla meta-cla bot added the cla signed label Nov 3, 2025
