2d reduce sum op#21

Open
jonahsamost wants to merge 6 commits into Kernel-Heim:main from jonahsamost:main

@jonahsamost
Contributor

referencing issue #20

This op uses a different strategy depending on whether we're summing along the last or the first dimension. If we're summing along the last dimension, we use float4 loads, whereas for the first dimension, we load in tiles of dimension [32, 4].
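
For readers who want a concrete picture of the two paths described above, here is a minimal CUDA sketch. It is not the code in this PR: the kernel names `row_sum_float4` and `col_sum_tiled`, the 256-thread block size, the row-major float32 layout, and the reading of "[32, 4]" as a thread tile of 32 columns by 4 rows are all assumptions made for illustration.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical last-dimension kernel: one block reduces one row.
// Assumes cols % 4 == 0 and a 16-byte aligned base pointer, so each
// row can be read as float4. blockDim.x must be a multiple of 32.
__global__ void row_sum_float4(const float* __restrict__ in,
                               float* __restrict__ out, int cols) {
    const float4* row =
        reinterpret_cast<const float4*>(in + (size_t)blockIdx.x * cols);
    float acc = 0.0f;
    for (int i = threadIdx.x; i < cols / 4; i += blockDim.x) {
        float4 v = row[i];  // one 128-bit load per iteration
        acc += (v.x + v.y) + (v.z + v.w);
    }
    // Reduce within each warp using shuffles.
    for (int off = 16; off > 0; off >>= 1)
        acc += __shfl_down_sync(0xffffffffu, acc, off);
    __shared__ float warp_sums[32];
    const int lane = threadIdx.x & 31, warp = threadIdx.x >> 5;
    if (lane == 0) warp_sums[warp] = acc;
    __syncthreads();
    // The first warp combines the per-warp partial sums.
    if (warp == 0) {
        acc = (lane < (int)(blockDim.x >> 5)) ? warp_sums[lane] : 0.0f;
        for (int off = 16; off > 0; off >>= 1)
            acc += __shfl_down_sync(0xffffffffu, acc, off);
        if (lane == 0) out[blockIdx.x] = acc;
    }
}

// Hypothetical first-dimension kernel, launched with blockDim = (32, 4).
// 32 threads span adjacent columns (coalesced loads); 4 threads stride
// over rows and their partial sums are combined through shared memory.
__global__ void col_sum_tiled(const float* __restrict__ in,
                              float* __restrict__ out, int rows, int cols) {
    __shared__ float partial[4][32];
    const int col = blockIdx.x * 32 + threadIdx.x;
    float acc = 0.0f;
    if (col < cols)
        for (int r = threadIdx.y; r < rows; r += 4)
            acc += in[(size_t)r * cols + col];
    partial[threadIdx.y][threadIdx.x] = acc;
    __syncthreads();
    if (threadIdx.y == 0 && col < cols)
        out[col] = (partial[0][threadIdx.x] + partial[1][threadIdx.x]) +
                   (partial[2][threadIdx.x] + partial[3][threadIdx.x]);
}

int main() {
    const int rows = 1024, cols = 1024;  // small test shape, not the benchmark shape
    std::vector<float> h((size_t)rows * cols);
    for (size_t i = 0; i < h.size(); ++i) h[i] = (float)(i % 7) * 0.25f;

    float *d_in, *d_row, *d_col;
    cudaMalloc(&d_in, h.size() * sizeof(float));
    cudaMalloc(&d_row, rows * sizeof(float));
    cudaMalloc(&d_col, cols * sizeof(float));
    cudaMemcpy(d_in, h.data(), h.size() * sizeof(float), cudaMemcpyHostToDevice);

    row_sum_float4<<<rows, 256>>>(d_in, d_row, cols);
    col_sum_tiled<<<(cols + 31) / 32, dim3(32, 4)>>>(d_in, d_col, rows, cols);
    if (cudaDeviceSynchronize() != cudaSuccess) { printf("CUDA error\n"); return 1; }

    std::vector<float> row_out(rows), col_out(cols);
    cudaMemcpy(row_out.data(), d_row, rows * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(col_out.data(), d_col, cols * sizeof(float), cudaMemcpyDeviceToHost);

    // Compare against a double-precision CPU reference.
    std::vector<double> row_ref(rows, 0.0), col_ref(cols, 0.0);
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c) {
            row_ref[r] += h[(size_t)r * cols + c];
            col_ref[c] += h[(size_t)r * cols + c];
        }
    double max_err = 0.0;
    for (int r = 0; r < rows; ++r) max_err = fmax(max_err, fabs(row_out[r] - row_ref[r]));
    for (int c = 0; c < cols; ++c) max_err = fmax(max_err, fabs(col_out[c] - col_ref[c]));
    printf("max abs error: %g\n", max_err);

    cudaFree(d_in); cudaFree(d_row); cudaFree(d_col);
    return max_err < 1e-3 ? 0 : 1;
}
```

The split follows from memory layout: in a row-major tensor the last dimension is contiguous, so each thread can issue wide float4 loads, while a first-dimension sum walks strided rows, so adjacent threads are mapped to adjacent columns to keep each warp's loads coalesced.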

Benchmarks on a B200 with a tensor of shape 4096x4096:
  reduce_sum dim=-1: 0.043 ms
  reduce_sum dim=0:  0.042 ms
  torch.sum dim=-1:  0.011 ms
  torch.sum dim=0:   0.019 ms
