
Reduce repeat_interleave calls in apply_F_to_columns #69

Open
sanjanag wants to merge 2 commits into linkedin:master from sanjanag:apply-f-to-columns-perf

Conversation

@sanjanag
Contributor

Summary

apply_F_to_columns previously made three repeat_interleave calls and one
torch.cat per bucket to build the flat index arrays. Computing cols_rep
once and indexing prefix[cols_rep] / starts[cols_rep] produces the same
arrays with a single repeat_interleave and no torch.cat. The outputs are
unchanged and the overhead is lower; the saving is most visible on GPU,
where each call has launch latency.
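
For reviewers who want to check the equivalence in isolation, here is a minimal sketch. It is not the code from this diff. The names cols_rep, prefix, starts, idx_in_col and flat_indices come from the description above. Everything else is an assumption made for illustration: the two helper functions, the lengths tensor, the reading of prefix as an exclusive prefix sum of per-column lengths, and the reading of starts as per-column offsets into a flat value array. The sketch does not try to reproduce where the original torch.cat was used.

```python
import torch


def flat_indices_three_repeats(starts, lengths):
    """Hypothetical baseline: one repeat_interleave per derived array."""
    prefix = torch.cumsum(lengths, dim=0) - lengths  # exclusive prefix sum (assumed meaning)
    total = int(lengths.sum())
    cols_rep = torch.repeat_interleave(torch.arange(lengths.numel()), lengths)
    prefix_rep = torch.repeat_interleave(prefix, lengths)
    starts_rep = torch.repeat_interleave(starts, lengths)
    idx_in_col = torch.arange(total) - prefix_rep
    return cols_rep, idx_in_col, starts_rep + idx_in_col


def flat_indices_one_repeat(starts, lengths):
    """Hypothetical rewrite: expand column ids once, then gather by index."""
    prefix = torch.cumsum(lengths, dim=0) - lengths
    total = int(lengths.sum())
    cols_rep = torch.repeat_interleave(torch.arange(lengths.numel()), lengths)
    idx_in_col = torch.arange(total) - prefix[cols_rep]   # position within each column
    flat_indices = starts[cols_rep] + idx_in_col          # position in the flat array
    return cols_rep, idx_in_col, flat_indices


# Four columns, one of them empty, with made-up offsets into a flat array.
starts = torch.tensor([0, 5, 9, 9])
lengths = torch.tensor([3, 2, 0, 4])

old = flat_indices_three_repeats(starts, lengths)
new = flat_indices_one_repeat(starts, lengths)
same = all(torch.equal(a, b) for a, b in zip(old, new))
```

Under these assumptions, same is True because prefix[cols_rep] and repeat_interleave(prefix, lengths) both select prefix[c] once for every element of column c, and likewise for starts. The gather replaces two of the three expansions with plain index lookups.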

This also adds direct unit-test coverage for apply_F_to_columns (identity,
scaling, multi-bucket equivalence, varying column lengths, output-tensor
mode, empty buckets, ReLU-style clamping).

Test plan

  • New TestApplyFToColumns suite passes (7 cases)
  • Existing tests/test_sparse_utils.py cases unchanged and passing

sanjanag and others added 2 commits April 27, 2026 13:30
Compute cols_rep once and derive idx_in_col / flat_indices via
prefix[cols_rep] and starts[cols_rep] indexing. Drops two
repeat_interleave calls and a torch.cat per bucket. Add direct
unit-test coverage for apply_F_to_columns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>