Revert "Rename triton function" #269

Merged
LoserCheems merged 1 commit into main from revert-268-optim-combine-func on Apr 21, 2026

@LoserCheems
Collaborator

Reverts #268

Copilot AI review requested due to automatic review settings April 21, 2026 09:33
@LoserCheems LoserCheems merged commit 59fce4f into main Apr 21, 2026
3 checks passed
Contributor

Copilot AI left a comment

Pull request overview

Reverts prior Triton module/function rename by switching call sites back to the *_fwd.py / *_bwd.py module naming, and reintroducing dedicated forward-combine and backward pre/post-processing modules used by the attention kernels.

Changes:

  • Update imports in tests and Triton interface to use flash_*_{fwd,bwd} modules.
  • Replace previous combine/preprocess/postprocess module references with flash_fwd_combine, flash_bwd_preprocess, and flash_bwd_postprocess.
  • Add new Triton kernels/modules for forward split-KV combine and backward preprocess/postprocess.
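
For readers unfamiliar with the split-KV combine step, the sketch below shows the usual way partial results are merged for a single query row. It is plain Python, not the Triton kernel from this PR; the function name `combine_split_kv` and the list-based layout are hypothetical, and the formula is the standard log-sum-exp reweighting, assumed here rather than taken from the diff.

```python
import math


def combine_split_kv(partial_out, partial_lse):
    """Merge per-split attention results for one query row (illustrative only).

    partial_out: list of S vectors of length D, each already softmax-normalized
                 within its own KV split.
    partial_lse: list of S floats, the log-sum-exp of the scores in each split.
    Returns (out, lse) for the full KV range.
    """
    dim = len(partial_out[0])
    m = max(partial_lse)
    if m == -math.inf:
        # Every split was empty or fully masked: nothing to combine.
        return [0.0] * dim, -math.inf

    # Global log-sum-exp across splits, computed stably around the max.
    lse = m + math.log(sum(math.exp(l - m) for l in partial_lse))

    # Each split contributes in proportion to its share of the total mass.
    weights = [math.exp(l - lse) for l in partial_lse]
    out = [
        sum(w * o[d] for w, o in zip(weights, partial_out))
        for d in range(dim)
    ]
    return out, lse
```

Because each partial output is normalized by its own split's sum, rescaling by `exp(lse_i - lse)` recovers the result that a single pass over all keys would produce.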

Reviewed changes

Copilot reviewed 8 out of 11 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| tests/test_utils.py | Updates test imports to the reverted *_fwd/*_bwd module names. |
| flash_sparse_attn/ops/triton/interface.py | Updates interface imports to the reverted module names. |
| flash_sparse_attn/ops/triton/flash_sparse_fwd.py | Switches split-KV combine call to flash_fwd_combine. |
| flash_sparse_attn/ops/triton/flash_sparse_bwd.py | Switches backward pre/post steps to flash_bwd_preprocess/flash_bwd_postprocess. |
| flash_sparse_attn/ops/triton/flash_gated_fwd.py | Switches split-KV combine call to flash_fwd_combine. |
| flash_sparse_attn/ops/triton/flash_gated_bwd.py | Switches backward pre/post steps to flash_bwd_preprocess/flash_bwd_postprocess. |
| flash_sparse_attn/ops/triton/flash_dense_fwd.py | Switches split-KV combine call to flash_fwd_combine. |
| flash_sparse_attn/ops/triton/flash_dense_bwd.py | Switches backward pre/post steps to flash_bwd_preprocess/flash_bwd_postprocess. |
| flash_sparse_attn/ops/triton/flash_fwd_combine.py | Adds Triton kernel + wrapper to combine split-KV partial outputs/LSE into final output/LSE. |
| flash_sparse_attn/ops/triton/flash_bwd_preprocess.py | Adds Triton kernel + wrapper to compute dpsum, lse_log2, and initialize dq_accum. |
| flash_sparse_attn/ops/triton/flash_bwd_postprocess.py | Adds Triton kernel + wrapper to scale/store dq (and optional da) from accumulators. |
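
The backward preprocess and postprocess steps named in the last two rows can be sketched in plain Python as follows. This is a rough illustration, not the code added by this PR. The helper names `bwd_preprocess` and `bwd_postprocess` are hypothetical. The definitions used here (dpsum as the row-wise sum of `out * dout`, lse_log2 as `lse * log2(e)`, and a zero-initialized dq_accum) are assumptions based on the conventions of FlashAttention-style backward passes. The optional da output is left out because the summary does not describe it.

```python
import math

LOG2_E = math.log2(math.e)


def bwd_preprocess(out, dout, lse):
    """Prepare per-row quantities before the main backward kernel (illustrative).

    out, dout: lists of rows, each of length D (forward output and its gradient).
    lse: list of per-row log-sum-exp values from the forward pass.
    """
    # Assumed definition: dpsum[i] = sum_d out[i][d] * dout[i][d].
    dpsum = [
        sum(o * g for o, g in zip(o_row, g_row))
        for o_row, g_row in zip(out, dout)
    ]
    # Assumed definition: convert natural-log LSE to base 2 so exp2 can be used.
    lse_log2 = [l * LOG2_E for l in lse]
    # Assumed initialization: the dq accumulator starts at zero.
    dq_accum = [[0.0] * len(row) for row in out]
    return dpsum, lse_log2, dq_accum


def bwd_postprocess(dq_accum, scale):
    """Scale the accumulated dq and return it as the final gradient."""
    return [[x * scale for x in row] for row in dq_accum]
```

Keeping the accumulator separate from the final dq lets the main kernel add contributions from many KV blocks and defer the scaling to a single pass at the end.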
