Rename forward combine functions and clarify comments #270
Conversation
Pull request overview
This PR aims to clarify naming around the “forward combine” path by renaming it to “decode combine”, and updates related comments/usages across Triton kernels.
Changes:
- Renames launch/grid helpers from `get_fwd_combine_*` to `get_dec_combine_*`.
- Renames the Triton combine kernel/entrypoint in `flash_dec_combine.py` and updates imports in forward paths.
- Refactors softmax helpers by changing `online_softmax`'s signature and introducing `online_sparse_softmax`, updating several kernels accordingly.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| flash_sparse_attn/ops/triton/launch_template.py | Renames combine launch-config helper and updates associated comments. |
| flash_sparse_attn/ops/triton/launch_grid.py | Renames combine grid helper and updates docstring. |
| flash_sparse_attn/ops/triton/flash_sparse_fwd.py | Updates combine-module import and sparse softmax call site. |
| flash_sparse_attn/ops/triton/flash_gated_fwd.py | Updates combine-module import and sparse softmax call site. |
| flash_sparse_attn/ops/triton/flash_gated_bwd.py | Comment-only change (“pointer” → “pointers”). |
| flash_sparse_attn/ops/triton/flash_dense_fwd.py | Removes block_max plumbing and updates softmax call/returns; updates combine-module import. |
| flash_sparse_attn/ops/triton/flash_dec_combine.py | Renames combine kernel and Python entrypoint; updates helper calls to renamed launch/grid funcs. |
| flash_sparse_attn/ops/triton/flash_bwd_preprocess.py | Comment-only change (“pointer” → “pointers”). |
| flash_sparse_attn/ops/triton/flash_bwd_postprocess.py | Comment-only change (“pointer” → “pointers”). |
| flash_sparse_attn/ops/triton/activations.py | Changes online_softmax API and introduces online_sparse_softmax. |
```diff
 if is_split_kv:
-    flash_fwd_combine._flash_attn_fwd_combine(
+    flash_dec_combine._flash_attn_fwd_combine(
```

The varlen path still calls `flash_dec_combine._flash_attn_fwd_combine`, but the renamed function is `_flash_attn_dec_combine`. This will break split-KV combine at runtime.

```suggestion
flash_dec_combine._flash_attn_dec_combine(
```
```diff
 if is_split_kv:
-    flash_fwd_combine._flash_attn_fwd_combine(
+    flash_dec_combine._flash_attn_fwd_combine(
```

This call uses `flash_dec_combine._flash_attn_fwd_combine`, but `flash_dec_combine.py` defines `_flash_attn_dec_combine` after the rename. Update the call (or add an alias) to avoid runtime failures in the split-KV path.

```suggestion
flash_dec_combine._flash_attn_dec_combine(
```
```diff
 if is_split_kv:
-    flash_fwd_combine._flash_attn_fwd_combine(
+    flash_dec_combine._flash_attn_fwd_combine(
```

Varlen split-KV combine still calls `_flash_attn_fwd_combine`, but the function was renamed to `_flash_attn_dec_combine` in `flash_dec_combine.py`. Update this call to match the new name.

```suggestion
flash_dec_combine._flash_attn_dec_combine(
```
```python
    scale_log2,
    SOFTMAX_THRESHOLD_LOG2: tl.constexpr,
    CHECK_INF: tl.constexpr,
):
```

`SOFTMAX_THRESHOLD_LOG2` is declared as `tl.constexpr`, but call sites pass a runtime value computed inside the kernel (e.g., via `seqlen_info.get_softmax_threshold`). Triton requires `tl.constexpr` arguments to be compile-time constants, so this is likely to fail compilation; make the threshold a regular runtime argument (remove `tl.constexpr`) or restructure so the threshold is known at specialization time.
```diff
 def online_softmax(
     acc_s,
     block_max,
     row_max,
     row_sum,
     scale_log2,
     softmax_threshold_log2,
     CHECK_INF: tl.constexpr,
     RESCALE_THRESHOLD: tl.constexpr,
 ):
     """
-    Apply online softmax to acc_s, and update block_max, row_max and row_sum.
+    Apply online softmax to acc_s, and update row_max and row_sum.
```

The PR description says this is a pure rename/comment-clarification refactor, but this file also changes the softmax API (the signature of `online_softmax` changes, and `online_sparse_softmax` is introduced). If this is intentional, please update the PR description to reflect the behavioral/API surface changes; otherwise, consider limiting the PR to naming/comment changes only.
```diff
 if is_split_kv:
-    flash_fwd_combine._flash_attn_fwd_combine(
+    flash_dec_combine._flash_attn_fwd_combine(
```

`flash_dec_combine` no longer defines `_flash_attn_fwd_combine` (it was renamed to `_flash_attn_dec_combine`), so this call will raise an `AttributeError`. Update the call site (and any similar ones) to use the renamed entrypoint, or add a backwards-compatible alias in `flash_dec_combine.py`.

```suggestion
flash_dec_combine._flash_attn_dec_combine(
```
```diff
 if is_split_kv:
-    flash_fwd_combine._flash_attn_fwd_combine(
+    flash_dec_combine._flash_attn_fwd_combine(
```

`flash_dec_combine` exports `_flash_attn_dec_combine`, not `_flash_attn_fwd_combine`, so this varlen combine path will fail at runtime. Update to the renamed function (or keep an alias).

```suggestion
flash_dec_combine._flash_attn_dec_combine(
```
```diff
 if is_split_kv:
-    flash_fwd_combine._flash_attn_fwd_combine(
+    flash_dec_combine._flash_attn_fwd_combine(
```

This still calls `flash_dec_combine._flash_attn_fwd_combine`, but the function was renamed to `_flash_attn_dec_combine` in `flash_dec_combine.py`. Update the call to prevent runtime failures when `is_split_kv` is true.

```suggestion
flash_dec_combine._flash_attn_dec_combine(
```
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
```diff
@@ -75,15 +75,14 @@ def _fwd_inner_sparse_base_kernel(
     # Apply online softmax
```

The comment says "Apply online softmax" but the code now calls `online_sparse_softmax`, which includes sparse-specific behavior (block-level skip). Update the comment to avoid confusion when debugging/maintaining this kernel.

```suggestion
# Apply online sparse softmax (may skip fully masked/sparse blocks)
```
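For context on the "may skip" caveat: a block-threshold online softmax can drop an entire score block when its maximum is so far below the running row maximum that every exponential in the block would underflow to zero. A hedged NumPy sketch of that predicate — the log2 domain mirrors the PR's `softmax_threshold_log2` naming, but the exact in-kernel logic is an assumption:

```python
import numpy as np

def block_contributes(block_scores, row_max, threshold_log2=20.0):
    """Return per-row flags: does this score block affect the row sums?

    In the log2 domain, exp2(block_max - row_max) underflows once the gap
    exceeds the threshold, so such a block adds ~nothing and can be skipped.
    """
    block_max = block_scores.max(axis=-1)
    return block_max >= row_max - threshold_log2
```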
```diff
@@ -99,15 +99,14 @@ def _fwd_inner_gated_base_kernel(
     # Apply online softmax
```

The comment says "Apply online softmax" but the implementation uses `online_sparse_softmax` (the sparse/block-threshold variant). Please update the comment so it matches the actual routine being used.

```suggestion
# Apply online sparse softmax (sparse/block-threshold variant)
```
```python
def online_softmax(
    acc_s,
    row_max,
    row_sum,
    scale_log2,
    CHECK_INF: tl.constexpr,
    RESCALE_THRESHOLD: tl.constexpr,
):
```

This PR is described as a pure rename/comment-clarity change, but this file introduces a new `online_softmax` implementation and renames the previous implementation to `online_sparse_softmax` (plus signature/return-value changes used by dense kernels). Please update the PR description to reflect these functional refactors, or split the refactor into a separate PR for easier review/risk assessment.
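As a reference for what the dense `online_softmax` variant computes, here is a hedged NumPy sketch of one standard streaming-softmax update step, using base-2 exponentials to mirror `scale_log2`; the real kernel's `RESCALE_THRESHOLD` and `CHECK_INF` handling are omitted:

```python
import numpy as np

def online_softmax_step(acc_s, row_max, row_sum, acc_o, scale_log2=1.0):
    """One online-softmax step over a new score block acc_s.

    Updates the running row_max/row_sum and rescales the output
    accumulator acc_o so the full softmax is recovered at the end.
    """
    new_max = np.maximum(row_max, (scale_log2 * acc_s).max(axis=-1))
    p = np.exp2(scale_log2 * acc_s - new_max[:, None])  # safe: exponent <= 0
    correction = np.exp2(row_max - new_max)             # rescale old partial sums
    row_sum = row_sum * correction + p.sum(axis=-1)
    acc_o = acc_o * correction[:, None]
    return p, new_max, row_sum, acc_o
```

Running this over every key block (adding `p @ v_block` to `acc_o` after each step) and finally dividing `acc_o` by `row_sum` reproduces `softmax(scores) @ V` computed block by block, which is the invariant both the dense and sparse variants must preserve.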
Summary
Root Cause
Changes
- Renamed `fwd_combine` to `dec_combine` and updated related comments for clarity.

Reproduction
Tests
Compatibility
Checklist