
Conversation

@staugust (Contributor) commented Nov 17, 2025

FlashInfer attention uses 2 as the base of its LSE (log-sum-exp) instead of e; see https://github.com/flashinfer-ai/flashinfer/blob/main/include/flashinfer/attention/mla.cuh#L400.

Purpose

Correct the attention output with the proper scaling factor when using context parallelism.
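
For reference, a minimal sketch of the factor involved (plain PyTorch, not the PR's Triton kernel; the function name is illustrative): since 2^x = e^(x·ln 2), a base-2 LSE converts to base e by a single multiplication.

```python
import math

import torch

# Minimal sketch, not the PR's kernel: FlashInfer returns its
# log-sum-exp in base 2, while merge code written with exp/log
# expects base e. Because 2**x == e**(x * ln 2), rescaling the
# LSE by ln(2) converts it to natural log.
def lse_base2_to_base_e(lse_base2: torch.Tensor) -> torch.Tensor:
    return lse_base2 * math.log(2)
```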

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting a before/after comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@gemini-code-assist (bot) left a comment

Code Review

This pull request correctly addresses the issue of FlashInfer using base 2 for its log-sum-exp calculations by introducing a new Triton kernel and a parameter to switch between base e and base 2 computations. The changes are logically sound and correctly applied where FlashInfer is used. My main feedback is on the implementation detail of adding a new Triton kernel, which introduces significant code duplication that could be avoided for better maintainability.
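
For illustration, a minimal sketch of that pattern (hypothetical kernel and parameter names, echoing the PR's is_lse_base_on_e flag; not the PR's actual kernel): the base switch can be a tl.constexpr, so the base-e and base-2 paths compile to separate specializations with no runtime branch.

```python
import triton
import triton.language as tl

# Hypothetical sketch of switching the exponent base at compile time.
@triton.jit
def rescale_lse_kernel(lse_ptr, out_ptr, n_elements,
                       IS_LSE_BASE_ON_E: tl.constexpr, BLOCK: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements
    lse = tl.load(lse_ptr + offs, mask=mask)
    if IS_LSE_BASE_ON_E:
        out = lse  # already natural log; pass through unchanged
    else:
        out = lse * 0.6931471805599453  # ln(2): convert base-2 LSE to base e
    tl.store(out_ptr + offs, out, mask=mask)
```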

@chatgpt-codex-connector commented

💡 Codex Review

```python
output_context, lse_context = cp_lse_ag_out_rs(
    output_context_tmp,
    lse_context_tmp,
    get_dcp_group(),
    return_lse=True,
    is_lse_base_on_e=False,
)
lse_context = lse_context.transpose(0, 1).contiguous()
output_query, lse_query = self._new_tokens.run(
    prefill_query,
    key,
    value,
    return_lse=True,
)
lse_query = lse_query.transpose(0, 1).contiguous()
merge_attn_states(
    out,
    output_context,
    lse_context,
    output_query,
    lse_query,
)
```
P1: Convert FlashInfer LSEs to natural log before merging

When cp_lse_ag_out_rs is called with is_lse_base_on_e=False the new kernel returns log-sum-exp values in base‑2. Immediately after, these LSEs are passed to merge_attn_states, whose implementation assumes natural logarithms (tl.exp/tl.log). Without converting the base‑2 LSEs (e.g., multiply by math.log(2) for both lse_context and the lse_query returned from _new_tokens.run), the scaling factors inside merge_attn_states are computed against the wrong exponent base, producing incorrect weighting when combining context and query attention states under context parallelism.
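
A minimal sketch of the suggested correction, reusing the names from the snippet above (illustrative only, not a verified patch):

```python
import math

# Rescale both base-2 LSEs to natural log so that the exp/log
# arithmetic inside merge_attn_states sees the base it assumes.
lse_context = lse_context * math.log(2)
lse_query = lse_query * math.log(2)
```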


@staugust (Contributor, Author) commented Nov 19, 2025

@pavanimajety Would you like to take a look at this issue? I'm wondering which repo this should be fixed in, flashinfer or vllm.

mergify bot commented Nov 19, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @staugust.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label Nov 19, 2025
@heheda12345 (Collaborator) commented

CC @pavanimajety

@LucasWilkinson (Collaborator) left a comment

LGTM (@pavanimajety should take a look too, though, since I'm not as familiar with when FlashInfer uses base 2)

@staugust (Contributor, Author) commented

@LucasWilkinson @heheda12345 @pavanimajety From state.cuh:45, we can see that FlashInfer uses 2 as the base for all LSE computation.
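
For context, a sketch of that merge recurrence parameterized on the base (assumed shapes; not FlashInfer's code): the merged output is identical for any base b, as long as both input LSEs and the merge arithmetic agree on it, which is exactly what breaks when a base-2 LSE meets a base-e merge kernel.

```python
import math

import torch

# Merge two partial attention states whose LSEs share base b,
# where log_base = math.log(b): math.log(2) for FlashInfer,
# 1.0 for base-e backends. Shapes (assumed): o* is
# [tokens, heads, head_dim], lse* is [tokens, heads].
def merge_states(o1, lse1, o2, lse2, log_base: float):
    m = torch.maximum(lse1, lse2)
    w1 = torch.exp((lse1 - m) * log_base)  # b ** (lse1 - m)
    w2 = torch.exp((lse2 - m) * log_base)
    out = (w1.unsqueeze(-1) * o1 + w2.unsqueeze(-1) * o2) / (w1 + w2).unsqueeze(-1)
    lse = m + torch.log(w1 + w2) / log_base  # merged LSE, still in base b
    return out, lse
```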

Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
LucasWilkinson added the ready (ONLY add when PR is ready to merge/full CI is needed) label Nov 28, 2025
DarkLight1337 merged commit 9726e64 into vllm-project:main Nov 28, 2025
50 checks passed
github-project-automation bot moved this from In review to Done in NVIDIA Nov 28, 2025
@hl475 (Contributor) commented Nov 29, 2025

@staugust can you please take a look at the failures in https://buildkite.com/vllm/ci/builds/41171/steps/table?jid=019ace6a-57d7-4bc7-a3aa-c99174395dbd and https://buildkite.com/vllm/ci/builds/41171/steps/table?jid=019ace6a-57db-4e0b-9528-e04a0af07b6a, e.g.


```
(Worker_TP0_DCP0 pid=3384) ERROR 11-29 00:52:08 [multiproc_executor.py:822]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/attention/backends/mla/common.py", line 2064, in forward
(Worker_TP1_DCP1 pid=3385) ERROR 11-29 00:52:08 [multiproc_executor.py:822]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_DCP0 pid=3384) ERROR 11-29 00:52:08 [multiproc_executor.py:822]     is_lse_base_on_e=not self._use_fi_prefill,
(Worker_TP1_DCP1 pid=3385) ERROR 11-29 00:52:08 [multiproc_executor.py:822]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 413, in __call__
(Worker_TP0_DCP0 pid=3384) ERROR 11-29 00:52:08 [multiproc_executor.py:822]                          ^^^^^^^^^^^^^^^^^^^^
(Worker_TP1_DCP1 pid=3385) ERROR 11-29 00:52:08 [multiproc_executor.py:822]     raise e
(Worker_TP0_DCP0 pid=3384) ERROR 11-29 00:52:08 [multiproc_executor.py:822] AttributeError: 'CutlassMLAImpl' object has no attribute '_use_fi_prefill'
```


Do you think it is relevant? Thanks!

@hl475 (Contributor) commented Nov 29, 2025

Trying to do a forward fix in #29734

@staugust (Contributor, Author) commented

@hl475 It's relevant. Thank you very much for fixing the attribute error.

kitaekatt pushed a commit to kitaekatt/vllm that referenced this pull request Dec 1, 2025
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
amd-hhashemi pushed a commit to amd-hhashemi/vllm that referenced this pull request Dec 2, 2025
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>