feat: add num_chunks_override to FusedLinearCrossEntropyLoss by WyldeCat · Pull Request #3 · MotifTechnologies/Liger-Kernel

WyldeCat · 2026-04-14T08:15:41Z

Summary

Add a num_chunks_override parameter to the FLCE (FusedLinearCrossEntropy) forward, Function, and Loss module
Allow users to set the chunk count explicitly instead of relying on auto-computation (~32 for V=220k)
Free chunk tensors at the end of each loop iteration to prevent two logits chunks from co-existing in GPU memory
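The effect of freeing chunk tensors at the end of each iteration can be illustrated without a GPU. The sketch below is not the PR's code: `FakeChunk` and `peak_live_chunks` are hypothetical stand-ins that count live objects in place of real logits tensors, and the result relies on CPython's reference counting releasing an object as soon as its last reference is dropped.

```python
class FakeChunk:
    """Hypothetical stand-in for a logits chunk; tracks how many are alive."""

    live = 0
    peak = 0

    def __init__(self):
        FakeChunk.live += 1
        FakeChunk.peak = max(FakeChunk.peak, FakeChunk.live)

    def __del__(self):
        FakeChunk.live -= 1


def peak_live_chunks(num_chunks, free_each_iteration):
    """Return the maximum number of chunks alive at once during the loop."""
    FakeChunk.live = 0
    FakeChunk.peak = 0
    for _ in range(num_chunks):
        # The new chunk is built before the name is rebound, so without an
        # explicit del the previous iteration's chunk is still alive here.
        logits_chunk = FakeChunk()
        if free_each_iteration:
            # Drop the reference now so the next allocation starts from zero.
            del logits_chunk
    return FakeChunk.peak


without_del = peak_live_chunks(4, free_each_iteration=False)
with_del = peak_live_chunks(4, free_each_iteration=True)
print(without_del, with_del)
```

Without the `del`, the old chunk is only released when the loop variable is rebound, which happens after the new chunk has already been allocated. That overlap is what the explicit free is meant to avoid.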

Motivation

For a large vocabulary (V=220k), the auto-computed chunk count is ~32, which causes excessive elementwise kernel launches between chunks. Overriding it to 4-8 chunks reduces launch overhead with minimal memory impact, because peak memory is dominated by activations rather than by FLCE logits chunks.
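A minimal sketch of how an override could sit next to the automatic chunk count. The auto-computation formula here is recalled from upstream Liger-Kernel and should be treated as an assumption, since the PR text only gives the ~32 figure. The function name `resolve_num_chunks` and the values H=4096 and BT=8192 are hypothetical, chosen only so the arithmetic lands on 32. The divisibility check mirrors the commit "assert num_chunks_override evenly divides BT", though the PR uses an assertion rather than a `ValueError`.

```python
def cdiv(a, b):
    """Ceiling division for positive integers."""
    return -(-a // b)


def next_power_of_2(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def resolve_num_chunks(BT, H, V, num_chunks_override=None):
    """Return (num_chunks, chunk_size) for BT rows, hidden size H, vocab V.

    Hypothetical helper: the real PR threads num_chunks_override through
    the FLCE forward, Function, and Loss module instead.
    """
    if num_chunks_override is not None:
        if num_chunks_override <= 0 or BT % num_chunks_override != 0:
            raise ValueError("num_chunks_override must evenly divide BT")
        return num_chunks_override, BT // num_chunks_override

    # Assumed auto heuristic: size each chunk so that a logits chunk
    # (chunk_size x V) is roughly comparable to the input (BT x H).
    inc_factor = cdiv(V, H)
    chunk_size = next_power_of_2(cdiv(BT, inc_factor))
    return cdiv(BT, chunk_size), chunk_size


auto = resolve_num_chunks(BT=8192, H=4096, V=220_000)
overridden = resolve_num_chunks(BT=8192, H=4096, V=220_000, num_chunks_override=4)
print(auto, overridden)
```

Fewer chunks means larger chunks, so the trade-off is fewer kernel launches against a bigger transient logits buffer per chunk.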

Test plan

8-node motif3 training with num_chunks_override=4: MFU ~28% (vs ~27% default)
Loss values match between chunk configurations
Unit tests
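The "loss values match" check can be sketched in plain Python, as a toy stand-in for the real unit tests. Everything here is hypothetical (`chunked_mean_loss`, the tiny shapes, the random inputs); it only shows why the mean cross-entropy should not depend on the chunk count: each row's loss is computed independently, and the per-chunk sums are added before dividing by BT.

```python
import math
import random


def cross_entropy_sum(logits_rows, targets):
    """Sum of per-row cross-entropy, using a stable log-sum-exp."""
    total = 0.0
    for row, target in zip(logits_rows, targets):
        m = max(row)
        lse = m + math.log(sum(math.exp(x - m) for x in row))
        total += lse - row[target]
    return total


def chunked_mean_loss(hidden, weight, targets, num_chunks):
    """Mean loss over BT rows, materializing logits one chunk at a time."""
    BT = len(hidden)
    assert BT % num_chunks == 0, "num_chunks must evenly divide BT"
    chunk_size = BT // num_chunks
    total = 0.0
    for start in range(0, BT, chunk_size):
        h_chunk = hidden[start:start + chunk_size]
        # logits_chunk has shape (chunk_size, V): one dot product per vocab row.
        logits_chunk = [
            [sum(h * w for h, w in zip(row, w_row)) for w_row in weight]
            for row in h_chunk
        ]
        total += cross_entropy_sum(logits_chunk, targets[start:start + chunk_size])
        del logits_chunk  # release before the next chunk is built
    return total / BT


rng = random.Random(0)
BT, H, V = 16, 8, 32  # toy sizes, not the PR's configuration
hidden = [[rng.uniform(-1, 1) for _ in range(H)] for _ in range(BT)]
weight = [[rng.uniform(-1, 1) for _ in range(H)] for _ in range(V)]
targets = [rng.randrange(V) for _ in range(BT)]

losses = {n: chunked_mean_loss(hidden, weight, targets, n) for n in (1, 2, 4, 8)}
print(losses)
```

The values agree up to floating-point rounding, because only the order of the final additions changes between chunk configurations.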

🤖 Generated with Claude Code

Allow users to override the auto-computed chunk count in FLCE. Default auto-calculation yields ~32 chunks for large vocab (V=220k), causing excessive elementwise kernel launches between chunks. Overriding to fewer chunks (e.g. 4-8) reduces kernel launch overhead with minimal memory impact since peak is dominated by activations.

Also free chunk tensors (del logits_chunk, grad_logits_chunk, _input_chunk) at end of each loop iteration to prevent two logits chunks co-existing in GPU memory between iterations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


WyldeCat and others added 2 commits April 14, 2026 08:14

fix: assert num_chunks_override evenly divides BT

35ef85b

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

WyldeCat marked this pull request as ready for review April 14, 2026 08:31

WyldeCat and others added 2 commits April 14, 2026 08:33

feat: add num_chunks as __init__ parameter to LigerFusedLinearCrossEn…

2ba5af6

…tropyLoss Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: add missing None return for num_chunks_override in backward

ae52b0c

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

wanyaworld approved these changes Apr 15, 2026


ca1207 merged commit 2a60d4b into main Apr 15, 2026
4 of 7 checks passed

ca1207 merged 4 commits into main from feat/flce-num-chunks-override-v2
