Fix gated delta kernel precision #1066

Draft
kernelpool wants to merge 1 commit into ml-explore:main from kernelpool:fix-gated-delta-kernel

kernelpool commented Mar 27, 2026

Adds Kahan compensation and unrolls loops. Fixes #1061
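
For readers unfamiliar with the technique, here is a minimal sketch of Kahan (compensated) summation in plain Python. It is not the kernel code from this PR: the function names `naive_sum` and `kahan_sum` and the test data are hypothetical, and it uses Python's double-precision floats rather than the kernel's actual data types. It only illustrates how a running compensation term recovers low-order bits that a plain accumulator drops.

```python
import math


def naive_sum(values):
    """Plain left-to-right accumulation; small addends can be rounded away."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Kahan summation: carry the rounding error of each add into the next one."""
    total = 0.0
    comp = 0.0  # estimate of the low-order bits lost so far
    for v in values:
        y = v - comp            # apply the correction to the incoming value
        t = total + y           # this add may lose the low-order bits of y
        comp = (t - total) - y  # recover what was lost (algebraically zero)
        total = t
    return total


# Illustrative data (an assumption, not from the PR): one large term followed
# by many terms that are each too small to change the accumulator on their own.
values = [1.0] + [1e-16] * 100_000
reference = math.fsum(values)  # correctly rounded sum from the standard library

print("naive error:", abs(naive_sum(values) - reference))
print("kahan error:", abs(kahan_sum(values) - reference))
```

In this sketch the naive accumulator never moves off 1.0, while the compensated version stays within a few units in the last place of the reference. The cost is a few extra floating-point operations per element.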

mlx-community/Qwen3.5-35B-A3B-4bit

| Context | Before pp | After pp | Δ pp | Before tg | After tg | Δ tg | Before Mem | After Mem | Δ Mem |
|---|---|---|---|---|---|---|---|---|---|
| 0.5k | 2040.1 | 2039.2 | -0.0% | 82.1 | 83.6 | +1.8% | 20.24 GB | 20.24 GB | 0.0% |
| 1k | 2574.4 | 2551.9 | -0.9% | 81.7 | 84.1 | +2.9% | 20.57 GB | 20.57 GB | 0.0% |
| 2k | 2830.8 | 2818.4 | -0.4% | 82.2 | 85.8 | +4.4% | 21.40 GB | 21.40 GB | 0.0% |
| 4k | 2905.8 | 2889.3 | -0.6% | 85.0 | 84.8 | -0.3% | 21.66 GB | 21.65 GB | 0.0% |
| 8k | 2845.5 | 2827.4 | -0.6% | 84.2 | 84.3 | +0.2% | 22.02 GB | 22.02 GB | 0.0% |
| 16k | 2627.5 | 2614.6 | -0.5% | 83.3 | 84.1 | +1.0% | 22.70 GB | 22.70 GB | 0.0% |
| 32k | 2245.7 | 2235.4 | -0.5% | 80.3 | 79.9 | -0.4% | 23.96 GB | 23.96 GB | 0.0% |
| Avg | 2581.4 | 2568.0 | -0.5% | 82.7 | 83.8 | +1.4% | | | |

PPL: 4.14 before, 4.14 after (0.0%)

kernelpool marked this pull request as draft March 27, 2026 15:23
Development

Successfully merging this pull request may close these issues.

Qwen3.5-35B-A3B-4bit can emit malformed tool-call output around 20k prompt tokens.
