Training-free fix for INT4 KV-cache quantization failures: norm separation + per-channel quantization. Qwen2-7B: 744× improvement (ΔPPL +238 → +0.32). Validated on 12 models, 124M–40B. 4 lines of PyTorch.
reproducible-research pytorch open-science transformer outliers quantization memory-optimization kv-cache int4 llm-inference norm-separation per-channel-quantization
Updated Apr 16, 2026 - Python
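The repo itself isn't shown here, so the following is only a minimal sketch of what "norm separation + per-channel quantization" for an INT4 KV cache could look like: each token's L2 norm is factored out and stored in full precision (so outlier magnitudes don't blow up the quantization range), and the remaining unit-norm directions are quantized to 4 bits with one scale/zero-point per channel. Function names and the exact decomposition are assumptions for illustration, not the repository's actual API.

```python
import torch

def quantize_kv_int4(x: torch.Tensor, eps: float = 1e-6):
    """Sketch: norm-separated, per-channel asymmetric INT4 quantization.

    x: [tokens, channels] key or value slice.
    Returns (codes, scale, zero, norms); norms stay in full precision.
    """
    # Norm separation: factor out each token's L2 norm (assumed detail).
    norms = x.norm(dim=-1, keepdim=True).clamp_min(eps)
    d = x / norms  # unit-norm directions, bounded magnitudes

    # Per-channel asymmetric INT4: one (scale, zero) pair per channel.
    lo = d.amin(dim=0, keepdim=True)
    hi = d.amax(dim=0, keepdim=True)
    scale = (hi - lo).clamp_min(eps) / 15.0  # 4-bit grid: 0..15
    codes = ((d - lo) / scale).round().clamp(0, 15).to(torch.uint8)
    return codes, scale, lo, norms

def dequantize_kv_int4(codes, scale, zero, norms):
    """Inverse of the sketch above: rescale directions, restore norms."""
    return (codes.float() * scale + zero) * norms

# Usage: round-trip a random key block and check the reconstruction error.
torch.manual_seed(0)
x = torch.randn(128, 64)
codes, scale, zero, norms = quantize_kv_int4(x)
x_hat = dequantize_kv_int4(codes, scale, zero, norms)
rel_err = (x - x_hat).norm() / x.norm()
```

Storing the per-token norms separately is what makes the per-channel ranges tight: without it, a single outlier token stretches every channel's quantization grid.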