feat: TurboQuant enhancements — layer-adaptive, beta codebook, temporal decay #37

Open
MaTriXy wants to merge 1 commit into TheTom:main from MaTriXy:turboquant-enhancements

@MaTriXy MaTriXy commented Mar 26, 2026

Summary

  • Layer-adaptive compressor: per-layer bit-width config. The last 20% of layers get 8-bit and the first 80% get 3-bit. Mode 2: PPL +0.14% at 3.5x compression.
  • Beta distribution codebook: runs Lloyd's algorithm on the true Beta(d/2, d/2) distribution for d < 256 instead of the Gaussian approximation, giving tighter MSE distortion.
  • Temporal decay compressor: age-based bit allocation with layer-aware thresholds. Python logic complete.
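
For readers skimming the thread, the layer split described above can be sketched as follows. This is a minimal illustration, not the `LayerAdaptiveCompressor` API: the helper names, the default arguments, and the choice to round the late-layer count to the nearest integer are all assumptions made for the example.

```python
def layer_bit_widths(num_layers, late_fraction=0.2, early_bits=3, late_bits=8):
    """Hypothetical helper: one bit-width per layer, low bits first, high bits last."""
    # Assumption: the late-layer count is rounded to the nearest integer.
    num_late = round(num_layers * late_fraction)
    return [early_bits] * (num_layers - num_late) + [late_bits] * num_late


def mean_payload_bits(bit_widths):
    """Average bits per stored value, ignoring norms, scales and other metadata."""
    return sum(bit_widths) / len(bit_widths)


bits = layer_bit_widths(40)  # 32 layers at 3-bit, 8 layers at 8-bit
print(bits[:2], bits[-2:], mean_payload_bits(bits))
```

The average covers payload bits only, so it says nothing about how the 3.5x compression figure was measured.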

Files changed

  • turboquant/layer_adaptive.py — LayerAdaptiveCompressor (new)
  • turboquant/temporal_decay.py — TemporalDecayCompressor (new)
  • turboquant/codebook.py — added _lloyds_beta(), use_beta param
  • turboquant/__init__.py — new exports
  • tests/test_layer_adaptive.py — 16 tests (new)
  • tests/test_codebook_beta.py — 22 tests (new)
  • tests/test_temporal_decay.py — 22 tests (new)
  • docs/turboquant-enhancements.md — documentation (new)
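
The body of `_lloyds_beta()` is not shown in this thread, so the following is only a sketch of the general idea: fit a scalar codebook with Lloyd's algorithm to samples from a Beta model, then compare it with a codebook fitted to a variance-matched Gaussian. The function names, the sample-based fitting, the sample count, and the mapping x = 2B - 1 with B ~ Beta(d/2, d/2) are assumptions made for the example; the PR's implementation may differ on every one of these points.

```python
import bisect
import random


def lloyd_1d(samples, num_levels, iters=30):
    """Lloyd's algorithm on 1-D samples; returns centroids in ascending order."""
    data = sorted(samples)
    n = len(data)
    # Start from evenly spaced sample quantiles.
    centroids = [data[int((k + 0.5) * n / num_levels)] for k in range(num_levels)]
    for _ in range(iters):
        # Nearest-centroid cells are bounded by midpoints of adjacent centroids.
        edges = [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]
        cuts = [0] + [bisect.bisect_left(data, e) for e in edges] + [n]
        for k in range(num_levels):
            cell = data[cuts[k]:cuts[k + 1]]  # contiguous because data is sorted
            if cell:  # an empty cell keeps its previous centroid
                centroids[k] = sum(cell) / len(cell)
    return centroids


def quantization_mse(samples, centroids):
    """Mean squared error when each sample snaps to its nearest centroid."""
    edges = [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]
    total = 0.0
    for x in samples:
        total += (x - centroids[bisect.bisect_right(edges, x)]) ** 2
    return total / len(samples)


rng = random.Random(0)
d, bits, n = 64, 3, 20_000
# Assumed coordinate model, following the PR text: x = 2B - 1, B ~ Beta(d/2, d/2).
beta_samples = [2.0 * rng.betavariate(d / 2, d / 2) - 1.0 for _ in range(n)]
# Gaussian stand-in with the same variance, 1 / (d + 1).
sigma = (1.0 / (d + 1)) ** 0.5
gauss_samples = [rng.gauss(0.0, sigma) for _ in range(n)]

beta_cb = lloyd_1d(beta_samples, 2 ** bits)
gauss_cb = lloyd_1d(gauss_samples, 2 ** bits)
mse_beta = quantization_mse(beta_samples, beta_cb)
mse_gauss = quantization_mse(beta_samples, gauss_cb)
print(f"beta codebook MSE:  {mse_beta:.3e}")
print(f"gauss codebook MSE: {mse_gauss:.3e}")
```

Both codebooks are scored on the Beta samples, which is the comparison the test plan asks for. With a sample-based fit the gap between the two numbers is noisy, so this sketch is not evidence for or against the distortion claim.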

Test plan

  • 201 tests pass (141 original + 60 new)
  • Benchmark beta codebook MSE vs gaussian for d=64, 128, 256
  • Validate temporal decay bit selection across layer/age combos
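
To make the last item concrete, here is a hedged sketch of what checking bit selection across layer/age combinations could look like. `select_bits` is a hypothetical stand-in for the logic in `TemporalDecayCompressor`; the 8/4/3-bit tiers, the window sizes, and the rule that late layers double both thresholds are invented for illustration and are not taken from the PR.

```python
def select_bits(token_age, layer_idx, num_layers,
                recent_window=256, old_window=2048, late_fraction=0.2):
    """Hypothetical rule: newer tokens and later layers keep more bits."""
    # Assumption: the last `late_fraction` of layers count as sensitive.
    first_late = num_layers - round(num_layers * late_fraction)
    # Assumption: sensitive layers double both age thresholds.
    scale = 2 if layer_idx >= first_late else 1
    if token_age < recent_window * scale:
        return 8
    if token_age < old_window * scale:
        return 4
    return 3


# Print a small grid of (layer, age) combinations for a 32-layer model.
for layer in (0, 31):
    row = [select_bits(age, layer, 32) for age in (0, 300, 3000, 5000)]
    print(f"layer {layer:2d}: {row}")
```

A grid like this makes it easy to confirm that bit-width never increases with age within a layer and never drops when moving from an early layer to a late one at the same age.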

Apply TurboQuant paper principles:
- Layer-adaptive KV cache compression (3-bit early layers, 8-bit sensitive late layers)
- Beta(d/2,d/2) codebook for small dimensions (tighter MSE vs Gaussian approx)
- Temporal decay compressor (age-based bit-width selection with layer awareness)
- 60 new tests (16 + 22 + 22), all passing

TheTom (Owner) commented Apr 2, 2026

Hey there. Thank you for the contribution. I'll be getting to them. I apologize for the delay.
