feat: add TurboQuantStaticCache variant with pre-allocated buffers #1

Open
tengomucho wants to merge 1 commit into back2matching:main from tengomucho:static_turbo_cache

Conversation

@tengomucho

Implements TurboQuantStaticLayer and TurboQuantStaticCache, which pre-allocate all memory (compressed indices, norms, residual FP16, output buffers) at init. Nothing grows dynamically during generation, so the VRAM budget is predictable.

  • New file: static_cache.py
  • 17 tests in test_static_cache.py
  • Exported TurboQuantStaticCache from __init__.py
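
For readers unfamiliar with the pattern, here is a minimal, dependency-free sketch of what "pre-allocate at init, never grow" means. It is not the code in static_cache.py: the class `StaticBufferSketch`, its parameters `max_seq_len` and `head_dim`, and the one-byte-per-index layout are all hypothetical, and it uses plain Python buffers rather than GPU tensors.

```python
from array import array


class StaticBufferSketch:
    """Hypothetical fixed-capacity store; NOT the PR's TurboQuantStaticLayer."""

    def __init__(self, max_seq_len: int, head_dim: int) -> None:
        self.max_seq_len = max_seq_len
        self.head_dim = head_dim
        # Allocate everything once, up front (assumed layout):
        # one uint8 index per element and one float norm per token.
        self.indices = bytearray(max_seq_len * head_dim)
        self.norms = array("f", [0.0]) * max_seq_len
        self.length = 0  # number of tokens written so far

    def append(self, token_indices: bytes, norm: float) -> None:
        if len(token_indices) != self.head_dim:
            raise ValueError("expected head_dim indices per token")
        if self.length >= self.max_seq_len:
            raise IndexError("static buffer is full; it never grows")
        start = self.length * self.head_dim
        # Equal-length slice assignment overwrites in place, so the
        # buffer is never resized.
        self.indices[start:start + self.head_dim] = token_indices
        self.norms[self.length] = norm
        self.length += 1

    def view(self) -> memoryview:
        """Zero-copy view of the filled part of the index buffer."""
        return memoryview(self.indices)[: self.length * self.head_dim]

    def nbytes(self) -> int:
        """Total bytes held by the buffers; constant after __init__."""
        return len(self.indices) + len(self.norms) * self.norms.itemsize
```

The point of the sketch is that `nbytes()` is fixed by the constructor arguments alone, so the memory budget can be computed before generation starts; the trade-off is that a sequence longer than `max_seq_len` fails instead of reallocating.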
