Replies: 2 comments
From Gemini 3.1 Pro: Proposal: Architectural Support for TurboQuant Integration in mlx-lm
Also: https://huggingface.co/flovflo/turboquant-mlx-qwen35-kv
https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/
Someone is already trying it, with promising RAM savings:
https://www.reddit.com/r/LocalLLaMA/comments/1s36vnk/looking_for_feedback_porting_googles_turboquant/