Pull requests: ml-explore/mlx-lm
chore: Use json-repair for json_tools tool-call parsing
#1071
opened Mar 29, 2026 by
isaac-cf-wong
Add TurboQuant KV cache compression (3-bit, 4.6x)
#1067
opened Mar 28, 2026 by
arozanov
4 of 6 tasks
[Experimental] Add TurboQuantKVCache: PolarQuant KV cache compression at 2-4 bits
#1059
opened Mar 26, 2026 by
rachittshah
Fix BatchRotatingKVCache.merge() OOB when prompt exceeds max_size
#1052
opened Mar 25, 2026 by
LxYuan0420
fix tokenizer regex issue with Mistral-based models
#1049
opened Mar 23, 2026 by
amanning3390
Fix zero prompt-cache reuse for thinking models in multi-turn chat
#1042
opened Mar 22, 2026 by
lyonsno
feat: configurable KVCache step size and pre-allocation
#1038
opened Mar 22, 2026 by
Thump604
5 tasks
Add Mistral Small 4 (119B MoE) support via mistral4.py
#1037
opened Mar 21, 2026 by
ProducerGuy
5 tasks done
Handle Metal OOM gracefully in mlx_lm.server with structured errors
#1034
opened Mar 21, 2026 by
Aristide021
feat: Drop unsupported weights by default during model load
#1033
opened Mar 21, 2026 by
spicyneuron
Use model's generation_config.json for default sampling parameters
#1031
opened Mar 20, 2026 by
eyupcanakman
Stop generation on consecutive duplicate tool calls
#1027
opened Mar 20, 2026 by
chopchop-jiahao
Draft
[transformers-to-mlx skill] Add OLMo Hybrid model support
#1023
opened Mar 19, 2026 by
pcuenca