ml-explore / mlx-lm Public

Fork 516
Star 4.3k

Pull requests: ml-explore/mlx-lm

89 Open 570 Closed

Pull requests list

chore: Use json-repair for json_tools tool-call parsing

#1071 opened Mar 29, 2026 by isaac-cf-wong

Add TurboQuant KV cache compression (3-bit, 4.6x)

#1067 opened Mar 28, 2026 by arozanov

4 of 6 tasks

Fix gated delta kernel precision

#1066 opened Mar 27, 2026 by kernelpool • Draft

[Experimental] Add TurboQuantKVCache: PolarQuant KV cache compression at 2-4 bits

#1059 opened Mar 26, 2026 by rachittshah

Add LongCat Next

#1057 opened Mar 26, 2026 by kernelpool

lfm2: strip lm_head.weight for tied embeddings

#1055 opened Mar 25, 2026 by ykhrustalev • Draft

Fix IndexError in CacheOrder.pop() on empty cache

#1054 opened Mar 25, 2026 by lyonsno • Draft

2 tasks done

Fix BatchRotatingKVCache.merge() OOB when prompt exceeds max_size

#1052 opened Mar 25, 2026 by LxYuan0420

fix tokenizer regex issue with Mistral-based models

#1049 opened Mar 23, 2026 by amanning3390

Fix zero prompt-cache reuse for thinking models in multi-turn chat

#1042 opened Mar 22, 2026 by lyonsno

skip xtc sampling when threshold is zero

#1040 opened Mar 22, 2026 by mm65x • Draft

Fix prompt cache leak between conversations

#1039 opened Mar 22, 2026 by kernelpool

feat: configurable KVCache step size and pre-allocation

#1038 opened Mar 22, 2026 by Thump604

5 tasks

Add Mistral Small 4 (119B MoE) support via mistral4.py

#1037 opened Mar 21, 2026 by ProducerGuy

5 tasks done

Handle Metal OOM gracefully in mlx_lm.server with structured errors

#1034 opened Mar 21, 2026 by Aristide021

feat: Drop unsupported weights by default during model load

#1033 opened Mar 21, 2026 by spicyneuron

Fuse gate/up expert projections in SwitchGLU

#1032 opened Mar 21, 2026 by Thump604

4 tasks

Use model's generation_config.json for default sampling parameters

#1031 opened Mar 20, 2026 by eyupcanakman

Fix per-request enable_thinking toggle in server

#1030 opened Mar 20, 2026 by eyupcanakman

Fix CacheDataset.itemlen returning wrong length

#1029 opened Mar 20, 2026 by eyupcanakman

Fix A_log precision in mamba.py

#1028 opened Mar 20, 2026 by eyupcanakman

Stop generation on consecutive duplicate tool calls

#1027 opened Mar 20, 2026 by chopchop-jiahao • Draft

Fix SSM dt clamp default for Nemotron-H

#1026 opened Mar 20, 2026 by kernelpool

Feature/slem with context aware

#1025 opened Mar 19, 2026 by krzysiekfonal

[transformers-to-mlx skill] Add OLMo Hybrid model support

#1023 opened Mar 19, 2026 by pcuenca
