feat: Drop unsupported weights by default during model load #1033
spicyneuron wants to merge 2 commits into ml-explore:main
Heads up on a use case this would break: models with extra weight keys that are loaded by separate builders. MTP (multi-token prediction) weights in PR #990 are stored as

The same pattern applies to any pipeline where extra weights are co-located in the safetensors but consumed by a secondary model (adapter weights, reward heads, draft model weights, etc.). A couple of options that would preserve the cleanup benefit without breaking these cases:

The warning-without-dropping approach is probably the safest default. Users who want strict behavior can opt in.
Interesting. However, we do have

Why not simply "expose" this parameter to the CLIs?
This makes `mlx-lm` drop unsupported weights by default when loading models that contain extra parameters (for example `vision_tower` weights from multimodal checkpoints).

Problem
`mlx-lm` doesn't support vision, but models converted with mlx-vlm still contain a perfectly usable text model.

Currently, attempting to load such models results in a hard crash:

That's especially painful right now because MLX quantizations on Hugging Face can still be a bit thin, and sometimes the only available checkpoint is a vision model. Even when that's not the case, needing to keep two versions of the same model in order to use both `mlx-lm` and `mlx-vlm` takes up a lot of storage.

Solution
This PR makes `load_model` filter out weights that don't exist on the instantiated MLX model before loading. Unsupported extra weights are skipped and logged, while real incompatibilities like missing supported weights still fail as before.

The main change is in `mlx_lm/utils.py`.

Alternatives
I initially implemented this as an opt-in `--drop-unknown-weights` flag, but after testing locally I couldn't really find a case where I would want this disabled. The change felt narrow in scope and general enough in benefit that it made more sense as the default behavior. Easy to revert if folks feel differently.

Add a `--disable-strict-load` style flag that toggles `strict=False`. I think that goes too far, since it would also hide incompatibilities. The current approach still preserves hard failures for missing supported weights.
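
For reviewers who want a concrete picture of the filtering step described above, here is a minimal sketch in plain Python. It is not the code from this PR: the helper name `split_supported_weights`, its signature, and the example weight keys are all hypothetical, and plain dicts stand in for real checkpoint tensors. It only illustrates the intended behavior, where extra keys are dropped and logged while missing keys still raise.

```python
import logging

logger = logging.getLogger(__name__)


def split_supported_weights(weights, model_keys):
    """Hypothetical helper: keep only the weights the model defines.

    weights:    dict mapping checkpoint key -> tensor (any object here)
    model_keys: iterable of parameter names defined on the instantiated model
    Returns (supported, dropped), where dropped is a sorted list of keys.
    """
    model_keys = set(model_keys)

    # Weights the model expects but the checkpoint lacks are a real
    # incompatibility, so they still fail hard.
    missing = sorted(model_keys - set(weights))
    if missing:
        raise ValueError(f"Missing {len(missing)} weights, e.g. {missing[0]}")

    # Extra keys (for example a vision tower) are skipped and logged.
    dropped = sorted(k for k in weights if k not in model_keys)
    if dropped:
        logger.warning("Dropping %d unsupported weights: %s", len(dropped), dropped)

    supported = {k: v for k, v in weights.items() if k in model_keys}
    return supported, dropped


# Example with made-up keys: one text weight plus one vision weight.
checkpoint = {"model.embed_tokens.weight": 0, "vision_tower.proj.weight": 1}
supported, dropped = split_supported_weights(
    checkpoint, {"model.embed_tokens.weight"}
)
```

In an MLX setting, the expected keys could plausibly be collected with `tree_flatten(model.parameters())` from `mlx.utils`, and the filtered dict passed to `model.load_weights(..., strict=True)`, so that strict checking still applies to everything that remains. A blanket `strict=False` would instead silence the missing-weight case as well, which is the concern raised about the `--disable-strict-load` alternative.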