Replies: 1 comment
There are a few likely reasons why your MLX-trained weights behave differently in HF Transformers:

**1. You're saving only adapter (trainable) weights, not the full model**

Fix: Fuse the adapter into the base model first, then save:

```
python -m mlx_lm.fuse --model base_model_path --adapter-path ./output --save-path ./fused
```

**2. Weight naming mismatch between MLX and PyTorch**

MLX and PyTorch/Transformers use different naming conventions for layers. If the names don't match, Transformers silently skips those tensors and keeps its randomly initialized weights for them, which alone can explain very different inference results.

**3. Recommended workflow**

Use `mlx_lm`'s built-in tools:

```
# 1. Fuse adapter into base model
python -m mlx_lm.fuse --model base_model --adapter-path ./output --save-path ./fused

# 2. Convert fused MLX model to HF-compatible format
python -m mlx_lm.convert --hf-path ./fused --mlx-path ./hf_output
```

This handles key remapping and format conversion automatically. If you need to do it manually, inspect the keys in both files:

```python
import safetensors

mlx_keys = set(safetensors.safe_open('./output/model.safetensors', framework='np').keys())
hf_keys = set(huggingface_model.state_dict().keys())
print('Missing in MLX:', hf_keys - mlx_keys)
print('Extra in MLX:', mlx_keys - hf_keys)
```

This will show you exactly which keys need remapping.
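If the diff does reveal a naming mismatch, you can fix it with a small rename map before saving. A minimal sketch, assuming hypothetical Llama-style MLX names like `layers.0.attention.wq.weight` that need to become `model.layers.0.self_attn.q_proj.weight` (the patterns below are placeholders; adjust them to whatever keys your diff actually prints):

```python
import re

# Hypothetical rename rules: old MLX-style prefix -> HF Transformers prefix.
# Replace these with the real mismatches your key diff shows.
RENAME_RULES = [
    (re.compile(r"^layers\.(\d+)\.attention\.wq\."), r"model.layers.\1.self_attn.q_proj."),
    (re.compile(r"^layers\.(\d+)\.attention\.wk\."), r"model.layers.\1.self_attn.k_proj."),
    (re.compile(r"^layers\.(\d+)\.attention\.wv\."), r"model.layers.\1.self_attn.v_proj."),
    (re.compile(r"^layers\.(\d+)\.attention\.wo\."), r"model.layers.\1.self_attn.o_proj."),
]

def remap_key(key: str) -> str:
    """Apply the first matching rename rule; leave unknown keys untouched."""
    for pattern, replacement in RENAME_RULES:
        if pattern.match(key):
            return pattern.sub(replacement, key)
    return key

def remap_state_dict(tensors: dict) -> dict:
    """Return a new dict with every key translated to the HF convention."""
    return {remap_key(k): v for k, v in tensors.items()}
```

After remapping, you can write the result back out (e.g. with `safetensors.numpy.save_file(remapped, 'model.safetensors')`) and load it in Transformers; any key the diff still reports as missing means a rule is wrong or absent.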
I started learning SFT and trained a model with MLX. I saved the weights as a safetensors file and sent them to my friend. When he loads them with Hugging Face Transformers, the inference results differ significantly from mine.