Replies: 1 comment
There are a few likely reasons why your MLX-trained weights behave differently in HF Transformers:

**1. You're saving only adapter (trainable) weights, not the full model**

Fix: Fuse the adapter into the base model first, then save:

```
python -m mlx_lm.fuse --model base_model_path --adapter-path ./output --save-path ./fused
```

**2. Weight naming mismatch between MLX and PyTorch**

MLX and PyTorch/Transformers use different naming conventions for layers. If the names don't match, Transformers silently skips those tensors and keeps its randomly initialized weights for them, which alone can explain very different inference results.

**3. Recommended workflow**

Use `mlx_lm`'s built-in tools:

```
# 1. Fuse adapter into base model
python -m mlx_lm.fuse --model base_model --adapter-path ./output --save-path ./fused

# 2. Convert fused MLX model to HF-compatible format
python -m mlx_lm.convert --hf-path ./fused --mlx-path ./hf_output
```

This handles key remapping and format conversion automatically. If you need to do it manually, inspect the keys in both files:

```python
import safetensors

mlx_keys = set(safetensors.safe_open('./output/model.safetensors', framework='np').keys())
hf_keys = set(huggingface_model.state_dict().keys())
print('Missing in MLX:', hf_keys - mlx_keys)
print('Extra in MLX:', mlx_keys - hf_keys)
```

This will show you exactly which keys need remapping.
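If the diff does reveal a naming mismatch, you can fix it with a small rename map before saving. A minimal sketch, assuming hypothetical Llama-style MLX names like `layers.0.attention.wq.weight` that need to become `model.layers.0.self_attn.q_proj.weight` (the patterns below are placeholders; adjust them to whatever keys your diff actually prints):

```python
import re

# Hypothetical rename rules: old MLX-style prefix -> HF Transformers prefix.
# Replace these with the real mismatches your key diff shows.
RENAME_RULES = [
    (re.compile(r"^layers\.(\d+)\.attention\.wq\."), r"model.layers.\1.self_attn.q_proj."),
    (re.compile(r"^layers\.(\d+)\.attention\.wk\."), r"model.layers.\1.self_attn.k_proj."),
    (re.compile(r"^layers\.(\d+)\.attention\.wv\."), r"model.layers.\1.self_attn.v_proj."),
    (re.compile(r"^layers\.(\d+)\.attention\.wo\."), r"model.layers.\1.self_attn.o_proj."),
]

def remap_key(key: str) -> str:
    """Apply the first matching rename rule; leave unknown keys untouched."""
    for pattern, replacement in RENAME_RULES:
        if pattern.match(key):
            return pattern.sub(replacement, key)
    return key

def remap_state_dict(tensors: dict) -> dict:
    """Return a new dict with every key translated to the HF convention."""
    return {remap_key(k): v for k, v in tensors.items()}
```

After remapping, you can write the result back out (e.g. with `safetensors.numpy.save_file(remapped, 'model.safetensors')`) and load it in Transformers; any key the diff still reports as missing means a rule is wrong or absent.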
I started learning SFT and trained a model with MLX. I saved the weights as a safetensors file and sent them to my friend. When he loads them with Hugging Face Transformers, the inference results differ significantly from mine.