Description
Hi MLX community. I have a Qwen3.5-4B LoRA adapter that I merged into the base model and converted to MLX format using mlx_lm.convert with quantize=False. (The goal is to run the fine-tuned model on a Mac Studio/MacBook.)
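For reference, a minimal sketch of the conversion step (paths here are hypothetical; depending on the mlx_lm version, dtype may need to be set explicitly, since some versions default to float16):

```python
# Minimal sketch of the conversion step; paths are hypothetical.
from mlx_lm import convert

convert(
    hf_path="./qwen-merged-hf",    # merged base+LoRA checkpoint in HF format
    mlx_path="./qwen-merged-mlx",  # output directory for the MLX weights
    quantize=False,                # keep full-precision weights, no quantization
    dtype="bfloat16",              # match the HF checkpoint's BF16 weights
)
```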
When I run greedy decoding (argmax, no sampling) on the same prompt, the MLX version produces different output than the original HuggingFace Transformers version (both base+LoRA and the merged weights in HF format produce identical results, so I suspect the MLX conversion may have issues). The first ~20-30 tokens match, then the outputs start to diverge (with bad results and quality). Both backends use BF16 weights. I have verified the weight loading, tokenization, LoRA merge, and MLX conversion, but can't find any clues.

Is MLX expected to produce exactly the same output as HuggingFace Transformers for the same model and prompt with greedy decoding? (I am using mlx-lm now.) Has anyone run into this before? Thank you!
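To narrow down where the divergence starts, one option is to compare the per-position argmax from a single forward pass over the same token IDs in both frameworks, which removes sampling and tokenizer differences from the picture. A minimal sketch, assuming the merged checkpoints live at the hypothetical local paths below:

```python
import mlx.core as mx
import torch
from mlx_lm import load
from transformers import AutoModelForCausalLM, AutoTokenizer

HF_PATH = "./qwen-merged-hf"    # hypothetical path to the merged HF checkpoint
MLX_PATH = "./qwen-merged-mlx"  # hypothetical path to the converted MLX model

prompt = "Explain LoRA in one sentence."

# HuggingFace side: one forward pass, greedy argmax at each position.
hf_tok = AutoTokenizer.from_pretrained(HF_PATH)
hf_model = AutoModelForCausalLM.from_pretrained(HF_PATH, torch_dtype=torch.bfloat16)
hf_ids = hf_tok(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    hf_logits = hf_model(hf_ids).logits[0]
hf_argmax = hf_logits.argmax(dim=-1).tolist()

# MLX side: feed the exact same token IDs (reusing the HF tokenizer output
# to rule out tokenization differences), same greedy argmax.
mlx_model, mlx_tok = load(MLX_PATH)
mlx_logits = mlx_model(mx.array(hf_ids.tolist()))[0]
mlx_argmax = mx.argmax(mlx_logits, axis=-1).tolist()

# Report the first position where the two backends disagree.
for i, (a, b) in enumerate(zip(hf_argmax, mlx_argmax)):
    if a != b:
        print(f"first divergence at position {i}: hf={a} mlx={b}")
        break
else:
    print("argmax matches at every prompt position")
```

If the argmax already differs within the prompt, the weights or compute differ between the backends; if it only diverges after many generated tokens, it may just be accumulated BF16 rounding differences between the two implementations.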
One idea I have is that Qwen3.5 is a unified vision-text model; I tried mlx-vlm, but it seems Qwen3.5 is not supported there.
Thanks