
Qwen3.5 4B (merged LoRA adapter) converted to MLX generates incorrect results #1058

@clcy1029

Description


Hi MLX community. I have a Qwen3.5-4B LoRA adapter that I merged with the base model and converted to MLX format using mlx_lm.convert with quantize=False (the goal is to run the fine-tuned model on a Mac Studio/MacBook).
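For reference, the conversion step can be sketched as follows. Paths are placeholders, and this assumes the LoRA adapter has already been merged into the base checkpoint (e.g. via PEFT's merge_and_unload) before conversion:

```shell
# Convert the merged HF checkpoint to MLX format.
# Omitting the -q flag keeps the weights unquantized (BF16 here),
# matching quantize=False.
mlx_lm.convert \
    --hf-path ./qwen3.5-4b-merged \
    --mlx-path ./qwen3.5-4b-mlx
```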

When I run greedy decoding (argmax, no sampling) on the same prompt, the MLX version produces different output from the original HuggingFace Transformers version. Both base+LoRA and merged weights in HF format produce identical results, so I suspect the MLX conversion may be the issue. The first ~20-30 tokens match, then the outputs start to diverge (bad results and quality). Both backends use BF16 weights. I verified weight loading, tokenization, the LoRA merge, and the MLX conversion, but can't find any clues. Is MLX expected to produce exactly the same output as HuggingFace Transformers for the same model and prompt with greedy decoding? (I am using mlx-lm now.) Does anyone have experience with this? Thank you!
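To narrow down where the two backends disagree, a small helper (plain Python, no MLX or Transformers dependency) can pinpoint the first divergent position in the two greedy decodes. `hf_ids` and `mlx_ids` are hypothetical lists of token ids captured from each backend:

```python
def first_divergence(hf_ids, mlx_ids):
    """Return the index of the first token where two greedy decodes
    differ, or None if the compared prefixes are identical."""
    for i, (a, b) in enumerate(zip(hf_ids, mlx_ids)):
        if a != b:
            return i
    return None

# Toy token ids: the sequences agree for the first 3 positions.
hf = [151644, 872, 198, 40, 1079]
mlx = [151644, 872, 198, 41, 1079]
print(first_divergence(hf, mlx))  # -> 3
```

Decoding the tokens around that index (and comparing the logits at that step) usually shows whether the divergence is a near-tie between two tokens (expected BF16 numerical drift) or a genuinely different distribution (a conversion problem).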

One idea I have: Qwen3.5 is a unified vision-text model. I tried mlx-vlm, but it seems Qwen3.5 is not supported there.

Thanks
