Qwen 3.5 Unified #298

Open
will-lms wants to merge 33 commits into main from will/qwen3.5-logits

Conversation


@will-lms will-lms commented Mar 30, 2026

This PR unifies the Qwen 3.5 model architecture. It requires a patch to the upstream mlx-lm model class so that mrope can be used for multi-modal inputs. There is a large amount of unification code here because of slight differences between the mlx-lm and mlx-vlm implementations. To stay faithful to each, I call the upstream mlx-lm components for text-only prompts, and for vision prompts I have re-implemented the mlx-vlm components outside of their classes so they are accessible.

Patched DecoderLayer

I have patched the upstream mlx-lm decoder layer to call mlx-lm's GatedDeltaNet or Qwen3NextAttention (depending on the layer type) for text-only prompts. For vision prompts, it calls the ported implementations from mlx-vlm in _vlm_gated_delta_net and _mrope_attention. The former is theoretically the same across text and vision prompts, but there are slight differences in the implementations that lead to logits differences.
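
The dispatch can be pictured with a minimal sketch. The routing condition and constructor arguments here are illustrative assumptions, not the real patch; only the component names (GatedDeltaNet, Qwen3NextAttention, _vlm_gated_delta_net, _mrope_attention) come from the description above:

```python
# Hypothetical sketch of the patched decoder layer's dispatch between the
# upstream mlx-lm blocks and the ported mlx-vlm blocks.

class PatchedDecoderLayer:
    def __init__(self, upstream_block, vlm_block):
        # upstream_block: mlx-lm GatedDeltaNet or Qwen3NextAttention
        # vlm_block: ported _vlm_gated_delta_net / _mrope_attention equivalent
        self.upstream_block = upstream_block
        self.vlm_block = vlm_block

    def __call__(self, x, position_ids=None, **kwargs):
        if position_ids is None:
            # Text-only prompt: stay faithful to the upstream mlx-lm path.
            return self.upstream_block(x, **kwargs)
        # Vision prompt: take the ported mlx-vlm path with mrope positions.
        return self.vlm_block(x, position_ids=position_ids, **kwargs)
```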

Patched Text Model

I patched the mlx-lm text model to manage the lifecycle of position_ids and rope_deltas from mlx-vlm. It uses these values in _compute_position_ids for multi-modal prompts. The implementation is based on mlx-vlm's position ID computation.
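
A rough sketch of that lifecycle, loosely based on the Qwen-VL mrope scheme (the method shape and helper names here are assumptions; only position_ids and rope_deltas come from the description): multi-modal prefill computes positions and stores a delta, and decode continues sequentially shifted by that delta.

```python
# Hypothetical sketch of the position_ids / rope_deltas lifecycle.

class MropeState:
    def __init__(self):
        self.rope_deltas = None

    def compute_position_ids(self, offset, num_tokens, mm_positions=None):
        if mm_positions is not None:
            # Multi-modal prefill: remember how far the multi-modal positions
            # diverge from plain sequential positions for later decode steps.
            self.rope_deltas = mm_positions[-1] - (offset + num_tokens - 1)
            return mm_positions
        # Decode (or text-only): sequential positions plus the stored delta.
        delta = self.rope_deltas or 0
        return [offset + i + delta for i in range(num_tokens)]
```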

MoE Model

The MoE model implementation in mlx-lm inherits from the dense model, so we only need to patch the dense implementation. It has a different architecture, though, so we need vision add-ons for both. I made the MoE vision add-on inherit from the dense add-on, since the implementation is mostly shared.
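
The class relationship is simple enough to sketch (class and attribute names here are illustrative, not the real ones):

```python
# Hypothetical sketch of the add-on hierarchy: the MoE add-on inherits the
# shared implementation from the dense add-on and overrides only what differs.

class Qwen35DenseVisionAddOn:
    arch = "qwen3_next"  # illustrative architecture tag

    def describe(self):
        return f"vision add-on targeting {self.arch}"

class Qwen35MoEVisionAddOn(Qwen35DenseVisionAddOn):
    # Mostly-shared implementation; only the target architecture differs here.
    arch = "qwen3_next_moe"
```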

New VisionAddOn Lifecycle Method

The VisionAddOn has a new clear_prediction_state method, which is called by ModelKit at the start of process_prompt. Qwen 3.5 needs this to reset the position_ids and rope_deltas on the patched text model. The hook is generic, so it could be used in the future for other model state that needs to be cleared between sequential generations.
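
A sketch of the hook wiring, assuming simplified shapes for the classes involved (only clear_prediction_state, ModelKit, process_prompt, position_ids, and rope_deltas come from the description; everything else is illustrative):

```python
# Hypothetical sketch of the new lifecycle hook. ModelKit calls the hook at
# the start of process_prompt; the Qwen 3.5 add-on uses it to reset the mrope
# state patched onto the text model.

class VisionAddOn:
    def clear_prediction_state(self):
        pass  # generic no-op; subclasses clear per-generation state

class Qwen35VisionAddOn(VisionAddOn):
    def __init__(self, text_model):
        self.text_model = text_model

    def clear_prediction_state(self):
        self.text_model.position_ids = None
        self.text_model.rope_deltas = None

class ModelKit:
    def __init__(self, add_on):
        self.add_on = add_on

    def process_prompt(self, prompt):
        # Reset add-on state so nothing leaks between sequential generations.
        self.add_on.clear_prediction_state()
        return prompt
```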

Qwen VL Utils

The compute_qwen_vl_embeddings method now returns a named dataclass instead of a tuple, for readability. It also returns an additional field, grid_thw. The calling vision add-ons have been updated accordingly.
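
For illustration, a sketch of the named return value; only the method name and the grid_thw field come from the description, and the other field names and body are assumptions:

```python
# Hypothetical sketch: a dataclass return lets callers use field names
# instead of positional tuple unpacking.

from dataclasses import dataclass
from typing import Any

@dataclass
class QwenVLEmbeddings:
    embeddings: Any  # the image embeddings, as before
    grid_thw: Any    # newly returned (temporal, height, width) grid sizes

def compute_qwen_vl_embeddings(pixels):
    # Callers now read result.grid_thw rather than indexing a tuple.
    return QwenVLEmbeddings(embeddings=[p * 2 for p in pixels],
                            grid_thw=(1, 2, 2))
```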

E2E Tests

Added the standard test_qwen3_5_vision, test_qwen3_5_text_only, test_qwen3_5_moe_vision, and test_qwen3_5_moe_text_only tests. There are a few additional tests for this model to cover issues I hit:

  • test_qwen3_5_vision_then_text_only: Ensures that the mrope state patched onto the mlx-lm model implementation does not leak across requests. It does this by checking that the text generated by a text-only prompt with deterministic sampling params is identical before and after a vision prompt.
  • test_qwen3_5_multi_image_process_prompt_preserves_image_positions: Checks that mrope position IDs are applied to all images when there are multiple in a prompt. It does this by finding the image tokens in the prompt and inspecting their position IDs.
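
The core of the state-leak check reduces to a small helper, sketched here under assumed names (the real test drives the full generation pipeline):

```python
# Hypothetical sketch of the state-leak check: with deterministic sampling,
# a text-only prompt must generate identical output before and after a
# vision prompt.

def no_state_leak(generate, text_prompt, vision_prompt):
    before = generate(text_prompt)
    generate(vision_prompt)  # may set position_ids / rope_deltas internally
    after = generate(text_prompt)
    return before == after
```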

Patch Tests

I added tests covering just the patched model implementation to ensure it is functioning correctly.

  • test_qwen3_5_prefill_decode_consistency: Assert that model(all_tokens)[-1] == model(tokens[:-1]); model(tokens[-1]) for both text and vision prompts.
  • test_qwen3_5_mrope_chunked_prefill_matches_unchunked: Assert that prefill chunk boundary does not affect result.
  • test_qwen3_5_text_only_uncached_matches_prompt_cache: Assert that text only prefill is the same whether a cache is provided or not.
  • test_qwen3_5_text_only_batch_cache_matches_prompt_cache: Assert that text-only prompts can use a BatchKVCache for parallel generation. Note this only works when the model is used as a non-vision model.
  • test_qwen3_5_text_only_patched_matches_unpatched: Text-only logits from the patched mlx-lm model must match the unpatched mlx-lm model for both dense and MoE Qwen3.5 variants.
  • test_qwen3_5_image_prompt_patched_matches_vlm: Image-prompt logits from the patched mlx-lm model must match the native mlx-vlm LanguageModel for both dense and MoE Qwen3.5 variants.
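
The prefill/decode consistency assertion in the first test above can be sketched as a helper; the factory shape and names are assumptions, and the real test compares logits on the actual model:

```python
# Hypothetical sketch of the prefill/decode consistency check: a single full
# prefill must yield the same final output as prefilling the prefix and then
# decoding the last token, using a fresh model (and cache) for each path.

def prefill_decode_consistent(make_model, tokens):
    full = make_model()(tokens)[-1]   # one-shot prefill of all tokens
    model = make_model()
    model(tokens[:-1])                # prefill prefix, populating the cache
    stepped = model(tokens[-1:])[-1]  # decode the final token
    return full == stepped
```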

@github-actions github-actions bot added the CLA signed Indicates that all contributors have signed label Mar 30, 2026
@will-lms
Contributor Author

Codex Review

[Screenshot: Codex review output, Mar 30, 2026 at 8:20 PM]

@will-lms will-lms marked this pull request as ready for review March 31, 2026 00:23
