mattjcly approved these changes Mar 31, 2026

This PR unifies the Qwen 3.5 model architecture. It requires a patch to the upstream mlx-lm model class so that mrope can be used for multi-modal inputs. There is a large amount of unification code here because of slight differences between the mlx-lm and mlx-vlm implementations. To be faithful to each, I call the upstream mlx-lm components for text-only prompts, and I have re-implemented the mlx-vlm components outside of their classes for accessibility on vision prompts.
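The text-only/vision split described above can be sketched as a simple dispatch. This is an illustrative, minimal sketch; the class and function names are hypothetical, not the PR's actual API:

```python
# Hypothetical sketch of the dispatch described above: text-only prompts go
# through the upstream mlx-lm path, vision prompts through components ported
# from mlx-vlm. All names here are illustrative.

from typing import Any, Optional


class PatchedComponent:
    """Wraps an upstream mlx-lm component, adding a vision (mrope) path."""

    def __init__(self, upstream_forward, vlm_forward):
        self.upstream_forward = upstream_forward  # upstream mlx-lm implementation
        self.vlm_forward = vlm_forward            # implementation ported from mlx-vlm

    def __call__(self, x: Any, position_ids: Optional[Any] = None) -> Any:
        # Vision prompts carry explicit mrope position_ids; text-only prompts
        # do not, so the presence of position_ids drives the dispatch.
        if position_ids is None:
            return self.upstream_forward(x)
        return self.vlm_forward(x, position_ids)


# Usage: the same wrapped component serves both prompt types.
component = PatchedComponent(
    upstream_forward=lambda x: ("text", x),
    vlm_forward=lambda x, pos: ("vision", x, pos),
)
text_out = component([1, 2, 3])                          # text-only path
vision_out = component([1, 2, 3], position_ids=[0, 1, 2])  # vision path
```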
Patched DecoderLayer
I have patched the upstream mlx-lm decoder layer to call mlx-lm's `GatedDeltaNet` or `Qwen3NextAttention` (depending on the layer type) for text-only prompts. For vision prompts, it calls the implementations ported from mlx-vlm in `_vlm_gated_delta_net` and `_mrope_attention`. The former is theoretically the same across text and vision prompts, but there are slight differences in the implementations that lead to logits differences.

Patched Text Model
I patched the mlx-lm text model to manage the lifecycle of `position_ids` and `rope_deltas` from mlx-vlm. It uses these values in `_compute_position_ids` for multi-modal prompts. The implementation is based on mlx-vlm's position ID computation.

MOE Model
The MoE model implementation in mlx-lm inherits from the dense model, so we only need to patch the dense implementation. The MoE model has a different architecture, though, so we need both vision add-ons. I made the MoE vision add-on inherit from the dense add-on, as the implementation is mostly shared.
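The add-on hierarchy might look like the following. This is a hedged sketch with hypothetical names and placeholder values, shown only to illustrate the inheritance structure:

```python
# Illustrative sketch of the vision add-on hierarchy described above: the MoE
# add-on inherits from the dense one and overrides only what differs between
# the two architectures. Names and values are hypothetical.

class DenseVisionAddOn:
    """Vision add-on for the dense Qwen 3.5 model."""

    def embed_images(self, images):
        # Shared embedding logic lives on the dense add-on.
        return [("embedded", img) for img in images]

    def hidden_size(self):
        return 2048  # placeholder value for the dense arch


class MoEVisionAddOn(DenseVisionAddOn):
    """MoE variant: reuses the dense implementation wholesale, overriding
    only architecture-specific details."""

    def hidden_size(self):
        return 4096  # placeholder: the MoE arch differs here
```

Keeping the shared logic on the dense class means a fix there automatically applies to the MoE variant.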
New VisionAddOn Lifecycle Method
The `VisionAddOn` has a new `clear_prediction_state` method, which is called by `ModelKit` at the start of `process_prompt`. Qwen 3.5 needs this to reset the `position_ids` and `rope_deltas` on the patched text model. It is generic, so in the future it could be used for other model state that needs to be cleared between sequential generations.

Qwen VL Utils
The `compute_qwen_vl_embeddings` method now returns a named dataclass instead of a tuple for readability, and it returns an additional field, `grid_thw`. The calling vision add-ons have been updated.

E2E Tests
Added the standard `test_qwen3_5_vision`, `test_qwen3_5_text_only`, `test_qwen3_5_moe_vision`, and `test_qwen3_5_moe_text_only` tests. There are a few more for this model to cover issues I hit:

- `test_qwen3_5_vision_then_text_only`: Ensures that the mrope state patched onto the mlx-lm model implementation does not leak across requests. It does this by checking that the text generated for a text-only prompt with deterministic sampling params is the same before and after a vision prompt.
- `test_qwen3_5_multi_image_process_prompt_preserves_image_positions`: Checks that mrope position IDs are applied to all images when there are multiple in a prompt. It does this by finding the image tokens in the prompt and inspecting their position IDs.

Patch Tests
I added tests covering just the patched model implementation to ensure it is functioning correctly.

- `test_qwen3_5_prefill_decode_consistency`: Assert that `model(all_tokens)[-1] == model(tokens[:-1]); model(tokens[-1])` for both text and vision prompts.
- `test_qwen3_5_mrope_chunked_prefill_matches_unchunked`: Assert that the prefill chunk boundary does not affect the result.
- `test_qwen3_5_text_only_uncached_matches_prompt_cache`: Assert that text-only prefill is the same whether a cache is provided or not.
- `test_qwen3_5_text_only_batch_cache_matches_prompt_cache`: Assert that text-only prompts can use a `BatchKVCache` for parallel generation. Note this only works if the model is a non-vision model.
- `test_qwen3_5_text_only_patched_matches_unpatched`: Text-only logits from the patched mlx-lm model must match the unpatched mlx-lm model for both dense and MoE Qwen 3.5 variants.
- `test_qwen3_5_image_prompt_patched_matches_vlm`: Image-prompt logits from the patched mlx-lm model must match the native mlx-vlm `LanguageModel` for both dense and MoE Qwen 3.5 variants.