Qwen3-VL-2B ONNX export produces incorrect logits distribution #79

@altunenes

Description

Hi, I know this isn't a place specifically for ONNX but for MNN, but I'd still like to bring this to your attention in case it helps. I noticed the issue while testing the ONNX export today.

llmexport --path ./Qwen3-VL-2B-Instruct \
  --dst_path ./qwen3_onnx \
  --export onnx \
  --quant_bit 4 \
  --lm_quant_bit 8

I ran a very simple inference with an image input. After a prompt ending in
<|im_start|>assistant\n, the first token should be predicted with probabilities roughly like:

- "The": ~40%
- "This": ~20%
- "A": ~15%

But the actual behavior is different. The first-token predictions are:

- " " (space): 13%
- "," (comma): 11%
- "\n" (newline): 9%
- "a": 3%

So the problem is that the model generates repetitive filler (commas, spaces, newlines) instead of coherent text. I also experimented with the source code to fix this, but without success so far.
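For reference, this is roughly how I inspect the first-token distribution after prefill (a minimal sketch; the dummy logits below stand in for the last-position output of the ONNX decoder, and the real code would decode the top IDs with the tokenizer):

```python
import numpy as np

def top_k_probs(logits: np.ndarray, k: int = 4):
    """Softmax the last-position logits and return the top-k (token_id, prob) pairs."""
    z = logits - logits.max()              # subtract max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    order = np.argsort(probs)[::-1][:k]
    return [(int(i), float(probs[i])) for i in order]

# dummy logits standing in for e.g. session.run(...)[0][0, -1, :]
logits = np.array([0.1, 2.0, 1.0, -1.0, 0.5])
print(top_k_probs(logits, k=3))
```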

I've ruled out inference-side issues:

  • Visual encoder works correctly (outputs valid embeddings)
  • Position IDs constructed correctly for mrope (3D coordinates)
  • Deepstack features extracted and applied to correct layers (5, 11, 17)
  • KV cache grows properly
  • Attention masks correct
  • No NaN/Inf values in weights
  • ONNX graph structure looks correct (no FakeLinear nodes remain)
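For context on the position-ID check: for the pure-text part of the sequence, the Qwen-VL mrope convention (as I understand it) is that all three components (temporal/height/width) share the same running index, and only image tokens get diverging coordinates. A minimal text-only sketch of what I construct:

```python
import numpy as np

def text_mrope_position_ids(seq_len: int, start: int = 0) -> np.ndarray:
    """[3, 1, seq_len] position IDs; text tokens use the same index
    for the temporal, height, and width components."""
    idx = np.arange(start, start + seq_len)
    return np.stack([idx, idx, idx])[:, None, :]

pos = text_mrope_position_ids(5)
print(pos.shape)  # (3, 1, 5)
```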

So the bug probably manifests before generation: the first-token probabilities after prefill are already wrong.

I have some guesses about the cause, but I'm not sure; treat this as thinking out loud:

  • Weight transpose during the FakeLinear→MatMul conversion in onnx_rebuilder.py
  • MRoPE export: Qwen3-VL uses 3D rotary position encoding, which may not export correctly
  • Deepstack integration: injection of auxiliary visual features into the transformer layers
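On the transpose guess: torch.nn.Linear stores its weight as [out_features, in_features] and computes x @ W.T, while an ONNX MatMul multiplies by the right operand as stored, i.e. it expects [in_features, out_features]. If the FakeLinear→MatMul rewrite drops the transpose, a square weight still type-checks, so the graph looks fine while the outputs are wrong. A quick numpy illustration of the pitfall (not the llmexport code):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 4))    # activations
W = rng.standard_normal((4, 4))    # Linear-style weight, [out, in]; square, so shapes can't catch the bug

correct = x @ W.T                  # what nn.Linear computes
wrong = x @ W                      # MatMul with the untransposed weight

assert correct.shape == wrong.shape      # both (1, 4): no shape error
print(np.allclose(correct, wrong))       # False: the outputs silently differ
```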

I also modified the code to apply deepstack to layers 5, 11, 17 correctly, replacing this in llmexport/llmexport.py (forward method):

if deepstack_embeds is not None and i in range(deepstack_embeds.shape[0]):
    hidden_states += deepstack_embeds[i]

with

if deepstack_embeds is not None:
    if hasattr(self, 'visual') and self.visual is not None and hasattr(self.visual, 'deepstack_visual_indexes'):
        if i in self.visual.deepstack_visual_indexes:
            idx = self.visual.deepstack_visual_indexes.index(i)
            if idx < deepstack_embeds.shape[0]:
                hidden_states += deepstack_embeds[idx]

I also experimented with changing the deepstack shape from [3, 1, hidden_size] to [3, seq_len, hidden_size]; the original dummy export input is:

# add deepstack_embeds input
deepstack_embeds = torch.randn(3, 1, self.hidden_size)

and with making the sequence axis dynamic:

qwen3_dynamic_axes = self.model_dynamic_axes.copy()
qwen3_dynamic_axes['deepstack_embeds'] = {1: 'seq_len'}

But unfortunately, these changes didn't fix the issue.

Thank you very much for your contributions!
