Hi, I know you said this isn't the place for ONNX specifically but MNN; still, I'd like to bring this to your attention in case it helps. I noticed it when I had the chance to test the export today.
```bash
llmexport --path ./Qwen3-VL-2B-Instruct \
    --dst_path ./qwen3_onnx \
    --export onnx \
    --quant_bit 4 \
    --lm_quant_bit 8
```
I tried to run a very simple inference with image input. After the prompt `<im_start>assistant\n`, the first token should predict words with probabilities like:
- "The": ~40%
- "This": ~20%
- "A": ~15%
But the actual behavior is different; the first-token predictions are:
- " " (space): 13%
- "," (comma): 11%
- "\n" (newline): 9%
- "a": 3%
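For reference, this is roughly how I read off the first-token distribution from the prefill logits (a minimal sketch on a stand-in numpy array, not the real model output):

```python
import numpy as np

def first_token_probs(logits, top_k=4):
    """Softmax over the last-position logits, return top-k (token_id, prob) pairs."""
    x = logits - logits.max()              # subtract max for numerical stability
    probs = np.exp(x) / np.exp(x).sum()
    top = np.argsort(probs)[::-1][:top_k]  # ids sorted by descending probability
    return [(int(i), float(probs[i])) for i in top]

# stand-in logits for a tiny 6-token vocabulary
logits = np.array([2.0, 1.5, 1.0, 0.2, -1.0, -2.0])
print(first_token_probs(logits, top_k=3))
```

With the real model, `logits` would be the last row of the ONNX `logits` output after prefill; a healthy model puts clearly dominant mass on a sensible first word rather than spreading it thinly over punctuation.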
So the problem is that the model generates repetitive tokens (commas, spaces, newlines) instead of coherent text. I also experimented with the source code to fix this, but haven't succeeded yet.
I've ruled out inference-side issues:
- Visual encoder works correctly (outputs valid embeddings)
- Position IDs constructed correctly for mrope (3D coordinates)
- Deepstack features extracted and applied to correct layers (5, 11, 17)
- KV cache grows properly
- Attention masks correct
- No NaN/Inf values in weights
- ONNX graph structure looks correct (no FakeLinear nodes remain)
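The NaN/Inf check boils down to something like this (a sketch; in practice I iterated over the ONNX initializers, shown here on stand-in arrays):

```python
import numpy as np

def has_bad_values(arr):
    """True if the tensor contains NaN or Inf anywhere."""
    return bool(np.isnan(arr).any() or np.isinf(arr).any())

clean = np.random.randn(4, 4).astype(np.float32)   # stand-in for a weight tensor
broken = clean.copy()
broken[0, 0] = np.nan                              # deliberately corrupted copy
print(has_bad_values(clean), has_bad_values(broken))
```

All exported weights pass this check, which is why I think the values themselves are intact and the bug is structural.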
So the bug probably manifests before generation: the first-token probabilities after prefill are already wrong.
I have some guesses about the cause, but I'm not sure, so treat this as thinking out loud:
- Weight transpose during the FakeLinear→MatMul conversion in onnx_rebuilder.py
- MRoPE export: Qwen3-VL uses 3D rotary position encoding, which may not export correctly
- Deepstack integration: auxiliary visual features injected into specific transformer layers
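On the transpose guess: `nn.Linear` computes `x @ W.T`, so when a FakeLinear is rebuilt into a raw MatMul the stored weight must be transposed for the second input; if it isn't, the outputs are garbage but still finite, which would match the symptoms. A quick numpy sanity check (stand-in shapes, not the actual rebuilder code):

```python
import numpy as np

x = np.random.randn(2, 8)    # [batch, in_features], stand-in shapes
W = np.random.randn(16, 8)   # nn.Linear weight layout: [out_features, in_features]

linear_out = x @ W.T         # what nn.Linear actually computes
B = W.T                      # the initializer a raw MatMul node needs: [in, out]

# If the rebuilder forgets the transpose, MatMul either fails on shapes
# or (for square layers) silently produces wrong-but-finite values.
assert np.allclose(x @ B, linear_out)
```

This is the kind of mismatch that would leave the graph structurally valid while corrupting every projection, so it seems worth ruling out first.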
I also modified the code to apply deepstack to layers 5, 11, 17. In `llmexport/llmexport.py` (forward method, lines 404–405 at commit c742d29) I replaced:

```python
if deepstack_embeds is not None and i in range(deepstack_embeds.shape[0]):
    hidden_states += deepstack_embeds[i]
```
with:

```python
if deepstack_embeds is not None:
    if hasattr(self, 'visual') and self.visual is not None and hasattr(self.visual, 'deepstack_visual_indexes'):
        if i in self.visual.deepstack_visual_indexes:
            idx = self.visual.deepstack_visual_indexes.index(i)
            if idx < deepstack_embeds.shape[0]:
                hidden_states += deepstack_embeds[idx]
```
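To convince myself the mapping was right, I checked it with plain indices (assuming `deepstack_visual_indexes == [5, 11, 17]`; the 28 below is just a stand-in layer count):

```python
# assumed values: deepstack layers [5, 11, 17]; 28 is a stand-in layer count
deepstack_visual_indexes = [5, 11, 17]
applied = {}
for i in range(28):                          # loop over transformer layers
    if i in deepstack_visual_indexes:
        idx = deepstack_visual_indexes.index(i)
        applied[i] = idx                     # layer i gets deepstack_embeds[idx]
print(applied)                               # {5: 0, 11: 1, 17: 2}
```

The original `i in range(deepstack_embeds.shape[0])` condition would instead have added embeddings to layers 0, 1, 2, which is why I was fairly confident this replacement was at least directionally correct.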
I also changed the deepstack shape from `[3, 1, hidden_size]` to `[3, seq_len, hidden_size]` to test it (the original dummy input, lines 826–827 of `llmexport/llmexport.py` at commit c742d29):

```python
# add deepstack_embeds input
deepstack_embeds = torch.randn(3, 1, self.hidden_size)
```
and experimented with the dynamic axes:

```python
qwen3_dynamic_axes = self.model_dynamic_axes.copy()
qwen3_dynamic_axes['deepstack_embeds'] = {1: 'seq_len'}
```
But unfortunately, these changes didn't fix the issue.
Thank you very much for your contributions!