`docs/models/supported_models.md` (20 lines changed: 0 additions & 20 deletions)
@@ -740,23 +740,6 @@ Some models are supported only via the [Transformers modeling backend](#transfor
 <sup>E</sup> Pre-computed embeddings can be inputted for this modality.
 <sup>+</sup> Multiple items can be inputted per text prompt for this modality.
 
-!!! warning
-    Both V0 and V1 support `Gemma3ForConditionalGeneration` for text-only inputs.
-    However, there are differences in how they handle text + image inputs:
-
-    V0 correctly implements the model's attention pattern:
-        - Uses bidirectional attention between the image tokens corresponding to the same image
-        - Uses causal attention for other tokens
-        - Implemented via (naive) PyTorch SDPA with masking tensors
-        - Note: May use significant memory for long prompts with image
-
-    V1 currently uses a simplified attention pattern:
-        - Uses causal attention for all tokens, including image tokens
-        - Generates reasonable outputs but does not match the original model's attention for text + image inputs, especially when `{"do_pan_and_scan": true}`
-        - Will be updated in the future to support the correct behavior
-
-    This limitation exists because the model's mixed attention pattern (bidirectional for images, causal otherwise) is not yet supported by vLLM's attention backends.
-
 !!! note
     `Gemma3nForConditionalGeneration` is only supported on V1 due to shared KV caching and it depends on `timm>=1.0.17` to make use of its
     MobileNet-v5 vision backbone.
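
For readers of this diff, the warning removed above describes V0's handling of Gemma 3 image tokens as a mixed attention mask: causal attention overall, with bidirectional attention among the tokens of the same image, applied through (naive) PyTorch SDPA with masking tensors. The sketch below only illustrates that idea; the helper name `build_mixed_mask`, the `image_ids` encoding, and the tensor shapes are assumptions made for illustration and are not vLLM's implementation.

```python
# Illustrative only: a naive mixed attention mask in the spirit of the removed
# warning (causal overall, bidirectional within each image's token span).
import torch
import torch.nn.functional as F

def build_mixed_mask(image_ids: torch.Tensor) -> torch.Tensor:
    """image_ids: (seq_len,) with -1 for text tokens, otherwise the index of
    the image a token belongs to. Returns a (seq_len, seq_len) boolean mask
    where True means "query may attend to key"."""
    seq_len = image_ids.shape[0]
    # Standard causal (lower-triangular) mask.
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Additionally allow full (bidirectional) attention between tokens that
    # belong to the same image.
    is_image = image_ids >= 0
    same_image = image_ids.unsqueeze(0) == image_ids.unsqueeze(1)
    mask |= same_image & is_image.unsqueeze(0) & is_image.unsqueeze(1)
    return mask

# Toy layout: 2 text tokens, 3 tokens of image 0, 2 trailing text tokens.
image_ids = torch.tensor([-1, -1, 0, 0, 0, -1, -1])
mask = build_mixed_mask(image_ids)

# (batch, heads, seq_len, head_dim); random values just to run SDPA.
q = k = v = torch.randn(1, 8, image_ids.shape[0], 64)
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```

Materializing the full `(seq_len, seq_len)` boolean mask is also what the removed caveat about memory use for long prompts with images refers to. `{"do_pan_and_scan": true}` (passed via `mm_processor_kwargs`) increases the number of image crops and therefore image tokens, which is presumably why the V0/V1 mismatch is called out as most visible in that case.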
@@ -776,9 +759,6 @@ Some models are supported only via the [Transformers modeling backend](#transfor
     The official `openbmb/MiniCPM-V-2` doesn't work yet, so we need to use a fork (`HwwwH/MiniCPM-V-2`) for now.
     For more details, please see: <https://github.com/vllm-project/vllm/pull/4087#issuecomment-2250397630>
 
-!!! warning
-    Our PaliGemma implementations have the same problem as Gemma 3 (see above) for both V0 and V1.
-
 !!! note
     For Qwen2.5-Omni and Qwen3-Omni, reading audio from video pre-processing (`--mm-processor-kwargs '{"use_audio_in_video": true}'`) is currently work in progress and not yet supported.
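
The context lines in the hunk above mention using the `HwwwH/MiniCPM-V-2` fork instead of the official checkpoint. A minimal offline-inference sketch is shown below; the prompt, the sampling settings, and the text-only call are placeholders (a real multimodal request would also supply image inputs), not something taken from the docs page.

```python
# Minimal sketch: point vLLM at the HwwwH/MiniCPM-V-2 fork referenced above.
from vllm import LLM, SamplingParams

llm = LLM(
    model="HwwwH/MiniCPM-V-2",  # fork used in place of openbmb/MiniCPM-V-2
    trust_remote_code=True,      # the checkpoint ships custom modeling code
)

# Text-only placeholder request; image inputs are omitted for brevity.
outputs = llm.generate(
    ["Describe MiniCPM-V-2 in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```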