
Conversation


@tomtomjhj tomtomjhj commented Nov 28, 2025

Purpose

The scheduler may fail to schedule a batch if get_image_size_with_most_features is wrong and max_num_batched_tokens is not big enough:

  • Some models' get_image_size_with_most_features returns an image size that is encoded into a sequence shorter than the actual maximum. That is, max_tokens_per_mm_item < num_mm_input_tokens is possible.
  • encoder_compute_budget is the max of the sequence length of an image with size get_image_size_with_most_features and max_num_batched_tokens (= max_num_encoder_input_tokens):

        encoder_compute_budget = max(
            scheduler_config.max_num_encoder_input_tokens, max_tokens_per_mm_item
        )
  • Therefore, if max_num_batched_tokens < num_mm_input_tokens, it is possible that max_num_encoder_input_tokens < num_mm_input_tokens, which results in scheduling failure (see the sketch after this list):

        # Not enough compute budget
        if num_tokens > encoder_compute_budget:
            return False
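
A minimal sketch of the failing arithmetic, with illustrative numbers (the names mirror the fields quoted above; 1225 and 1280 are the Qwen2-VL values discussed later in this thread):

    # Illustrative values: the profiled maximum underestimates a real request.
    max_tokens_per_mm_item = 1225       # from a wrong get_image_size_with_most_features
    num_mm_input_tokens = 1280          # tokens for an actual request's image
    max_num_encoder_input_tokens = 512  # a small max_num_batched_tokens

    encoder_compute_budget = max(
        max_num_encoder_input_tokens, max_tokens_per_mm_item
    )

    # The budget check above rejects the item on every scheduling attempt,
    # so the request never runs and vLLM appears to hang.
    assert num_mm_input_tokens > encoder_compute_budget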

This PR fixes get_image_size_with_most_features of qwen2_vl and gemma3.

  • qwen2_vl
    • Problem: The image with the most features is not square when the number of features is not a perfect square.
    • Solution: Factorize the max number of features and construct the image shape closest to square that meets the aspect ratio constraint (see the sketch after this list).
  • gemma3
    • Problem: The image is not big enough to trigger cropping.
    • Solution: Use the native image size.
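
For qwen2_vl, a minimal sketch of the factorization idea, assuming the default budget of 1280 patches (max_pixels / (28 * 28)); the helper mirrors closest_factor_pair from the diff reviewed below:

    import math

    def closest_factor_pair(n: int) -> tuple[int, int]:
        """Return (h, w) with h * w == n, h <= w, and h as large as possible."""
        for d in range(math.isqrt(n), 0, -1):
            if n % d == 0:
                return d, n // d
        return 1, n

    # A square image caps out below the budget because 1280 is not a
    # perfect square: isqrt(1280) == 35 and 35 * 35 == 1225 < 1280.
    print(math.isqrt(1280) ** 2)      # 1225
    # The closest factor pair recovers the full budget: 32 * 40 == 1280.
    print(closest_factor_pair(1280))  # (32, 40)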

Test Plan

Apply the following patch

diff --git a/examples/offline_inference/vision_language.py b/examples/offline_inference/vision_language.py
index 8f72bf6f0..755aba99d 100755
--- a/examples/offline_inference/vision_language.py
+++ b/examples/offline_inference/vision_language.py
@@ -321,6 +321,7 @@ def run_gemma3(questions: list[str], modality: str) -> ModelRequestData:
         max_num_seqs=2,
         mm_processor_kwargs={"do_pan_and_scan": True},
         limit_mm_per_prompt={modality: 1},
+        max_num_batched_tokens=512,
     )
 
     prompts = [
@@ -1527,6 +1528,7 @@ def run_qwen2_5_vl(questions: list[str], modality: str) -> ModelRequestData:
             "fps": 1,
         },
         limit_mm_per_prompt={modality: 1},
+        max_num_batched_tokens=512,
     )
 
     if modality == "image":

Then run

python examples/offline_inference/vision_language.py --model-type qwen2_5_vl

and

python examples/offline_inference/vision_language.py --model-type gemma3

Test Result

Without this fix, vLLM hangs because the scheduler repeatedly fails to schedule the multimodal batch. With this PR applied, both commands run to completion.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Problem:
The image with the most features is not square when the number of
features is not a perfect square.

Solution:
Factorize the max number of features and construct the image shape that
is closest to square while meeting the aspect ratio constraint.

Signed-off-by: Jaehwang Jung <tomtomjhj@gmail.com>
Problem:
The image is not big enough to trigger cropping.

Solution:
Use the native image size.

Signed-off-by: Jaehwang Jung <tomtomjhj@gmail.com>
@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small, essential subset of tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify mergify bot added the qwen Related to Qwen models label Nov 28, 2025
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses a scheduling failure bug caused by an incorrect implementation of get_image_size_with_most_features for the qwen2_vl and gemma3 models. The changes correctly calculate the image dimensions that produce the maximum number of features, which resolves the scheduling issue. For gemma3, the fix replaces a hardcoded image size with one derived from the model's native image size, ensuring that the pan-and-scan cropping logic is correctly triggered. For qwen2_vl, the implementation is updated to calculate the maximum number of features and then determines the optimal image dimensions by finding the factor pair of the feature count that is closest to a square, satisfying aspect ratio constraints. The changes are logical, well-implemented, and directly address the root cause of the bug. The code quality is good, and I have no further suggestions.

@DarkLight1337 DarkLight1337 left a comment

Thanks for fixing, can you add some regression tests using the image size that causes this failure?

Signed-off-by: Jaehwang Jung <tomtomjhj@gmail.com>
@mergify mergify bot added the multi-modality Related to multi-modality (#4194) label Nov 29, 2025
@tomtomjhj (Author)

Added unit tests for get_image_size_with_most_features. These tests fail without my fix.

If the seq len can't be factored into a pair that satisfies the aspect
ratio constraint, decrement the seq len and retry.

Signed-off-by: Jaehwang Jung <tomtomjhj@gmail.com>
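
A self-contained sketch of that fallback (the 200:1 bound matches the aspect-ratio check in the diff; 1279 is just a convenient prime):

    import math

    def closest_factor_pair(n: int) -> tuple[int, int]:
        for d in range(math.isqrt(n), 0, -1):
            if n % d == 0:
                return d, n // d
        return 1, n

    # 1279 is prime, so its only factor pair (1, 1279) violates the
    # width/height <= 200 constraint; one decrement reaches 1278 == 18 * 71.
    for seq_len in range(1279, 0, -1):
        height_factor, width_factor = closest_factor_pair(seq_len)
        if width_factor / height_factor <= 200:
            break
    print(seq_len, height_factor, width_factor)  # 1278 18 71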
@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 29, 2025
@ywang96 ywang96 (Member) commented Nov 29, 2025

Looks like there are some test failures

unsupported operand type(s) for //: 'NoneType' and 'int'
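
For context, that is the error Python raises when the left operand of // is None; presumably max_pixels is unset on some processor configs, which the new code would need to guard against:

    max_pixels = None  # e.g. an image processor without max_pixels configured
    unit = 28
    max_seq_len = max_pixels // (unit * unit)
    # TypeError: unsupported operand type(s) for //: 'NoneType' and 'int'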

Comment on lines -962 to +983
-        max_image_size, _ = self._get_vision_info(
-            image_width=9999999,
-            image_height=9999999,
-            num_frames=1,
-            image_processor=None,
-        )
-        return max_image_size
+        hf_config = self.get_hf_config()
+        vision_config = hf_config.vision_config
+        patch_size = vision_config.patch_size
+        merge_size = vision_config.spatial_merge_size
+        image_processor = self.get_image_processor()
+        max_pixels = image_processor.max_pixels
+        unit = patch_size * merge_size
+        max_seq_len = max_pixels // (unit * unit)
+
+        def closest_factor_pair(n: int) -> tuple[int, int]:
+            # left <= right
+            for d in range(math.isqrt(n), 0, -1):
+                if n % d == 0:
+                    return d, n // d
+            return 1, n
+
+        height_factor, width_factor = 1, max_seq_len
+        for seq_len in range(max_seq_len, 0, -1):
+            height_factor, width_factor = closest_factor_pair(seq_len)
+            if width_factor / height_factor <= 200:
+                break
@ywang96 ywang96 (Member) Nov 29, 2025

Thanks for the bugfix on Qwen2_VL but I don't think this is the right fix. IMO we should fix _get_vision_info itself instead.

I'm also curious if you can provide a repro example of when the current _get_vision_info impl is not accurate. We're already giving it a pretty big image size of 9999999 x 9999999, so I'm actually a bit surprised that it doesn't give us the max result image size.

@tomtomjhj (Author)

_get_vision_info uses smart_resize.
https://github.com/huggingface/transformers/blob/cac0a28c83cf87b7a05495de3177099c635ba852/src/transformers/models/qwen2_vl/image_processing_qwen2_vl.py#L76

smart_resize ensures that the number of pixels in the resized image is less than or equal to max_pixels. If max_pixels is 1280 * 28 * 28, the max number of patches in the resized image is 1280. Since 1280 is not a perfect square, a square image cannot reach the max number of patches regardless of its size; a huge square image is resized to 1225 (35 * 35) patches. My patch fixes the issue by factoring 1280 into 32 * 40 and constructing an image size that is 32 patches high and 40 patches wide.
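
A simplified rendition of that resizing logic (condensed from the linked smart_resize; the real function also rounds small images up to min_pixels):

    import math

    def smart_resize_simplified(
        height: int, width: int, factor: int = 28,
        max_pixels: int = 1280 * 28 * 28,
    ) -> tuple[int, int]:
        # Round each side to a multiple of `factor`, then shrink
        # proportionally if the total pixel count exceeds max_pixels.
        h = round(height / factor) * factor
        w = round(width / factor) * factor
        if h * w > max_pixels:
            beta = math.sqrt((height * width) / max_pixels)
            h = math.floor(height / beta / factor) * factor
            w = math.floor(width / beta / factor) * factor
        return h, w

    h, w = smart_resize_simplified(9999999, 9999999)
    print(h // 28, w // 28)  # 35 35 -> only 1225 of the 1280-patch budget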

@ywang96 ywang96 (Member) Nov 30, 2025

I think in this case we should be updating _get_vision_info instead since it's a "bug" of that function rather than get_image_size_with_most_features, correct?

@tomtomjhj (Author)

I think _get_vision_info is doing its job correctly. Qwen2 processes a huge square image into 35x35 patches when given the pixel limit 1280x28x28, so _get_vision_info should follow that behavior.
