From e64edfd1fb1def83c9b874534b1a8d3e2b4d5bf8 Mon Sep 17 00:00:00 2001
From: Aaditya Ura
Date: Fri, 7 Nov 2025 22:24:34 -0500
Subject: [PATCH] add multimodal model support documentation

- Add LIMIT_MM_PER_PROMPT to configuration table
- Update model compatibility section to include multimodal models
- Add image input example for vision-language models (LLaVa, PaliGemma)
---
 README.md | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 8cd7c2f..be81a61 100644
--- a/README.md
+++ b/README.md
@@ -58,6 +58,7 @@ Configure worker-vllm using environment variables:
 | `TOOL_CALL_PARSER` | Parser for tool calls | | "mistral", "hermes", "llama3_json", "granite", "deepseek_v3", etc. |
 | `OPENAI_SERVED_MODEL_NAME_OVERRIDE` | Override served model name in API | | String |
 | `MAX_CONCURRENCY` | Maximum concurrent requests | 30 | Integer |
+| `LIMIT_MM_PER_PROMPT` | Per-prompt limit on multimodal inputs (required for vision models) | | `image=2` or `image=2,video=1` |
 
 For the complete list of all available environment variables, examples, and detailed descriptions: **[Configuration](docs/configuration.md)**
 
@@ -290,14 +291,26 @@ This is the format used for GPT-4 and focused on instruction-following and chat.
 print(response.choices[0].message.content)
 ```
 
-### Getting a list of names for available models:
+### Image Inputs (Multimodal Models):
 
-In the case of baking the model into the image, sometimes the repo may not be accepted as the `model` in the request. In this case, you can list the available models as shown below and use that name.
+For vision-language models, pass images alongside text using the OpenAI chat completions format:
 
 ```python
-models_response = client.models.list()
-list_of_models = [model.id for model in models_response]
-print(list_of_models)
+# Configuration: Set LIMIT_MM_PER_PROMPT=image=2
+response = client.chat.completions.create(
+    model="llava-hf/llava-1.5-7b-hf",
+    messages=[
+        {
+            "role": "user",
+            "content": [
+                {"type": "text", "text": "What's in this image?"},
+                {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
+            ]
+        }
+    ],
+    max_tokens=300
+)
+print(response.choices[0].message.content)
 ```
 
 # Usage: Standard (Non-OpenAI)