concurrency implementation for llama.cpp #14
Merged
victorchall merged 1 commit into main on Nov 3, 2025
Conversation
This adds a simple batching and asyncio.gather loop to enable batch concurrency when using hosts such as llama.cpp that allow concurrent generation. Note: LM Studio does not support batch concurrency; users will have to switch to llama.cpp or another host that does in order to use this feature.
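Roughly, the loop looks like the sketch below, assuming an OpenAI-compatible /v1/chat/completions endpoint served by llama-server and the openai Python client; the names (BATCH_SIZE, caption_image, the base URL, and the model string) are illustrative rather than the exact identifiers used in this PR.

```python
# Minimal sketch of batched concurrent requests against an OpenAI-compatible
# endpoint (e.g. llama-server). Identifiers here are illustrative, not the
# actual names used in this PR.
import asyncio
from openai import AsyncOpenAI

BATCH_SIZE = 4  # should match llama-server's -np (number of parallel slots)

client = AsyncOpenAI(base_url="http://localhost:8080/v1", api_key="none")

async def caption_image(prompt: str) -> str:
    resp = await client.chat.completions.create(
        model="Qwen3-VL-32B-Instruct",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

async def run(prompts: list[str]) -> list[str]:
    results: list[str] = []
    # Process prompts in fixed-size batches; asyncio.gather issues each
    # batch's requests concurrently so llama.cpp can fill its slots.
    for i in range(0, len(prompts), BATCH_SIZE):
        batch = prompts[i : i + BATCH_SIZE]
        results.extend(await asyncio.gather(*(caption_image(p) for p in batch)))
    return results
```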
Seems like a fairly modest gain using Qwen3 VL 32B on RTX 6000 Blackwell, but still worth it.
Another enhancement for later would be adding support for the OpenAI batch API spec, which allows batching requests into a JSONL file, but this may or may not be supported by any hosts; it will be left for a future investigation.
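For reference, a batch input file in that style is JSONL with one request object per line. A rough sketch of building such a file is below; the helper name is made up, and whether any local host would accept this format is exactly the open question noted above.

```python
# Rough sketch of an OpenAI-style batch input file (JSONL, one request per
# line). Hypothetical helper, not part of this PR.
import json

def build_batch_jsonl(prompts: list[str], model: str, path: str) -> None:
    with open(path, "w", encoding="utf-8") as f:
        for i, prompt in enumerate(prompts):
            line = {
                "custom_id": f"request-{i}",
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": model,
                    "messages": [{"role": "user", "content": prompt}],
                },
            }
            f.write(json.dumps(line) + "\n")
```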
Additionally, I'm not certain llama.cpp and this app will always route subsequent requests to the slots that hold the matching history and KV cache; this needs more investigation.
Example llama.cpp command to enable concurrency via -np:

```
llama-server -np 4 -c 32768 --mmproj "mmproj-Qwen3-VL-32B-Instruct-F16.gguf" --model "Qwen3-VL-32B-Instruct-Q4_K_M.gguf" -dev cuda0 --top-k 30 --top-p 0.95 --min-p 0.05 --temp 0.5
```

Note the context size -c should be increased by a multiple of the value for -np to make sure each slot has sufficient context. Ex. -np 4 -c 32768 is 4 slots, each with 8192 (32768/4) tokens of context.

Some more info on llama.cpp's -np here: ggml-org/llama.cpp#3677
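As a quick sanity check on that arithmetic, here is a throwaway helper (purely illustrative, not part of the PR) for picking -c from a desired per-slot context and slot count:

```python
# Total context is split evenly across slots, so request
# -c = (per-slot context) * (number of slots).
def total_context(per_slot_ctx: int, num_slots: int) -> int:
    return per_slot_ctx * num_slots

# Example from above: 4 slots with 8192 tokens each -> -np 4 -c 32768
assert total_context(8192, 4) == 32768
```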