Draft
Changes from all commits
Commits
325 commits
e120533
[Misc] Avoid use of deprecated `AutoModelForVision2Seq` (#25065)
DarkLight1337 Sep 17, 2025
252ada5
Add RADIO Vision Encoder Support to vLLM (#24595)
danielafrimi Sep 17, 2025
9fccd04
[Bugfix] Fix Stream usage in CPU model runner and OneDNN kernel check…
bigPYJ1151 Sep 17, 2025
bfe9380
Apply fixes for CUDA 13 (#24599)
Aidyn-A Sep 17, 2025
1b962e2
[fix] lora benchmarks pass no_lora_flag_cpu (#23774)
dolpm Sep 17, 2025
dd6a910
[Bugfix][Qwen3-Next] fixes the varlen issue in qwen3-next's MTP imple…
sighingnow Sep 17, 2025
47f670b
[Docs] improve code formatting and comments for eliminate griffe buil…
samzong Sep 17, 2025
8f3616f
Remove old cutlass mla (#23961)
MatthewBonanni Sep 17, 2025
4a2d33e
[Docs] vllm/benchmarks/datasets.py fix docstring param format. (#24970)
samzong Sep 17, 2025
087c6ff
[CI Bugfix] Fix failing test_invalid_env (#25078)
mgoin Sep 17, 2025
4b946d6
[V0 Deprecation] Remove V0 Core tests (#25082)
WoosukKwon Sep 17, 2025
4aa8c7b
cleanup: remove adapter commons (#25045)
simon-mo Sep 17, 2025
d6a518f
Remove unused find_cuda_init helper script (#25044)
simon-mo Sep 17, 2025
99cc41a
[V0 Deprecation] Remove unused output processor util (#25023)
WoosukKwon Sep 17, 2025
8b32464
Change log level from info to debug for IOProcessor (#24999)
mgoin Sep 17, 2025
eb68c2d
[CI] Revert back prepare_prompts and check_answers (#25087)
WoosukKwon Sep 17, 2025
9d442b7
[V0 Deprecation] Remove V0 tests in test_sequence.py (#25088)
WoosukKwon Sep 17, 2025
e3db5eb
[CI Bugfix] Fix failing test_model_load_with_params tests due to toke…
mgoin Sep 17, 2025
7ae9887
[V1] Logits processor docs (#22919)
afeldman-nm Sep 17, 2025
ee5fd49
[Misc] Update owners for KV connector and V1 offloading (#25041)
ApostaC Sep 17, 2025
bd04165
cleanup
LucasWilkinson Sep 17, 2025
8831315
[Bugfix] Update import path for bc_linter_include (#24766)
mmangkad Sep 17, 2025
f20c3b0
[BUG] Exclude .pth files when pulling remote files (#25092)
ahao-anyscale Sep 17, 2025
3c068c6
[Kernel] Faster pre-processing time for W4A8 (#23972)
czhu-cohere Sep 17, 2025
bff2e5f
[gpt-oss][2] fix types for streaming (#24556)
qandrew Sep 17, 2025
fedb75f
[Bugfix][B200] Fix `cutlass_mla` hang (#24966)
alexm-redhat Sep 17, 2025
1a456c7
Aiter mha fp8 fix (#24991)
dllehr-amd Sep 17, 2025
9f882d8
Disable failing GPT-OSS Eval (Blackwell) for now (#25107)
mgoin Sep 17, 2025
e67a79d
[Bugfix] Refactor Flashinfer TRTLLM attention kernel selection logic …
elvischenv Sep 17, 2025
2a4d641
Add a batched auto tune script (#25076)
karan Sep 17, 2025
1f84203
add auto cg fallback
LucasWilkinson Sep 17, 2025
2ad6282
Merge remote-tracking branch 'origin/main' into lwilkinson/dbo-prefill
LucasWilkinson Sep 17, 2025
e6585dd
[Bugfix] Fix accuracy issue for silu_mul + nvfp4 quant fusion kernel …
elvischenv Sep 17, 2025
aeb8a47
fix precommit
LucasWilkinson Sep 17, 2025
5963b98
[Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMeth…
bnellnm Sep 17, 2025
2c3c1bd
[V0 Deprecation] Remove V0 Engine tests (#25114)
WoosukKwon Sep 18, 2025
2fc24e9
[V0 Deprecation] Remove V0 Tracing & Metrics tests (#25115)
WoosukKwon Sep 18, 2025
6c03661
[V0 Deprecation] Remove misc V0 tests (#25118)
WoosukKwon Sep 18, 2025
7fb2a5b
[V0 Deprecation] Skip PP test (#25128)
WoosukKwon Sep 18, 2025
4ac510f
[Kernels] Enable DeepGEMM by default (#24462)
bnellnm Sep 18, 2025
3127274
[MM Encoder] Apply DP ViT for Qwen3-VL model series (#24955)
ywang96 Sep 18, 2025
32baf1d
[Docs] Clean up the contributing README (#25099)
hmellor Sep 18, 2025
b982196
[Core][MM] Cleanup `MultiModalCache` (#25006)
lgeiger Sep 18, 2025
027d37d
[Bugfix][Qwen3-Next] add prefixes to shared_expert in qwen3-next and …
toncao Sep 18, 2025
dc2979c
[Kernels] Overlap shared experts with combine instead of dispatch (#2…
bnellnm Sep 18, 2025
52bc9d5
[Model] enable data parallel for InternVL vision encoder (#23909)
666even666 Sep 18, 2025
bec060f
Mark prompt logprobs as incompatible with prompt embeds at API level …
qthequartermasterman Sep 18, 2025
3bc1812
[XPU] Whisper model support on XPU Platform (#25123)
chaojun-zhang Sep 18, 2025
9d8a2d8
[EPLB] Add EPLB support for hunyuan_v1 (#23078)
666even666 Sep 18, 2025
5c65a72
[V0 Deprecation] Remove more V0 tests (#25117)
WoosukKwon Sep 18, 2025
b7433ca
[Spec Decode] Efficient padded speculation (#24539)
benchislett Sep 18, 2025
a904ea7
[benchmark] add peak throughput metrics and plot (#23867)
simon-mo Sep 18, 2025
e111d5b
[CLI] Use streaming in CLI chat and completion commands (#23769)
simon-mo Sep 18, 2025
81b16a2
[Kernel] Better inf handling for grouped topk cu (#24886)
lumina37 Sep 18, 2025
349e0e3
[Docs] Fix API Reference (#25140)
hmellor Sep 18, 2025
f4cd80f
Retrieve `sliding_window` from text config in Gemma3 MM (#25085)
hmellor Sep 18, 2025
350c94d
[Bugfix] when use s3 model cannot use default load_format (#24435)
lengrongfu Sep 18, 2025
ef7eefe
[Qwen] Add fp8 checkpoint support for qwen3-next. (#25079)
sighingnow Sep 18, 2025
aa3f105
Add 'path' option to ImagePrompt data_format (#25081)
gfinol Sep 18, 2025
05b044e
[Doc] Fix cross-reference warnings (#25058)
punitvara Sep 18, 2025
29283e8
[Chore] Cleanup guided namespace, move to structured outputs config (…
aarnphm Sep 18, 2025
4f02b77
Fix: Add explicit #include <omp.h> for OpenMP compatibility on certai…
ihb2032 Sep 18, 2025
abdfcd4
silu-v1: Fix EPS not being used during max-reduction (#25069)
elvircrn Sep 18, 2025
cc935fd
[Frontend] Support setting logprobs to -1 (#25031)
chaunceyjiang Sep 18, 2025
3797010
[Model] Improve Pooling Model (#25149)
jeejeelee Sep 18, 2025
8ed039d
Move `StructuredOutputsConfig` from `config/__init__.py` to `config/s…
hmellor Sep 18, 2025
eaffe44
[Docs] Fix pooling-params doc references in openai_compatible_server.…
yankay Sep 18, 2025
c9ff9e6
[Docs] add the parallel sampling usage in LLMEngine and AsyncLLM (#24…
gigit0000 Sep 18, 2025
5a33ae9
Fix forward reference warning in documentation (#25150)
hmellor Sep 18, 2025
3ed1ec4
Fix `validate-config` pre-commit check (#25157)
hmellor Sep 18, 2025
66072b3
[Bugfix][Mamba] - Fix Conv State Kernel FP32 Support (#24883)
Josephasafg Sep 18, 2025
21da733
[Misc] Clean up flags in `vllm bench serve` (#25138)
ywang96 Sep 18, 2025
470484a
[Structured Output][Refactor] Move `apply_grammar_bitmask()` method f…
shen-shanshan Sep 18, 2025
fbd6523
Refactor dense FP8 tensor/channel/block utils and add CT FP8 block (#…
mgoin Sep 18, 2025
bc19d75
[Misc] Add kv-connector label (#25156)
NickLucche Sep 18, 2025
01a583f
[Kernel] Decouple Tile Size from Block Size in Triton Unified Attenti…
jvlunteren Sep 18, 2025
072d7e5
[PERF] Add `conv1d` metadata to GDN attn (#25105)
vadiklyutiy Sep 18, 2025
67244c8
feat(api): Return 503 on /health when engine is dead (#24897)
dongbo910220 Sep 18, 2025
5f696c3
[New Model] Support BertForTokenClassification / Named Entity Recogni…
noooop Sep 18, 2025
b419937
[Docs] Fix warnings in mkdocs build (continued) (#25163)
Zerohertz Sep 18, 2025
2ea50e9
Enable Allgather/ReduceScatter backend for NaiveAllToAll (#23964)
wenscarl Sep 18, 2025
1c3b163
[Misc] Add codeowner for Transformers backend (#25180)
hmellor Sep 18, 2025
23389a1
Merge remote-tracking branch 'origin/main' into lwilkinson/dbo-prefill
LucasWilkinson Sep 18, 2025
c4cb0af
[spec decode] Fix MTP inference path for MiMo-7B model (#25136)
zixi-qi Sep 18, 2025
dc34059
[ROCm][CI/Build] Use ROCm7.0 as the base (#25178)
gshtras Sep 18, 2025
3a1c628
review comments
LucasWilkinson Sep 18, 2025
3a3383d
review comments
LucasWilkinson Sep 18, 2025
bbdc0f2
[ROCm][AITER][Bugfix] Switch AITER to use PIECEWISE_AND_FULL compilat…
Rohan138 Sep 18, 2025
505805b
[KV offload][1/N] Introduce an offloading component (#19848)
orozery Sep 18, 2025
e19bce4
[V0 Deprecation] Remove AsyncLLMEngine (#25025)
WoosukKwon Sep 18, 2025
064cac7
[fix]: remove data type hardcoding from gptoss model implementation (…
nikhil-arm Sep 18, 2025
38db529
[feat]: Create interface for model-specific M-RoPE (#24194)
AzizCode92 Sep 18, 2025
75fb112
[Bug] Fix `returned_lse` not Defined issue (#25106)
yewentao256 Sep 18, 2025
d2a30a2
[Bug] Fix torch Compilation Cache Hit Error (#25093)
yewentao256 Sep 18, 2025
47f2a3b
review comments
LucasWilkinson Sep 18, 2025
10b9487
Merge remote-tracking branch 'origin/main' into lwilkinson/dbo-prefill
LucasWilkinson Sep 18, 2025
1c3dad2
[V0 Deprecation] Remove unused async_timeout.py (#25190)
WoosukKwon Sep 18, 2025
a53ad62
[KV offload][1b/N] rename offloading to kv_offload (#25191)
orozery Sep 18, 2025
670a8ce
fix deepgemm warmup
LucasWilkinson Sep 18, 2025
9fac6aa
[BugFix] Fix DeepGEMM warmup, no m.weight_scale_inv (#25206)
LucasWilkinson Sep 18, 2025
9a4600e
[CORE] Prompt Embeddings Support for v1 Engine (#24278)
qthequartermasterman Sep 19, 2025
9d1c50a
[KV offload][2/N] Introduce LRU-based CPU offloading management (#20075)
orozery Sep 19, 2025
6d8246a
[gpt-oss] Add ResponseReasoningPartAddedEvent, ResponseReasoningPartD…
qandrew Sep 19, 2025
1a0a04d
[Perf] Optimize memory peak during EAGLE model loading. (#24585)
candyzone Sep 19, 2025
31a8a2a
[Misc] Clean up MM profiling warnings (#25222)
ywang96 Sep 19, 2025
6c8a3c0
[Docs] Fix griffe warnings in vllm/multimodal (#25216)
windsonsea Sep 19, 2025
52d324d
better HT schedule
LucasWilkinson Sep 19, 2025
a6149aa
[OOT] Support sync_model_loading for OOT (#25126)
xuechendi Sep 19, 2025
486c559
[Build] Update Xgrammar to 0.1.24 to get a CVE fix (#25188)
russellb Sep 19, 2025
8c1d4ac
[CPU] Disable oneDNN linear on non-x86 platforms (#25166)
bigPYJ1151 Sep 19, 2025
825fdb1
[Bugfix][CPU] Add placeholder to avoid import errors when using fused…
bigPYJ1151 Sep 19, 2025
f2718d2
[Misc] Cleanup test conftest for deprecated encoder-decoder models (#…
Isotr0py Sep 19, 2025
a684c01
[bugfix] fix MHA for models like OpenGVLab/InternVL3_5-38B (#25146)
yma11 Sep 19, 2025
cea91a3
[Kernel][Performance] Add Triton kernel for Qwen3-VL interleaved MRoP…
Isotr0py Sep 19, 2025
1dfea5f
[Bugfix][Perf] Misc fixes for Qwen3 VL (#25238)
ywang96 Sep 19, 2025
058525b
Move `PoolerConfig` from `config/__init__.py` to `config/pooler.py` (…
hmellor Sep 19, 2025
a3d087a
[P/D][Nixl] Introduce `KVTransferMetrics` and aggregation strategy (#…
NickLucche Sep 19, 2025
5089fd7
[V0 Deprecation] Remove V0 logic from `get_input_embeddings` interfac…
DarkLight1337 Sep 19, 2025
838d711
[Qwen] Remove cuda hard-code in qwen3 next (#25243)
wxsIcey Sep 19, 2025
9a72580
fix assert
LucasWilkinson Sep 19, 2025
cf278ff
Update CODEOWNERS (#25269)
hmellor Sep 19, 2025
aed1687
Move `ModelConfig` from `config/__init__.py` to `config/model.py` (#2…
hmellor Sep 19, 2025
ce75e15
refactor(benchmarks): add type annotations to wait_for_endpoint param…
samzong Sep 19, 2025
7ac67ea
[KV offload][3/N] Add worker-side CPU support (#21448)
orozery Sep 19, 2025
6c117cf
[Frontend] Pass API server count to each process (#23717)
DarkLight1337 Sep 19, 2025
2821986
[Core] Modify the initialization parameters of the lora manager (#25249)
jeejeelee Sep 19, 2025
d90e212
Remove Redundant Assignment in Qwen3_VisionPatchMerger (#25224)
LJH-LBJ Sep 19, 2025
12aed7e
Encoder model support for the Transformers backend (#25174)
hmellor Sep 19, 2025
47fd08a
[CI/Build] fix test function_calling (#25072)
chaunceyjiang Sep 19, 2025
2506ce5
[Core][Prefix Hash] Fix prefix hash metrics sliding window maintainan…
Jialin Sep 19, 2025
138f0d1
[Docs] add __init__.py to vllm/model_executor/layers/quantization/com…
samzong Sep 19, 2025
b716ab9
[bugfix] fix structured outputs key missing issue from #24929 (#25195)
luccafong Sep 19, 2025
c59a0ec
[KV offload][4/N] Offloading KV connector (#22595)
orozery Sep 19, 2025
a2a5f79
Optimize triton unified attention performance for sliding window atte…
zixi-qi Sep 19, 2025
7852b82
[Bugfix] GPT OSS Attritbute error on H100 (#25228)
varun-sundar-rabindranath Sep 19, 2025
4bdf400
[Bugfix] Fix chunked a2_scales in modular kernels (#25264)
bnellnm Sep 19, 2025
e57fc15
Specify platform in `pip-compile` `pre-commit` hook so it runs on Mac…
hmellor Sep 19, 2025
48ecb44
[Perf] Use FlashInfer RoPE for RotaryEmbedding.forward_cuda when avai…
mgoin Sep 19, 2025
b1a63d1
[BugFix] Make FlashInferMetadataBuilder non-blocking (#25040)
nvjullin Sep 19, 2025
ddc9048
Fix: Correct FusedMoE layer reference in auto_round quantization (#24…
David-Wen2025 Sep 19, 2025
e69e0b8
[Frontend] Responses API messages out, just harmony for now (#24985)
alecsolder Sep 19, 2025
711e912
[Compile] Fix Compile Warning for Ignoring `MIN_BLOCK_PER_SM` (#25193)
yewentao256 Sep 19, 2025
431535b
Enable modelopt gemma3 nvfp4/fp8, make workflow more robust (#22771)
Edwardf0t1 Sep 19, 2025
ee7a66d
allow disable flashinfer prefill (#25276)
luccafong Sep 19, 2025
14c1432
[BugFix] Fix async scheduling CPU tensor race take 2 (#25279)
njhill Sep 19, 2025
3da17c2
[Bugfix] Remove VLLM_TEST_DYNAMO_FULLGRAPH_CAPTURE #2969 (#25090)
Lucaskabela Sep 20, 2025
a36c675
Don't skip special tokens with hermes-style tool calling (#25281)
maxdebayser Sep 20, 2025
c7e7136
test: Remove vestigial skip for prompt embeds tests after landing v1 …
qthequartermasterman Sep 20, 2025
b8a287a
[docs] Prompt Embedding feature support (#25288)
qthequartermasterman Sep 20, 2025
8945b00
[torch.compile] CUDAGraph Inductor partition integration (#24281)
BoyuanFeng Sep 20, 2025
a25ade5
[BugFix] Ensure appropriate guards in destructors (#25284)
njhill Sep 20, 2025
535d800
[Misc] Support more collective_rpc return types (#25294)
njhill Sep 20, 2025
c308501
Improve weight loading for encoder models in Transformers backend (#2…
hmellor Sep 20, 2025
3642909
[BUGFIX] GPTQ quantization compatibility for Qwen3 Next MOE models (A…
JartX Sep 20, 2025
b7f186b
[BugFix] Exclude self when checking for port collision (#25286)
njhill Sep 20, 2025
6c5f82e
[BUG FIX][NON-CUDA]quick fix to avoid call cudagraph_unsafe in attent…
xuechendi Sep 20, 2025
f91480b
[Bugfix] fix tool call arguments is empty (#25223)
chaunceyjiang Sep 20, 2025
c60e613
[Optimization] Avoid repeated model architecture conversion for pooli…
DarkLight1337 Sep 20, 2025
9607d5e
[Hybrid Allocator] Support full attention with different hidden size …
heheda12345 Sep 20, 2025
be874c0
[Bugfix] Fix Qwen3-VL-MoE weight loading for EP (#25300)
ywang96 Sep 20, 2025
3d9a1d2
[V1] Support `LLM.apply_model` (#18465)
DarkLight1337 Sep 20, 2025
e08a3a3
[CI Failure] Disable FlashInfer RoPE to unblock CI (#25299)
mgoin Sep 20, 2025
032d661
[Docs] Fix warnings in mkdocs build (continued) (#25042)
wwl2755 Sep 20, 2025
bf8b26c
Generate _ModelInfo properties file when loading to improve loading …
manoelmarques Sep 20, 2025
3c713a9
[Model] Cleanup InternViT's data parallel implementation (#25306)
Isotr0py Sep 20, 2025
d88918e
[Core] Enable sharded state loader for V1 engine and enhance test cov…
lirong-lirong Sep 20, 2025
bef180f
[V0 Deprecation] Enable the remaining multimodal tests in V1 (#25307)
DarkLight1337 Sep 20, 2025
68ebc6b
Merge remote-tracking branch 'origin/main' into lwilkinson/dbo-prefill
LucasWilkinson Sep 20, 2025
8dd1bb0
Merge remote-tracking branch 'origin/main' into lwilkinson/dbo-prefill
LucasWilkinson Sep 20, 2025
367a480
[Docs] Fix warnings in vllm/profiler and vllm/transformers_utils (#25…
windsonsea Sep 20, 2025
52c2a8d
[V0 Deprecation] Remove LLMEngine (#25033)
WoosukKwon Sep 21, 2025
86647d1
[V0 Deprecation] Remove V0 Output Processor (#25320)
WoosukKwon Sep 21, 2025
572ddf8
[Chore] Remove unused sampler in models (#25324)
WoosukKwon Sep 21, 2025
72dd159
[CI] Skip tests failing on main (#25326)
WoosukKwon Sep 21, 2025
c99db8c
[V0 Deprecation] Remove V0 core (#25321)
WoosukKwon Sep 21, 2025
62b38dc
[Doc] improve test-pipeline.yaml documentation (#25305)
hl475 Sep 21, 2025
1cd885b
[V0 Deprecation] Remove V0 model runner base & simplify worker base (…
WoosukKwon Sep 21, 2025
035fd2b
[Multi Modal][Performance] Fused Q,K's apply_rope in more models (#25…
wwl2755 Sep 21, 2025
12dbd83
[V0 Deprecation] Remove from_seq_group methods (#25330)
WoosukKwon Sep 21, 2025
7ed82d1
[V0 Deprecation] Remove V0 MP executor (#25329)
WoosukKwon Sep 21, 2025
cf56cf7
[V1] Add sliding window support to Flex Attention backend (#24089)
Isotr0py Sep 21, 2025
4781120
remove unrelated change
LucasWilkinson Sep 21, 2025
30d0891
[MM][Perf] Minor Optimization on Qwen3-VL `fast_pos_embed_interpolate…
ywang96 Sep 21, 2025
9aea737
[Bugfix] Typos in error message for missing model config file (#25339)
simondanielsson Sep 21, 2025
65a5910
[Optimization] Cache chat template result when processor fails to be …
DarkLight1337 Sep 21, 2025
26e673f
[V0 Deprecation] Remove V0 Sequence class & Sampler (#25332)
WoosukKwon Sep 21, 2025
0ff8ebb
[V0 Deprecation] Remove async_output_proc, preemption mode, delay fac…
WoosukKwon Sep 21, 2025
c438b29
feat: Enable engine-level arguments with speculators models (#25250)
rahul-tuli Sep 21, 2025
1c3ffdb
[V0 Deprecation] Remove V0 sampling metadata (#25345)
WoosukKwon Sep 21, 2025
af7dfb0
[Perf] Further optimization for Qwen3-VL `fast_pos_embed_interpolate`…
Isotr0py Sep 21, 2025
bc6e542
Remove V0 attention backends (#25351)
WoosukKwon Sep 21, 2025
04d3752
[Bugfix][V0 Deprecation][CI] use async mock and await for async metho…
KKSK-DON Sep 21, 2025
5aeb925
Multimodal - audio tests (#25285)
debroy-rh Sep 21, 2025
7b57a43
[Model] Support Dots OCR (#24645)
ywang96 Sep 22, 2025
793be8d
[Docs] GSM8K Accuracy Evaluation doc update (#25360)
david6666666 Sep 22, 2025
0eecb31
[Bugfix] Fix hermes tool parser handling of non-string argument types…
david6666666 Sep 22, 2025
6d0b827
[V0 Deprecation] Remove V0-only methods in multi-modal registry (#25362)
DarkLight1337 Sep 22, 2025
f92d952
[V0 Deprecation] Remove `MultiModalPlaceholderMap` (#25366)
DarkLight1337 Sep 22, 2025
21467f9
Enable Eagle3 speculative decoding for GPT-OSS model (#25246)
eldarkurtic Sep 22, 2025
a66d131
[TPU][Bugfix][CI] Fix broken tests/build dependency (#25255)
NickLucche Sep 22, 2025
4cf71cc
[TPU] Deprecate `xm.mark_step` in favor of `torch_xla.sync` (#25254)
NickLucche Sep 22, 2025
b6f01bd
refactor: abstract graph mode support into platform interface (#25161)
yiz-liu Sep 22, 2025
417a164
[Misc] Remove unused encoder-decoder error strings (#25374)
DarkLight1337 Sep 22, 2025
64c824c
Make pickle import check fast (#25379)
hmellor Sep 22, 2025
3d2c56b
Make `mypy` behave like a proper pre-commit hook (#25313)
hmellor Sep 22, 2025
ac24388
[Kernel] MI-300X triton moe configs (#23445)
Sara-KS Sep 22, 2025
c10101a
[Bugfix] Fix several issues with p2p xPyD in GET type (#23993)
Csrayz Sep 22, 2025
99f6c06
Update vllm/model_executor/layers/fused_moe/deepep_ht_prepare_finaliz…
LucasWilkinson Sep 22, 2025
9416c69
review comments
LucasWilkinson Sep 22, 2025
175811e
[V1][Attention] Split triton_attn in triton-only and rocm specific ba…
bringlein Sep 22, 2025
06a4133
[EPLB] Reduce EPLB Inference Overhead (#24573)
abmfy Sep 22, 2025
cfbee3d
[CLI env var] Add VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH in en…
Daisy-Ma-coder Sep 22, 2025
1d7f95b
[Compiler] Disable Inductor standalone compile by default (#25391)
ElizaWszola Sep 22, 2025
239ef0c
[CI Failure] Fix fp8 kv cache on <SM90 (#25396)
mgoin Sep 22, 2025
922979b
[DP] support torchrun external launcher with Data Parallelism (#24899)
luccafong Sep 22, 2025
8d0ee5a
[misc] Remove RFC review hours reference (#25416)
simon-mo Sep 22, 2025
d5e0fca
[torch.compile] Cleanup compilation tests and custom passes, add debu…
ProExpertProg Sep 22, 2025
8db2939
[KV offload][5/N] Add `CPUOffloadingSpec` (#24251)
orozery Sep 22, 2025
f552d5e
[CI/Build] Skip Qwen3-VL initialization tests until models are actual…
DarkLight1337 Sep 22, 2025
8bed179
[TPU] update torch_xla dependency for PyPI compatibility (#25278)
jcyang43 Sep 22, 2025
45d7d85
[Frontend] Responses API MCP tools for built in tools and to pass thr…
alecsolder Sep 22, 2025
d588cd2
[Bugfix] fix custom op test (#25429)
ProExpertProg Sep 23, 2025
f31ff87
[Core] Drop overly aggressive whisper assertion (#25408)
russellb Sep 23, 2025
0901970
[Bugfix] Fix missing `clear_connector_metadata` (#25397)
NickLucche Sep 23, 2025
ac0048c
[BugFix] [DP/EP] Fix slow execution when BS <= DP (#25407)
MatthewBonanni Sep 23, 2025
0b7bed9
[Performance] Remove input pads in cutlass_mla and optimize v_proj ou…
alexm-redhat Sep 23, 2025
9949aa2
[Perf] Apply torch.compile for `per_block_cast_to_fp8` (#24611)
yewentao256 Sep 23, 2025
6fa78d8
[V0 deprecation] Remove platform v1 controling interface (#25410)
Isotr0py Sep 23, 2025
c625f90
[V0 deprecation] Remove `_set_default_args_v0` function (#25409)
Isotr0py Sep 23, 2025
4741239
[Bug] Fix Long Context OOM Issue (#25290)
yewentao256 Sep 23, 2025
4e9e2ce
Merge branch 'main' into lwilkinson/dbo-prefill
tlrmchlsmth Sep 23, 2025
356ddcb
fix pre-commit
LucasWilkinson Sep 23, 2025
fc97733
[feat] Support MRoPE + YaRN (#25384)
JJJYmmm Sep 23, 2025
f225ea7
[XPU] Fix `compile_size` is `None` case. (#25433)
jikunshang Sep 23, 2025
eea1783
[benchmarks]allow skip ready check for bench serve (#25420)
luccafong Sep 23, 2025
78237e4
[Bugfix] Remove contiguous output req for context parallel MLA (#25414)
mgoin Sep 23, 2025
fafbe11
[Docs] Fix griffe warnings in vllm/lora/ops (#25369)
windsonsea Sep 23, 2025
e8db44f
[DP/EP][GPTOSS] Use triton matmul-ogs kernels for GPTOSS DP/EP (#24588)
varun-sundar-rabindranath Sep 23, 2025
5774b0a
[NIXL][OOT platform] support nixl_connector with oot platform and oth…
xuechendi Sep 23, 2025
c98be0a
[Model] Enable DP for ViT in Qwen2-VL (#25445)
DarkLight1337 Sep 23, 2025
ba8d216
Handle triton kernel import exception (#25319)
minosfuture Sep 23, 2025
9383cd6
[Frontend] Add a new xml-based tool parser for qwen3-coder (#25028)
Zhikaiiii Sep 23, 2025
babad6e
[Misc] Move DP for ViT code inside model executor dir (#25459)
DarkLight1337 Sep 23, 2025
4322c55
[Test]: Hermes tool parser stream output error in Qwen3 case (#25203)
ahartel Sep 23, 2025
231c2c6
[Bugfix] Fix idefics3 `tie_word_embeddings` (#25454)
Isotr0py Sep 23, 2025
273690a
[Core] Optimize LoRA weight loading (#25403)
jeejeelee Sep 23, 2025
0d9fe26
[docs] Benchmark Serving Incorrect Arg (#25474)
vllmellm Sep 23, 2025
b6a136b
[CI/Build] Fix disabled v1 attention backend selection test (#25471)
Isotr0py Sep 23, 2025
4c631db
Merge branch 'main' into lwilkinson/dbo-prefill
tlrmchlsmth Sep 23, 2025
f0bc569
precommit fix
LucasWilkinson Sep 23, 2025
2 changes: 1 addition & 1 deletion .buildkite/nightly-benchmarks/nightly-descriptions.md
@@ -8,7 +8,7 @@ This benchmark aims to:

Latest results: [results link](https://blog.vllm.ai/2024/09/05/perf-update.html), scroll to the end.

Latest reproduction guilde: [github issue link](https://github.com/vllm-project/vllm/issues/8176)
Latest reproduction guide: [github issue link](https://github.com/vllm-project/vllm/issues/8176)

## Setup

16 changes: 4 additions & 12 deletions .buildkite/release-pipeline.yaml
@@ -1,24 +1,22 @@
steps:
# aarch64 + CUDA builds. PyTorch 2.8 aarch64 + CUDA wheel is only available on CUDA 12.9
- label: "Build arm64 wheel - CUDA 12.9"
depends_on: ~
id: build-wheel-arm64-cuda-12-9
agents:
queue: arm64_cpu_queue_postmerge
commands:
# #NOTE: torch_cuda_arch_list is derived from upstream PyTorch build files here:
# https://github.com/pytorch/pytorch/blob/main/.ci/aarch64_linux/aarch64_ci_build.sh#L7
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=12.9.1 --build-arg torch_cuda_arch_list='8.7 9.0 10.0+PTX 12.0' --tag vllm-ci:build-image --target build --progress plain -f docker/Dockerfile ."
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=12.9.1 --build-arg VLLM_MAIN_CUDA_VERSION=12.9 --build-arg torch_cuda_arch_list='8.7 9.0 10.0+PTX 12.0' --tag vllm-ci:build-image --target build --progress plain -f docker/Dockerfile ."
- "mkdir artifacts"
- "docker run --rm -v $(pwd)/artifacts:/artifacts_host vllm-ci:build-image bash -c 'cp -r dist /artifacts_host && chmod -R a+rw /artifacts_host'"
- "bash .buildkite/scripts/upload-wheels.sh"
env:
DOCKER_BUILDKIT: "1"

- block: "Build CUDA 12.8 wheel"
key: block-build-cu128-wheel

- label: "Build wheel - CUDA 12.8"
depends_on: block-build-cu128-wheel
depends_on: ~
id: build-wheel-cuda-12-8
agents:
queue: cpu_queue_postmerge
@@ -30,12 +28,8 @@ steps:
env:
DOCKER_BUILDKIT: "1"

- block: "Build CUDA 12.6 wheel"
key: block-build-cu126-wheel
depends_on: ~

- label: "Build wheel - CUDA 12.6"
depends_on: block-build-cu126-wheel
depends_on: ~
id: build-wheel-cuda-12-6
agents:
queue: cpu_queue_postmerge
@@ -102,8 +96,6 @@ steps:
depends_on:
- create-multi-arch-manifest
- build-wheel-cuda-12-8
- build-wheel-cuda-12-6
- build-wheel-cuda-12-9
id: annotate-release-workflow
agents:
queue: cpu_queue_postmerge
29 changes: 22 additions & 7 deletions .buildkite/scripts/annotate-release.sh
@@ -14,18 +14,33 @@ buildkite-agent annotate --style 'info' --context 'release-workflow' << EOF
To download the wheel:
\`\`\`
aws s3 cp s3://vllm-wheels/${RELEASE_VERSION}/vllm-${RELEASE_VERSION}-cp38-abi3-manylinux1_x86_64.whl .
aws s3 cp s3://vllm-wheels/${RELEASE_VERSION}/vllm-${RELEASE_VERSION}-cp38-abi3-manylinux2014_aarch64.whl .
aws s3 cp s3://vllm-wheels/${RELEASE_VERSION}+cu126/vllm-${RELEASE_VERSION}+cu126-cp38-abi3-manylinux1_x86_64.whl .
aws s3 cp s3://vllm-wheels/${RELEASE_VERSION}+cu118/vllm-${RELEASE_VERSION}+cu118-cp38-abi3-manylinux1_x86_64.whl .
aws s3 cp s3://vllm-wheels/${RELEASE_VERSION}+cu129/vllm-${RELEASE_VERSION}+cu129-cp38-abi3-manylinux1_x86_64.whl .
\`\`\`
To download and upload the image:
\`\`\`
docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${BUILDKITE_COMMIT}
docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${BUILDKITE_COMMIT} vllm/vllm-openai
docker tag vllm/vllm-openai vllm/vllm-openai:latest
docker tag vllm/vllm-openai vllm/vllm-openai:v${RELEASE_VERSION}
docker push vllm/vllm-openai:latest
docker push vllm/vllm-openai:v${RELEASE_VERSION}
docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${BUILDKITE_COMMIT}-x86_64
docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${BUILDKITE_COMMIT}-aarch64
docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${BUILDKITE_COMMIT}-x86_64 vllm/vllm-openai:x86_64
docker tag vllm/vllm-openai:x86_64 vllm/vllm-openai:latest-x86_64
docker tag vllm/vllm-openai:x86_64 vllm/vllm-openai:v${RELEASE_VERSION}-x86_64
docker push vllm/vllm-openai:latest-x86_64
docker push vllm/vllm-openai:v${RELEASE_VERSION}-x86_64
docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${BUILDKITE_COMMIT}-aarch64 vllm/vllm-openai:aarch64
docker tag vllm/vllm-openai:aarch64 vllm/vllm-openai:latest-aarch64
docker tag vllm/vllm-openai:aarch64 vllm/vllm-openai:v${RELEASE_VERSION}-aarch64
docker push vllm/vllm-openai:latest-aarch64
docker push vllm/vllm-openai:v${RELEASE_VERSION}-aarch64
docker manifest create vllm/vllm-openai:latest vllm/vllm-openai:latest-x86_64 vllm/vllm-openai:latest-aarch64 --amend
docker manifest create vllm/vllm-openai:v${RELEASE_VERSION} vllm/vllm-openai:v${RELEASE_VERSION}-x86_64 vllm/vllm-openai:v${RELEASE_VERSION}-aarch64 --amend
docker manifest push vllm/vllm-openai:latest
docker manifest push vllm/vllm-openai:v${RELEASE_VERSION}
\`\`\`
EOF
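The updated annotation above switches from a single-arch image to per-arch images (`-x86_64`, `-aarch64`) that are pushed individually and then combined into one multi-arch manifest per tag. A minimal sketch of that flow, which only prints the `docker manifest` commands rather than invoking docker, and uses a placeholder version string:

```shell
# Sketch only: echoes the manifest commands from the updated release
# instructions; "0.6.0" is a hypothetical stand-in for RELEASE_VERSION.
version="0.6.0"
for tag in "latest" "v${version}"; do
  # Combine the two per-arch tags into a single multi-arch manifest...
  echo "docker manifest create vllm/vllm-openai:${tag} vllm/vllm-openai:${tag}-x86_64 vllm/vllm-openai:${tag}-aarch64 --amend"
  # ...then push the manifest for that tag.
  echo "docker manifest push vllm/vllm-openai:${tag}"
done
```

Each per-arch image must already be pushed before `docker manifest create` is run, which is why the tag/push steps precede the manifest steps in the script.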
10 changes: 0 additions & 10 deletions .buildkite/scripts/hardware_ci/run-amd-test.sh
@@ -86,10 +86,6 @@ if [[ $commands == *"pytest -v -s models/test_registry.py"* ]]; then
commands=${commands//"pytest -v -s models/test_registry.py"/"pytest -v -s models/test_registry.py -k 'not BambaForCausalLM and not GritLM and not Mamba2ForCausalLM and not Zamba2ForCausalLM'"}
fi

if [[ $commands == *"VLLM_USE_V1=0 pytest -v -s models/test_initialization.py -k 'not llama4 and not plamo2'"* ]]; then
commands=${commands//"VLLM_USE_V1=0 pytest -v -s models/test_initialization.py -k 'not llama4 and not plamo2'"/"VLLM_USE_V1=0 pytest -v -s models/test_initialization.py -k 'not llama4 and not plamo2 and not BambaForCausalLM and not Gemma2ForCausalLM and not Grok1ModelForCausalLM and not Zamba2ForCausalLM and not Gemma2Model and not GritLM'"}
fi

if [[ $commands == *"pytest -v -s compile/test_basic_correctness.py"* ]]; then
commands=${commands//"pytest -v -s compile/test_basic_correctness.py"/"VLLM_USE_TRITON_FLASH_ATTN=0 pytest -v -s compile/test_basic_correctness.py"}
fi
@@ -167,12 +163,6 @@ if [[ $commands == *" entrypoints/llm "* ]]; then
--ignore=entrypoints/llm/test_prompt_validation.py "}
fi

#Obsolete currently
##ignore certain Entrypoints/llm tests
#if [[ $commands == *" && pytest -v -s entrypoints/llm/test_guided_generate.py"* ]]; then
# commands=${commands//" && pytest -v -s entrypoints/llm/test_guided_generate.py"/" "}
#fi

# --ignore=entrypoints/openai/test_encoder_decoder.py \
# --ignore=entrypoints/openai/test_embedding.py \
# --ignore=entrypoints/openai/test_oot_registration.py
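The rewriting throughout this script relies on Bash's global substring replacement, `${var//pattern/replacement}`: each guarded block swaps an exact pytest invocation for a modified one inside the `$commands` string. A minimal sketch of that mechanism (the command strings here are illustrative, mirroring one of the substitutions above):

```shell
# ${var//pattern/replacement} replaces every occurrence of pattern in var.
commands="pytest -v -s compile/test_basic_correctness.py && pytest -v -s other"
commands=${commands//"pytest -v -s compile/test_basic_correctness.py"/"VLLM_USE_TRITON_FLASH_ATTN=0 pytest -v -s compile/test_basic_correctness.py"}
# Only the matching invocation is rewritten; the rest of the string is untouched.
echo "$commands"
```

Note this is a Bash extension, not POSIX `sh`, which is consistent with the script's other bashisms.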
1 change: 0 additions & 1 deletion .buildkite/scripts/hardware_ci/run-cpu-test.sh
@@ -66,7 +66,6 @@ function cpu_tests() {

pytest -x -v -s tests/models/language/pooling -m cpu_model
pytest -x -v -s tests/models/multimodal/generation \
--ignore=tests/models/multimodal/generation/test_mllama.py \
--ignore=tests/models/multimodal/generation/test_pixtral.py \
-m cpu_model"

2 changes: 1 addition & 1 deletion .buildkite/scripts/hardware_ci/run-tpu-v1-test-part2.sh
@@ -62,7 +62,7 @@ echo "--- Installing Python dependencies ---"
python3 -m pip install --progress-bar off git+https://github.com/thuml/depyf.git \
&& python3 -m pip install --progress-bar off pytest pytest-asyncio tpu-info \
&& python3 -m pip install --progress-bar off "lm-eval @ git+https://github.com/EleutherAI/lm-evaluation-harness.git@206b7722158f58c35b7ffcd53b035fdbdda5126d" \
&& python3 -m pip install --progress-bar off hf-transfer
&& python3 -m pip install --progress-bar off hf-transfer tblib==3.1.0
echo "--- Python dependencies installed ---"
export VLLM_USE_V1=1
export VLLM_XLA_CHECK_RECOMPILATION=1
2 changes: 1 addition & 1 deletion .buildkite/scripts/hardware_ci/run-tpu-v1-test.sh
@@ -62,7 +62,7 @@ echo "--- Installing Python dependencies ---"
python3 -m pip install --progress-bar off git+https://github.com/thuml/depyf.git \
&& python3 -m pip install --progress-bar off pytest pytest-asyncio tpu-info \
&& python3 -m pip install --progress-bar off "lm-eval @ git+https://github.com/EleutherAI/lm-evaluation-harness.git@206b7722158f58c35b7ffcd53b035fdbdda5126d" \
&& python3 -m pip install --progress-bar off hf-transfer
&& python3 -m pip install --progress-bar off hf-transfer tblib==3.1.0
echo "--- Python dependencies installed ---"
export VLLM_USE_V1=1
export VLLM_XLA_CHECK_RECOMPILATION=1