Description
- [AsyncScheduling] Make async overlap work with logprobs #27615
- [BugFix] Handle unscheduled requests properly when async scheduling #27756
- [Core] Async scheduling + structured outputs compatibility #26866
- [BugFix] Fix mixed penalties batch with async scheduling #27910
- [AsyncScheduling] Don't schedule past request max_tokens #27922
- [KV offload] Offloading connector async scheduling support #27648
- [PerfFix] Avoid separate thread for MP executor shm spin #28012 - perf fix for regression in #26866
- [BugFix] Fix multi-modal async scheduling race condition #28706
- [BugFix] Fix async scheduling + chunked prefill + preemption #28787
- [Core] Async Scheduling X Spec Decoding Compatibility #24799
- [BugFix] Fix duplicate req id tool-call race condition #29355
- [BugFix] Use unique ids for different transcription prompts #29372
- Address general duplicate request ids issue - under discussion in [Core] Add a random suffix to frontend-provided request IDs #27987
- Follow-on async scheduling + spec decode hardening/compatibility issues (in parallel)
- Explore Async Scheduling + Pipeline Parallel