Releases · intel/llm-scaler
llm-scaler-omni beta release 0.1.0-b2
Highlights
Resources
- Docker Image: intel/llm-scaler-omni:0.1.0-b2
What’s new
- omni:
- Fix issues
- Fix ComfyUI interpolate issue.
- Fix Xinference XPU index selection issue.
- Support more workflows
- ComfyUI
- Wan2.2-Animate-14B basic workflow
- Qwen-Image-Edit 2509 workflow
- VoxCPM workflow
- Xinference
- Kokoro-82M-v1.1-zh
llm-scaler-vllm beta release 1.1-preview
Highlights
Resources
- Docker Image: intel/llm-scaler-vllm:1.1-preview
(functionally equivalent to intel/llm-scaler-vllm:0.10.0-b2)
What’s new
- vLLM:
- Bug fix for sym_int4 online quantization on Multi-modal models
llm-scaler-omni beta release 0.1.0-b1
Highlights
Resources
- Docker Image: intel/llm-scaler-omni:0.1.0-b1
What’s new
- omni:
- Integrated ComfyUI on XPU and provided sample workflows for:
- Wan2.2 TI2V 5B
- Wan2.2 T2V 14B (multi-XPU supported)
- FLUX.1 dev
- FLUX.1 Kontext dev
- Stable Diffusion 3.5 large
- Qwen Image, Qwen Image Edit, etc.
- Added support for xDiT, Yunchang, and Raylight usage on XPU.
- Integrated Xinference with OpenAI-compatible APIs (a request sketch follows this list) to provide:
- TTS: Kokoro-82M
- STT: Whisper Large v3
- T2I: Stable Diffusion 3.5 Medium
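For reference, here is a minimal sketch of calling the Xinference OpenAI-compatible speech-to-text endpoint with the openai Python client. The base URL, port, and model name are assumptions and must match how the model was launched inside the omni container.

```python
# Minimal sketch: transcribe audio through Xinference's OpenAI-compatible API.
# The base URL, port, and model name below are assumptions; adjust them to
# match the Xinference deployment in the omni container.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:9997/v1",  # assumed Xinference endpoint
    api_key="not-needed",                 # Xinference does not require a real key by default
)

with open("sample.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",  # hypothetical model id; use the name registered in Xinference
        file=audio_file,
    )

print(transcript.text)
```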
llm-scaler-vllm beta release 0.10.0-b4
Highlights
Resources
- Docker Image: intel/llm-scaler-vllm:0.10.0-b4
What’s new
- vLLM:
- Resolve the crash occurring after 72 hours of operation, caused by OneCCL
llm-scaler-vllm beta release 0.10.0-b3
Highlights
Resources
- Docker Image: intel/llm-scaler-vllm:0.10.0-b3
What’s new
- vLLM:
- Support the Seed-OSS model
- Add MinerU
- Enable MiniCPM-V-4_5
- Fix internvl_3_5 and deepseek-v2-lite errors
llm-scaler-vllm beta release 0.10.0-b2
Highlights
Resources
- Docker Image: intel/llm-scaler-vllm:0.10.0-b2
What’s new
- vLLM:
- Bug fix for sym_int4 online quantization on Multi-modal models
llm-scaler-vllm beta release 0.10.0-b1
Highlights
Resources
- Docker Image: intel/llm-scaler-vllm:0.10.0-b1
What’s new
- vLLM:
- Upgrade vLLM to version 0.10.0
- Support async scheduling via the --async-scheduling option
- Move embedding/reranker model support to the V1 engine (an embedding request sketch follows this list)
- Support pipeline parallelism with the mp/ray backends
- Enable the InternVL3-8B model
- Enable the MiniCPM-V-4 model
- Enable InternVL3_5-8B
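As a reference for the embedding/reranker support on the V1 engine, here is a minimal sketch of an embedding request against the OpenAI-compatible server. The base URL and model name are assumptions that depend on how the model is served from the container.

```python
# Minimal sketch: request embeddings from the OpenAI-compatible endpoint.
# The base URL and model name are assumptions; adjust them to the embedding
# model actually served by the llm-scaler-vllm container.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.embeddings.create(
    model="BAAI/bge-m3",  # hypothetical embedding model
    input=["Intel GPUs run vLLM workloads.", "A second sentence to embed."],
)

for item in response.data:
    print(len(item.embedding))  # dimensionality of each returned vector
```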
llm-scaler-vllm beta release 0.9.0-b3
Highlights
Resources
- Docker Image: intel/llm-scaler-vllm:0.9.0-b3
What’s new
- vLLM:
- Enable the Whisper model
- Enable GLM-4.5-Air
- Optimize vLLM memory usage by updating the profile_run logic
- Enable/optimize pipeline parallelism with the Ray backend
- Enable GLM-4.1V-9B-Thinking for image input (a request sketch follows this list)
- Enable the dots.ocr model
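For the image-input enablement, here is a minimal sketch of a multimodal chat completion against the OpenAI-compatible server. The base URL, served model name, and image URL are assumptions.

```python
# Minimal sketch: send a text prompt plus an image to the OpenAI-compatible
# chat endpoint. Base URL, model name, and image URL are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

completion = client.chat.completions.create(
    model="GLM-4.1V-9B-Thinking",  # assumed served model name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/sample.png"}},
            ],
        }
    ],
)

print(completion.choices[0].message.content)
```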
llm-scaler-vllm PV release 1.0
Highlights
Resources
- Docker Image: intel/llm-scaler-vllm:1.0
- Docker Image: intel/llm-scaler-platform:1.0
What’s new
- vLLM:
- Performance optimization of TPOT for long input lengths (>4K): up to 1.8x performance at 40K sequence length on the 32B KPI model, and 4.2x at 40K sequence length on the 70B KPI model.
- Performance optimizations with ~10% output throughput improvement for 8B-32B KPI models compared to the previous drop.
- New feature: By-layer online quantization to reduce the required GPU memory
- New feature: PP (pipeline parallelism) support in vLLM (experimental); an offline-inference sketch follows this list
- New feature: torch.compile (experimental)
- New feature: speculative decoding (experimental)
- Support for embedding and rerank models
- Enhanced multi-modal model support
- Performance improvements
- Maximum length auto-detection
- Data parallelism support
- Bug fixes
- OneCCL:
- OneCCL benchmark tool enablement
- XPU Manager:
- GPU Power
- GPU Firmware update
- GPU Diagnostic
- GPU Memory Bandwidth
- BKC:
- Implemented an offline installer to ensure a consistent environment and eliminate slow download speeds from the global Ubuntu PPA repository
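As a reference for the experimental pipeline-parallelism feature, here is a minimal offline-inference sketch using vLLM's LLM entry point. The model name, parallel size, and choice of the Ray backend are assumptions; adjust them to the devices available in the container.

```python
# Minimal sketch: offline inference with pipeline parallelism (experimental).
# Model name, pipeline_parallel_size, and the Ray backend choice are assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",    # hypothetical model
    pipeline_parallel_size=2,            # split model layers across two devices
    distributed_executor_backend="ray",  # "mp" is the other backend mentioned in these notes
)

outputs = llm.generate(
    ["Explain pipeline parallelism in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```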
llm-scaler-vllm beta release 0.2.0-b2
Highlights
Resources
- Docker Image: intel/llm-scaler-vllm:0.2.0-b2
What’s new
- llm-scaler-vllm: Developed a customized downstream version of vLLM with the following key features:
- int4/fp8 online quantization (a hedged quantization sketch follows this list)
- Support for embedding and rerank models
- Enhanced multi-modal model support
- Performance improvements
- Maximum length auto-detection
- Data parallelism support
- Fixed performance degradation issue
- Fixed multi-modal OOM issue
- Fixed MiniCPM wrong output issue
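As a reference for online quantization, here is a hedged offline-inference sketch. The "fp8" value is an upstream vLLM quantization option; the exact value this fork accepts for int4 (e.g. a sym_int4 variant) is an assumption and may differ, so treat the snippet as illustrative only.

```python
# Minimal sketch: on-the-fly (online) quantization to reduce GPU memory use.
# "fp8" follows upstream vLLM; the fork's int4 option name is an assumption.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # hypothetical model
    quantization="fp8",                # quantize weights online instead of loading full precision
)

outputs = llm.generate(
    ["What does online quantization trade off?"],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```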