Releases: intel/llm-scaler

llm-scaler-omni beta release 0.1.0-b2

21 Oct 01:31
690807f

Pre-release

What’s new

  • omni:

    • Bug fixes
      • Fixed a ComfyUI interpolate issue.
      • Fixed an Xinference XPU index selection issue.
    • Support for more workflows
      • ComfyUI
        • Wan2.2-Animate-14B basic workflow
        • Qwen-Image-Edit 2509 workflow
        • VoxCPM workflow
      • Xinference
        • Kokoro-82M-v1.1-zh (see the launch sketch below)
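
The Kokoro model above can be launched through Xinference's Python client. A minimal sketch, assuming a local Xinference server on its default port 9997 and that the client's speech() helper is available for audio models; the model name follows this release note and may differ in your local registry:

```python
# Sketch: launch the Kokoro-82M-v1.1-zh TTS model through a local Xinference
# server. The URL/port and model name are assumptions; adjust to your setup.
from xinference.client import Client

client = Client("http://localhost:9997")

# model_type="audio" selects Xinference's audio (TTS/STT) model family.
model_uid = client.launch_model(
    model_name="Kokoro-82M-v1.1-zh",
    model_type="audio",
)

model = client.get_model(model_uid)
# speech() returns the synthesized audio as raw bytes.
audio = model.speech("你好，世界")
with open("hello.wav", "wb") as f:
    f.write(audio)
```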

llm-scaler-vllm beta release 1.1-preview

29 Sep 07:53
1006351

What’s new

  • vLLM:
    • Bug fix for sym_int4 online quantization on multi-modal models

llm-scaler-omni beta release 0.1.0-b1

29 Sep 05:22
1006351

Pre-release

What’s new

  • omni:

    • Integrated ComfyUI on XPU and provided sample workflows for:
      • Wan2.2 TI2V 5B
      • Wan2.2 T2V 14B (multi-XPU supported)
      • FLUX.1 dev
      • FLUX.1 Kontext dev
      • Stable Diffusion 3.5 large
      • Qwen Image, Qwen Image Edit, etc.
    • Added support for xDiT, Yunchang, and Raylight on XPU.
    • Integrated Xinference with OpenAI-compatible APIs (see the request sketch below) to provide:
      • TTS: Kokoro-82M
      • STT: Whisper Large v3
      • T2I: Stable Diffusion 3.5 Medium
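
Because Xinference exposes OpenAI-compatible APIs, the TTS and STT models above can be driven with the stock openai client. A minimal request sketch, assuming Xinference listens on localhost:9997; the model ids "kokoro" and "whisper-large-v3" are illustrative and should be checked against your deployment:

```python
# Sketch: drive Xinference's OpenAI-compatible endpoints with the openai client.
# The base_url and model ids are assumptions; adjust to your deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-needed")

# TTS: synthesize speech with the Kokoro model.
speech = client.audio.speech.create(
    model="kokoro",                # illustrative model id
    voice="default",               # voice names are model-specific
    input="Hello from llm-scaler-omni.",
)
speech.write_to_file("hello.mp3")

# STT: transcribe the generated audio with Whisper Large v3.
with open("hello.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",  # illustrative model id
        file=f,
    )
print(transcript.text)
```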

llm-scaler-vllm beta release 0.10.0-b4

23 Sep 08:00
2548330

Pre-release

What’s new

  • vLLM:

    • Resolved a crash caused by OneCCL after 72 hours of continuous operation

llm-scaler-vllm beta release 0.10.0-b3

23 Sep 07:06
927de0e

Pre-release

What’s new

  • vLLM:

    • Support for the Seed-OSS model
    • Added MinerU
    • Enabled MiniCPM-V-4_5
    • Fixed internvl_3_5 and deepseek-v2-lite errors

llm-scaler-vllm beta release 0.10.0-b2

21 Aug 15:02

Pre-release

What’s new

  • vLLM:

    • Bug fix for sym_int4 online quantization on multi-modal models

llm-scaler-vllm beta release 0.10.0-b1

05 Sep 01:50
c04b5f5

Pre-release

What’s new

  • vLLM:

    • Upgraded vLLM to version 0.10.0
    • Added async scheduling via the --async-scheduling option
    • Moved embedding/reranker model support to the V1 engine
    • Added pipeline parallelism with the mp/ray backends (see the sketch after this list)
    • Enabled the internvl3-8b model
    • Enabled the MiniCPM-v-4 model
    • Enabled InternVL3_5-8B
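
A minimal sketch of the new options through the offline LLM API, assuming the CLI flag --async-scheduling maps to an async_scheduling engine argument as in upstream vLLM 0.10; the model id is illustrative:

```python
# Sketch: vLLM 0.10 with async scheduling and pipeline parallelism.
# Assumed server equivalent:
#   vllm serve <model> --async-scheduling \
#       --pipeline-parallel-size 2 --distributed-executor-backend ray
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",    # illustrative model id
    pipeline_parallel_size=2,            # split layers across two devices
    distributed_executor_backend="ray",  # "mp" is the other supported backend
    async_scheduling=True,               # assumed engine-arg form of --async-scheduling
)

outputs = llm.generate(["What does pipeline parallelism do?"],
                       SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```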

llm-scaler-vllm beta release 0.9.0-b3

05 Sep 01:42
8249baf

Pre-release

What’s new

  • vLLM:

    • Enabled the Whisper model
    • Enabled GLM-4.5-Air
    • Optimized vLLM memory usage by updating the profile_run logic
    • Enabled and optimized pipeline parallelism with the Ray backend
    • Enabled GLM-4.1V-9B-Thinking for image input (see the request sketch after this list)
    • Enabled the dots.ocr model
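
With image input enabled for GLM-4.1V-9B-Thinking, the model can be queried through vLLM's OpenAI-compatible chat endpoint. A minimal request sketch, assuming the model is already being served on the default port; the model id and image URL are illustrative:

```python
# Sketch: send an image to GLM-4.1V-9B-Thinking through vLLM's
# OpenAI-compatible chat endpoint. Model id, port, and URL are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="THUDM/GLM-4.1V-9B-Thinking",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            # Any reachable URL or a base64 data: URI works here.
            {"type": "image_url",
             "image_url": {"url": "https://example.com/sample.jpg"}},
        ],
    }],
    max_tokens=128,
)
print(response.choices[0].message.content)
```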

llm-scaler-vllm PV release 1.0

09 Aug 03:14
84f3771

What’s new

  • vLLM:

    • Performance optimization of TPOP for long input lengths (>4K): up to 1.8x performance at 40K sequence length on the 32B KPI model, and 4.2x at 40K sequence length on the 70B KPI model.
    • Performance optimizations with ~10% output throughput improvement for 8B-32B KPI models compared to the previous drop.
    • New feature: by-layer online quantization to reduce required GPU memory
    • New feature: PP (pipeline parallelism) support in vLLM (experimental)
    • New feature: torch.compile (experimental)
    • New feature: speculative decoding (experimental)
    • Support for embedding and rerank models (see the embeddings sketch after this list)
    • Enhanced multi-modal model support
    • Performance improvements
    • Automatic maximum-length detection
    • Data parallelism support
    • Bug fixes
  • OneCCL:

    • Enabled the OneCCL benchmark tool
  • XPU Manager:

    • GPU Power
    • GPU Firmware update
    • GPU Diagnostic
    • GPU Memory Bandwidth
  • BKC:

    • Implemented an offline installer to ensure a consistent environment and eliminate slow download speeds from the global Ubuntu PPA repository
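
For the embedding and rerank support listed under vLLM above, the server's OpenAI-compatible /v1/embeddings route can be exercised directly. A minimal sketch, assuming an embedding model is being served; the model id and port are illustrative:

```python
# Sketch: query an embedding model served by llm-scaler-vllm through the
# OpenAI-compatible /v1/embeddings endpoint. Model id and port are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

result = client.embeddings.create(
    model="BAAI/bge-m3",  # illustrative embedding model
    input=["first sentence to embed", "second sentence to embed"],
)
for item in result.data:
    print(len(item.embedding))  # dimensionality of each returned vector
```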

llm-scaler-vllm beta release 0.2.0-b2

25 Jul 07:05
5a58f3f

Pre-release

What’s new

  • llm-scaler-vllm: Developed a customized downstream version of vLLM with the following key features:
    • int4/fp8 online quantization (see the sketch after this list)
    • Support for embedding and rerank models
    • Enhanced multi-modal model support
    • Performance improvements
    • Automatic maximum-length detection
    • Data parallelism support
    • Fixed a performance degradation issue
    • Fixed a multi-modal OOM issue
    • Fixed incorrect output from MiniCPM
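
For the int4/fp8 online quantization item above, these notes do not show the exact flag spelling on this fork. A minimal sketch using upstream vLLM's quantization engine argument with its documented fp8 value; the fork's int4 spelling (e.g. sym_int4) should be taken from the llm-scaler README:

```python
# Sketch: online quantization via upstream vLLM's quantization argument.
# "fp8" is the documented upstream value; the fork's int4 spelling
# (e.g. sym_int4) is not shown in these notes, so check the llm-scaler README.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # illustrative model id
    quantization="fp8",                # quantize weights on the fly to fp8
)
out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```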