
Commit 84f3771

PV release 25.32.1.2 (#27)
1 parent 79b0db6 commit 84f3771

File tree: 2 files changed, +183 −203 lines

2 files changed

+183
-203
lines changed

vllm/README.md

Lines changed: 81 additions & 82 deletions
@@ -1,16 +1,17 @@
# llm-scaler-vllm

-llm-scaler-vllm is an extended and optimized version of vLLM, specifically adapted for Intel’s Multi-BMG platform. This project enhances vLLM’s core architecture with Intel-specific performance optimizations, advanced features, and tailored support for customer use cases.
+llm-scaler-vllm is an extended and optimized version of vLLM, specifically adapted for Intel’s multi-GPU platform. This project enhances vLLM’s core architecture with Intel-specific performance optimizations, advanced features, and tailored support for customer use cases.

---

## Table of Contents

1. [Getting Started and Usage](#1-getting-started-and-usage)
1.1 [Install Native Environment](#11-install-native-environment)
-1.2 [Pulling and Running the Docker Container](#11-pulling-and-running-the-docker-container)
-1.3 [Launching the Serving Service](#12-launching-the-serving-service)
-1.4 [Benchmarking the Service](#13-benchmarking-the-service)
+1.2 [Pulling and Running the Platform Evaluation Docker Container](#12-pulling-and-running-the-platform-evaluation-docker-container)
+1.3 [Pulling and Running the vllm Docker Container](#13-pulling-and-running-the-vllm-docker-container)
+1.4 [Launching the Serving Service](#14-launching-the-serving-service)
+1.5 [Benchmarking the Service](#15-benchmarking-the-service)
2. [Advanced Features](#2-advanced-features)
2.1 [CCL Support (both P2P & USM)](#21-ccl-support-both-p2p--usm)
2.2 [INT4 and FP8 Quantized Online Serving](#22-int4-and-fp8-quantized-online-serving)
@@ -27,9 +28,18 @@ llm-scaler-vllm is an extended and optimized version of vLLM, specifically adapt

## 1. Getting Started and Usage

-### 1.1 Install Native Environment
+We provide three offerings to set up the environment and run evaluations:
+
+- Bare Metal BKC Installation Script
+  Linux kernel, GPU firmware, and Docker library setup
+
+- Platform Evaluation Docker Image
+  GEMM / GPU memory bandwidth / P2P / 1CCL benchmarks
+
+- vllm Inference Docker Image
+  LLM inference evaluation

-#### 1.1.1 Execute the Script
+### 1.1 Install Native Environment

First, install a standard Ubuntu 25.04:
- [Ubuntu 25.04 Desktop](https://releases.ubuntu.com/25.04/ubuntu-25.04-desktop-amd64.iso) (for Xeon-W)
@@ -42,7 +52,8 @@ export https_proxy=http://your-proxy.com:port
export http_proxy=http://your-proxy.com:port
export no_proxy=127.0.0.1,*.intel.com
````
-Make sure your system has internet access and also the proxy can access [Ubuntu intel-graphics PPA](https://launchpad.net/~kobuk-team/+archive/ubuntu/intel-graphics) since native_bkc_setup.sh will get packages from there.
+
+Make sure your system has internet access, since the setup updates the Linux kernel and GPU firmware and installs the base Docker environment.

Switch to the root user and run this script.

@@ -53,98 +64,86 @@ chmod +x native_bkc_setup.sh
./native_bkc_setup.sh
````

-If everything is ok, you can see below installation completion message. Depending on your network speed, the execution may require 30 mins or longer time.
+If everything is OK, you will see the installation completion message below. Depending on your network speed, the execution may take 5 minutes or longer.

```bash
-Tools and scripts are located at /root/multi-arc.
✅ [DONE] Environment setup complete. Please reboot your system to apply changes.
````

-#### 1.1.2 Check the Installation
+### 1.2 Pulling and Running the Platform Evaluation Docker Container

-Since it update the GPU firmware and initramfs, you need to reboot to make the changes taking effect.
+The platform Docker image targets GPU memory bandwidth, GEMM, P2P, and collective communication benchmarks.

-After reboot, you can use lspci and sycl-ls to check if all the drivers are installed correctly.
-
-For Intel B60 GPU, it's device ID is e211, you can grep this ID in lspci to make sure the KMD (kernel mode driver) workable.
+To pull the image:

```bash
-root@edgeaihost19:~# lspci -tv | grep -i e211
-| \-01.0-[17-1a]----00.0-[18-1a]--+-01.0-[19]----00.0 Intel Corporation Device e211
-| \-05.0-[4e-51]----00.0-[4f-51]--+-01.0-[50]----00.0 Intel Corporation Device e211
-| \-01.0-[85-88]----00.0-[86-88]--+-01.0-[87]----00.0 Intel Corporation Device e211
-| \-01.0-[bc-bf]----00.0-[bd-bf]--+-01.0-[be]----00.0 Intel Corporation Device e211
+docker pull intel/llm-scaler-platform:latest
````
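
To confirm the pull succeeded, a quick check with the standard Docker CLI (not part of the README itself) is:

```bash
# List the local copy of the platform image; a tag and image ID confirm the pull
docker images intel/llm-scaler-platform
```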

-If you also see e211 device recognized by sycl-ls, then GPGPU UMD (user mode driver) working properly.
-
-```bash
-root@edgeaihost19:~# source /opt/intel/oneapi/setvars.sh
-
-:: initializing oneAPI environment ...
--bash: BASH_VERSION = 5.2.37(1)-release
-:: advisor -- latest
-:: ccl -- latest
-:: compiler -- latest
-:: dal -- latest
-:: debugger -- latest
-:: dev-utilities -- latest
-:: dnnl -- latest
-:: dpcpp-ct -- latest
-:: dpl -- latest
-:: ipp -- latest
-:: ippcp -- latest
-:: mkl -- latest
-:: mpi -- latest
-:: pti -- latest
-:: tbb -- latest
-:: umf -- latest
-:: vtune -- latest
-:: oneAPI environment initialized ::
-
-root@edgeaihost19:~# sycl-ls
-[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Graphics [0xe211] 20.1.0 [1.6.33944+12]
-[level_zero:gpu][level_zero:1] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Graphics [0xe211] 20.1.0 [1.6.33944+12]
-[level_zero:gpu][level_zero:2] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Graphics [0xe211] 20.1.0 [1.6.33944+12]
-[level_zero:gpu][level_zero:3] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Graphics [0xe211] 20.1.0 [1.6.33944+12]
-[opencl:cpu][opencl:0] Intel(R) OpenCL, Intel(R) Xeon(R) w5-2545 OpenCL 3.0 (Build 0) [2025.20.6.0.04_224945]
-[opencl:gpu][opencl:1] Intel(R) OpenCL Graphics, Intel(R) Graphics [0xe211] OpenCL 3.0 NEO [25.22.33944]
-[opencl:gpu][opencl:2] Intel(R) OpenCL Graphics, Intel(R) Graphics [0xe211] OpenCL 3.0 NEO [25.22.33944]
-[opencl:gpu][opencl:3] Intel(R) OpenCL Graphics, Intel(R) Graphics [0xe211] OpenCL 3.0 NEO [25.22.33944]
-[opencl:gpu][opencl:4] Intel(R) OpenCL Graphics, Intel(R) Graphics [0xe211] OpenCL 3.0 NEO [25.22.33944]
+Then, run the container with the script below. IMAGE_NAME is the image you just pulled;
+HOST_DIR is the host directory you want to mount into the container.
+
+```bash
+IMAGE_NAME="intel/llm-scaler-platform:latest"
+HOST_DIR="$1"   # host directory to mount, passed as the first script argument
+
+# Verify the directory exists
+if [ ! -d "$HOST_DIR" ]; then
+  echo "Error: Directory '$HOST_DIR' does not exist."
+  exit 2
+fi
+
+# Run the container
+docker run -it \
+  --device=/dev/dri \
+  --group-add video \
+  --cap-add=SYS_ADMIN \
+  --mount type=bind,source=/dev/dri/by-path,target=/dev/dri/by-path \
+  --mount type=bind,source=/sys,target=/sys \
+  --mount type=bind,source=/dev/bus,target=/dev/bus \
+  --mount type=bind,source=/dev/char,target=/dev/char \
+  --mount type=bind,source="$(realpath "$HOST_DIR")",target=/mnt/workdir \
+  "$IMAGE_NAME" \
+  bash
````
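
If you save the added snippet above as a standalone script, a minimal invocation could look like this sketch (run_platform_container.sh is a hypothetical file name; the directory to mount is the first argument):

```bash
# Hypothetical usage of the run script: mount ./workloads at /mnt/workdir inside the container
chmod +x run_platform_container.sh
./run_platform_container.sh ./workloads
```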

-#### 1.1.3 Using Tools
-
-This script installs 2 tools for GPU development:
-
-- level-zero-tests: GPU P2P & memory bandwidth benchmark
-- xpu-smi: GPU profiling and management
-
-level-zero-tests is located /root/multi-arc/level-zero-tests. It's already built, you can directly run the binary in
-- level-zero-tests/build/perf_tests/ze_peer/ze_peer for P2P benchmark
-- level-zero-tests/build/perf_tests/ze_peak/ze_peak for memory bandwidth benchmark including H2D, D2H, D2D
+Once you enter the Docker container, go to the /opt/intel/multi-arc directory; the collaterals, tools, and scripts are placed there.
+
+```bash
+root@da47dbf0a2f4:/opt/intel/multi-arc/multi-arc-bmg-offline-installer-25.32.1.2# ls -l
+total 68
+-rw-r--r-- 1 root root 695 Aug 7 03:21 01_RELEASE_NTOES.md
+-rw-r--r-- 1 root root 8183 Aug 7 02:11 02_README.md
+-rw-r--r-- 1 root root 1164 Aug 6 01:16 03_KNOWN_ISSUES.md
+-rw-r--r-- 1 root root 2371 Aug 7 04:40 04_FAQ.md
+drwxr-xr-x 3 root root 4096 Aug 3 15:17 base
+drwxr-xr-x 2 root root 4096 Jul 29 23:31 firmware
+drwxr-xr-x 6 root root 4096 Aug 3 23:15 gfxdrv
+-rwxr-xr-x 1 root root 5993 Aug 6 15:44 installer.sh
+-rw-r--r-- 1 root root 9200 Aug 7 04:55 install_log_20250807_045158.log
+drwxr-xr-x 2 root root 4096 Aug 3 15:12 oneapi
+drwxr-xr-x 3 root root 4096 Aug 7 02:05 results
+drwxr-xr-x 6 root root 4096 Aug 3 23:42 scripts
+drwxr-xr-x 6 root root 4096 Aug 3 15:17 tools
+````

-xpu-smi already installed in system, you can directly run.
+Please read 02_README.md first to understand all of our offerings. Then you may use scripts/evaluation/platform_basic_evaluation.sh
+to perform a quick evaluation; the report is written under results/. We also provide a reference report under results/.

```bash
-xpu-smi
+root@da47dbf0a2f4:/opt/intel/multi-arc/multi-arc-bmg-offline-installer-25.32.1.2/results/20250807_100553# ls -la
+total 48
+drwxr-xr-x 2 root root 4096 Aug 7 02:10 .
+drwxr-xr-x 3 root root 4096 Aug 7 02:05 ..
+-rw-r--r-- 1 root root 1967 Aug 7 02:08 allgather_outplace_128M.csv
+-rw-r--r-- 1 root root 2034 Aug 7 02:08 allreduce_outplace_128M.csv
+-rw-r--r-- 1 root root 1960 Aug 7 02:08 alltoall_outplace_128M.csv
+-rw-r--r-- 1 root root 26541 Aug 7 02:08 test_log.txt
````
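
As a sketch of the evaluation flow described above (assuming platform_basic_evaluation.sh takes no arguments and writes a timestamped report directory under results/):

```bash
# Inside the platform container: read the docs, run the basic evaluation, then list the reports
cd /opt/intel/multi-arc/multi-arc-bmg-offline-installer-25.32.1.2
cat 02_README.md
./scripts/evaluation/platform_basic_evaluation.sh
ls -la results/
```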

-Intel also offers 2 other tools which are not publicly available for current Pre-PV release.
-- 1ccl tool for collective communication benchmark
-- gemm tool for compute capability benchmark
-
-To get these 2 tools or detailed user guide for all tools, please contact your Intel support team for help.
-
-We also provide a script to set the CPU/GPU to performance mode, you can run it before running the workload
-
-```bash
-cd /root/multi-arc
-./setup_perf.sh
-````
+You can also check 03_KNOWN_ISSUES.md and 04_FAQ.md for more details.

-### 1.2 Pulling and Running the Docker Container
+### 1.3 Pulling and Running the vllm Docker Container

First, pull the image:

@@ -177,7 +176,7 @@ docker exec -it lsv-container bash

---

-### 1.3 Launching the Serving Service
+### 1.4 Launching the Serving Service

```bash
TORCH_LLM_ALLREDUCE=1 \
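
The launch command continues beyond this hunk. Once the server is running, a generic smoke test against vLLM's OpenAI-compatible endpoint could look like this sketch (port 8000, vLLM's default, is an assumption about the launch flags):

```bash
# Hypothetical smoke test: list the models served by the OpenAI-compatible API
curl http://localhost:8000/v1/models
```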
@@ -206,7 +205,7 @@ python3 -m vllm.entrypoints.openai.api_server \

---

-### 1.4 Benchmarking the Service
+### 1.5 Benchmarking the Service

```bash
python3 /llm/vllm/benchmarks/benchmark_serving.py \
