
Commit 16123ec

llm-scaler-omni: Integrate Xinference and Re-structure Omni Docs (#81)
* refine
* rename
* fix
* refine structure
* refine
* fix
* refine
* refine
* refine docs
* update omni
* reifine
* refine
* refine
1 parent 53da0b7 commit 16123ec


45 files changed (+3018, -308 lines)

omni/README.md

Lines changed: 195 additions & 0 deletions
@@ -0,0 +1,195 @@
# llm-scaler-omni

---

## Table of Contents

1. [Getting Started with Omni Docker Image](#getting-started-with-omni-docker-image)
2. [ComfyUI](#comfyui)
3. [XInference](#xinference)
4. [Stand-alone Examples](#stand-alone-examples)

---

## Getting Started with Omni Docker Image

Build docker image:

```bash
bash build.sh
```

Run docker image:

```bash
export DOCKER_IMAGE=intel/llm-scaler-omni:0.1-b1
export CONTAINER_NAME=comfyui
export MODEL_DIR=<your_model_dir>
export COMFYUI_MODEL_DIR=<your_comfyui_model_dir>
sudo docker run -itd \
    --privileged \
    --net=host \
    --device=/dev/dri \
    -e no_proxy=localhost,127.0.0.1 \
    --name=$CONTAINER_NAME \
    -v $MODEL_DIR:/llm/models/ \
    -v $COMFYUI_MODEL_DIR:/llm/ComfyUI/models \
    --shm-size="64g" \
    --entrypoint=/bin/bash \
    $DOCKER_IMAGE

docker exec -it comfyui bash
```
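
Before moving on, it can be useful to confirm that the container actually sees the Intel GPUs. The checks below are a minimal sketch and assume the PyTorch XPU build that the image ships; adjust them if your setup differs:

```bash
# Inside the container: the render nodes passed in via --device=/dev/dri should be visible
ls /dev/dri

# Optional sanity check with the bundled PyTorch XPU build
python3 -c "import torch; print(torch.xpu.device_count())"
```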

## ComfyUI

```bash
cd /llm/ComfyUI

MODEL_PATH=<your_comfyui_models_path>
rm -rf /llm/ComfyUI/models
ln -s $MODEL_PATH /llm/ComfyUI/models
echo "Symbolic link created from $MODEL_PATH to /llm/ComfyUI/models"

export http_proxy=<your_proxy>
export https_proxy=<your_proxy>
export no_proxy=localhost,127.0.0.1

python3 main.py
```
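
By default ComfyUI binds to the loopback interface only. If you want to reach the web UI from another machine, the upstream `--listen`/`--port` options can usually be passed to `main.py` (a sketch, not specific to this image; run `python3 main.py --help` to confirm the flags in your build):

```bash
# Optional: bind to all interfaces so the UI is reachable from other machines
python3 main.py --listen 0.0.0.0 --port 8188
```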

Then you can access the web UI at `http://<your_local_ip>:8188/`. On the left side of the UI you can browse and load the bundled workflows:

![workflow image](./assets/confyui_workflow.png)
### ComfyUI workflows

Currently, the following workflows are supported on B60:
- Qwen-Image (refer to https://raw.githubusercontent.com/Comfy-Org/example_workflows/main/image/qwen/image_qwen_image_distill.json)
- Qwen-Image-Edit (refer to https://raw.githubusercontent.com/Comfy-Org/workflow_templates/refs/heads/main/templates/image_qwen_image_edit.json)
- Wan2.2-TI2V-5B (refer to https://raw.githubusercontent.com/Comfy-Org/workflow_templates/refs/heads/main/templates/video_wan2_2_5B_ti2v.json)
- Wan2.2-T2V-14B with raylight (refer to https://github.com/komikndr/raylight/blob/main/example_workflows/WanT2V_Raylight.json)
- Flux.1 Kontext Dev (Basic) workflow in ComfyUI examples (refer to https://docs.comfy.org/tutorials/flux/flux-1-kontext-dev)
- SD3.5 Simple in ComfyUI examples (refer to https://comfyanonymous.github.io/ComfyUI_examples/sd3/)

#### Qwen-Image

ComfyUI tutorial for qwen-image: https://docs.comfy.org/tutorials/image/qwen/qwen-image

Only the `Qwen-Image Native Workflow Example` part has been validated, and there are known issues when using LoRA. It is recommended to run the distilled version for better performance.

#### Qwen-Image-Edit

ComfyUI tutorial for qwen-image-edit: https://docs.comfy.org/tutorials/image/qwen/qwen-image-edit

#### Wan2.2-TI2V-5B

ComfyUI tutorial for wan2.2: https://docs.comfy.org/tutorials/video/wan/wan2_2

Due to the memory limit of a single device, only the `Wan2.2 TI2V 5B Hybrid Version Workflow Example` has been validated.

#### Wan2.2-T2V-14B with raylight

Currently, [WAN2.2-14B-Rapid-AllInOne](https://huggingface.co/Phr00t/WAN2.2-14B-Rapid-AllInOne) and [raylight](https://github.com/komikndr/raylight) are used as a faster solution with multi-XPU support. The model weights can be downloaded from [here](https://modelscope.cn/models/Phr00t/WAN2.2-14B-Rapid-AllInOne/files), and you may need to extract the UNet part and the VAE part separately with `tools/extract.py` (see the sketch below for the general idea).
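
If you prefer to inspect or reproduce the split manually, the idea is simply to group the checkpoint's tensors by key prefix and save each group to its own file. The snippet below is a hedged sketch using `safetensors`; the input filename and key prefixes are illustrative assumptions, not the actual layout handled by `tools/extract.py`, so check your checkpoint's keys first:

```bash
python3 - <<'PY'
# Hedged sketch: split an all-in-one checkpoint into separate UNet and VAE files.
# The filename and key prefixes below are assumptions for illustration only.
from safetensors.torch import load_file, save_file

ckpt = load_file("wan2.2-14b-rapid-aio.safetensors")
unet = {k: v for k, v in ckpt.items() if k.startswith("model.diffusion_model.")}
vae = {k: v for k, v in ckpt.items() if k.startswith("vae.")}
save_file(unet, "wan2.2_unet.safetensors")
save_file(vae, "wan2.2_vae.safetensors")
PY
```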

![wan_raylight](./assets/wan_raylight.png)

##### Follow the Steps to Complete the Workflow

1. Model Loading

   - Ensure the `Load Diffusion Model (Ray)` node loads the diffusion model part from WAN2.2-14B-Rapid-AllInOne.
   - Ensure the `Load VAE` node loads the VAE part from WAN2.2-14B-Rapid-AllInOne.
   - Ensure the `Load CLIP` node loads `umt5_xxl_fp8_e4m3fn_scaled.safetensors`.

2. Ray configuration

   Set `GPU` and `ulysses_degree` in the `Ray Init Actor` node to the number of GPUs you want to use.

3. Click the `Run` button or use the shortcut `Ctrl(cmd) + Enter` to run the workflow.

## XInference

```bash
export ZE_AFFINITY_MASK=0 # In a multi-XPU environment, explicitly select the GPU index to avoid issues.
xinference-local --host 0.0.0.0 --port 9997
```

Supported models:
- Stable Diffusion 3.5 Medium
- Kokoro 82M
- Whisper large v3

### WebUI Usage

#### 1. Access Xinference Web UI
![xinference_launch](./assets/xinference_launch.png)

#### 2. Select model and configure `model_path`
![xinference_model](./assets/xinference_configure.png)

#### 3. Find the running model and launch its Gradio UI
![xinference_gradio](./assets/xinference_gradio.png)

#### 4. Generate within the Gradio UI
![xinference_example](./assets/xinference_sd.png)

### OpenAI API Usage

> Visit http://127.0.0.1:9997/docs to inspect the API docs.

#### 1. Launch API service
You can select a model and launch the service via the WebUI (refer to [here](#1-access-xinference-web-ui)) or via the command line:

```bash
export ZE_AFFINITY_MASK=0 # In a multi-XPU environment, explicitly select the GPU index to avoid issues.
xinference-local --host 0.0.0.0 --port 9997

xinference launch --model-name sd3.5-medium --model-type image --model-path /llm/models/stable-diffusion-3.5-medium/
```
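
Before posting requests, you can confirm that the model is actually running. A quick way (assuming the default endpoint above) is the `xinference list` subcommand:

```bash
# List the models currently running on the local Xinference endpoint
xinference list --endpoint http://localhost:9997
```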

#### 2. Post requests in OpenAI API format

For TTS models (`Kokoro 82M` for example):
```bash
curl http://localhost:9997/v1/audio/speech -H "Content-Type: application/json" -d '{
  "model": "Kokoro-82M",
  "input": "kokoro, hello, I am kokoro."
}' --output output.wav
```

For STT models (`whisper large v3` for example):
```bash
AUDIO_FILE_PATH=<your_audio_file_path>

curl -X 'POST' \
  "http://localhost:9997/v1/audio/translations" \
  -H 'accept: application/json' \
  -F "model=whisper-large-v3" \
  -F "file=@${AUDIO_FILE_PATH}"

# Example response:
{"text":" Cacaro's hello, I am Cacaro."}
```
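
The `/v1/audio/translations` route above always translates the speech into English. For same-language speech-to-text, the OpenAI-style transcription route should work in the same way; the sketch below assumes the standard endpoint name, so confirm the exact route on the `/docs` page first:

```bash
# Hedged sketch: same-language transcription via the OpenAI-compatible route
curl -X 'POST' \
  "http://localhost:9997/v1/audio/transcriptions" \
  -H 'accept: application/json' \
  -F "model=whisper-large-v3" \
  -F "file=@${AUDIO_FILE_PATH}"
```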

For text-to-image models (`Stable Diffusion 3.5 Medium` for example):
```bash
curl http://localhost:9997/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sd3.5-medium",
    "prompt": "A Shiba Inu chasing butterflies on a sunny grassy field, cartoon style, with vibrant colors.",
    "n": 1,
    "size": "1024x1024",
    "quality": "standard",
    "response_format": "url"
  }'
```
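
If you would rather save the generated image directly instead of following a returned URL, the OpenAI image API also accepts `"response_format": "b64_json"`. A minimal sketch, assuming `jq` and `base64` are available on the client machine:

```bash
# Hedged sketch: request base64 output and decode it straight to a file
curl -s http://localhost:9997/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sd3.5-medium",
    "prompt": "A Shiba Inu chasing butterflies on a sunny grassy field, cartoon style, with vibrant colors.",
    "n": 1,
    "size": "1024x1024",
    "response_format": "b64_json"
  }' | jq -r '.data[0].b64_json' | base64 -d > output.png
```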

## Stand-alone Examples

> Note: Stand-alone examples are not included in the `intel/llm-scaler-omni` image.

Supported models:
- Hunyuan3D 2.1
- Qwen Image
- Wan 2.1 / 2.2
File renamed without changes.
File renamed without changes.
omni/assets/xinference_gradio.png

68.8 KB

omni/assets/xinference_launch.png

72.2 KB

omni/assets/xinference_sd.png

88 KB

omni/build.sh

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
set -x

export HTTP_PROXY=<your_http_proxy>
export HTTPS_PROXY=<your_https_proxy>

docker build -f ./docker/Dockerfile . -t intel/llm-scaler-omni:0.1-b1 --build-arg https_proxy=$HTTPS_PROXY --build-arg http_proxy=$HTTP_PROXY

visual-ai/ComfyUI/docker/Dockerfile renamed to omni/docker/Dockerfile

Lines changed: 26 additions & 9 deletions
@@ -8,33 +8,44 @@ ARG https_proxy
 ARG http_proxy
 ENV LD_LIBRARY_PATH="/usr/local/lib:/usr/local/lib/python3.10/dist-packages/torch/lib:$LD_LIBRARY_PATH"
 
+COPY ./patches/yunchang_for_multi_arc.patch /tmp/
+COPY ./patches/xdit_for_multi_arc.patch /tmp/
+COPY ./patches/raylight_for_multi_arc.patch /tmp/
+
 # Add Intel oneAPI repo and PPA for GPU support
 RUN wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null && \
 echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | tee /etc/apt/sources.list.d/oneAPI.list && \
 add-apt-repository -y ppa:kobuk-team/intel-graphics-testing && \
 # Install dependencies and Python 3.10
 apt-get update -y && \
 apt-get install -y software-properties-common libgl1 && \
+apt-get install -y libxrender1 libxfixes3 libx11-dev libxi6 libxxf86vm1 libxcursor1 libxrandr2 libxinerama1 libxkbcommon0 libsm6 ffmpeg && \
 add-apt-repository ppa:deadsnakes/ppa && \
 apt-get update -y && \
 apt-get install -y python3.10 python3.10-distutils python3.10-dev && \
 curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10 && \
 update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.10 1 && \
-pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/xpu && \
-pip install intel-extension-for-pytorch==2.7.10+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/ && \
-pip install oneccl_bind_pt==2.7.0+xpu --index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/ && \
-pip install bigdl-core-xe-all==2.6.0 --extra-index-url https://download.pytorch.org/whl/xpu && \
+pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/xpu && \
+pip install oneccl_bind_pt==2.8.0+xpu --index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/ && \
+pip install bigdl-core-xe-all==2.6.0 --index-url https://download.pytorch.org/whl/xpu && \
 apt remove python3-blinker -y && \
-# Install xDit related dependencies
+wget https://download.blender.org/pypi/bpy/bpy-4.0.0-cp310-cp310-manylinux_2_28_x86_64.whl && \
+pip install bpy-4.0.0-cp310-cp310-manylinux_2_28_x86_64.whl && \
+rm bpy-4.0.0-cp310-cp310-manylinux_2_28_x86_64.whl && \
+# Install xDit related dependencies
 mkdir /llm && \
 cd /llm && \
 ln -s /usr/bin/python3 /usr/bin/python && \
-git clone https://github.com/analytics-zoo/long-context-attention.git -b xpu-main && \
+git clone https://github.com/feifeibear/long-context-attention.git && \
 cd long-context-attention && \
+git checkout fc5d55e61b78b3102fd824bea1791cf406cc2a4b && \
+git apply /tmp/yunchang_for_multi_arc.patch && \
 pip install -e . && \
 cd /llm && \
-git clone https://github.com/analytics-zoo/xDiT.git -b xpu-main && \
+git clone https://github.com/xdit-project/xDiT.git && \
 cd xDiT && \
+git checkout fb8fb0e437a8745b9629020759de31d1626a4a7b && \
+git apply /tmp/xdit_for_multi_arc.patch && \
 pip install -e . && \
 # Install ComfyUI
 cd /llm && \
@@ -47,13 +58,19 @@ RUN wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRO
 cd comfyui-videohelpersuite && \
 pip install -r requirements.txt && \
 cd .. && \
-git clone https://github.com/xiangyuT/raylight.git -b xpu_main && \
+git clone https://github.com/komikndr/raylight.git && \
 cd raylight && \
+git checkout 290c934cdd498b003fbf083e74e91ffc8edb961a && \
+git apply /tmp/raylight_for_multi_arc.patch && \
 pip install -r requirements.txt && \
 cd .. && \
 git clone https://github.com/yolain/ComfyUI-Easy-Use.git comfyui-easy-use && \
 cd comfyui-easy-use && \
-pip install -r requirements.txt
+pip install -r requirements.txt && \
+# Install Xinference
+pip install "xinference[transformers]" && \
+# Clean
+rm -rf /tmp/*
 
 COPY ./workflows/* /llm/ComfyUI/user/default/workflows/

omni/patches/raylight_for_multi_arc.patch

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
diff --git a/src/raylight/distributed_worker/ray_worker.py b/src/raylight/distributed_worker/ray_worker.py
index b3fcd2a..804fd6d 100644
--- a/src/raylight/distributed_worker/ray_worker.py
+++ b/src/raylight/distributed_worker/ray_worker.py
@@ -98,6 +98,13 @@ def usp_inject_callback(
)


+try:
+ import intel_extension_for_pytorch as ipex
+except:
+ pass
+
+import oneccl_bindings_for_pytorch
+
class RayWorker:
def __init__(self, local_rank, world_size, device_id, parallel_dict):
self.model = None
@@ -109,7 +116,7 @@ class RayWorker:

self.parallel_dict = parallel_dict
self.parallel_dict["is_fsdp_wrapped"] = False
- self.device = torch.device(f"cuda:{self.device_id}")
+ self.device = torch.device(f"xpu:{self.device_id}")

if self.model is not None:
self.is_model_load = True
@@ -117,9 +124,10 @@ class RayWorker:
self.is_model_load = False

if self.parallel_dict["is_xdit"] or self.parallel_dict["is_fsdp"]:
- os.environ["CUDA_VISIBLE_DEVICES"] = str(self.device_id)
+ #os.environ["CUDA_VISIBLE_DEVICES"] = str(self.device_id)
+ torch.xpu.set_device(local_rank)
dist.init_process_group(
- "nccl",
+ "ccl",
rank=local_rank,
world_size=self.world_size,
timeout=timedelta(minutes=1)
@@ -303,8 +311,8 @@ class RayWorker:
out["samples"] = samples

# Temporary for reducing change of OOM before VAE
- if ray.get_runtime_context().get_accelerator_ids()["GPU"][0] == "0":
- self.model.detach()
+ #if ray.get_runtime_context().get_accelerator_ids()["GPU"][0] == "0":
+ # self.model.detach()
self.model.detach()
comfy.model_management.soft_empty_cache()
gc.collect()
diff --git a/src/raylight/nodes.py b/src/raylight/nodes.py
index 7a552d8..cff7cb7 100644
--- a/src/raylight/nodes.py
+++ b/src/raylight/nodes.py
@@ -50,9 +50,9 @@ class RayInitializer:

# Currenty not implementing CFG parallel, since LoRa can enable non cfg run
world_size = GPU
- max_world_size = torch.cuda.device_count()
- if world_size > max_world_size:
- raise ValueError("To many gpus")
+ #max_world_size = torch.xpu.device_count()
+ #if world_size > max_world_size:
+ # raise ValueError("To many gpus")
if world_size == 0:
raise ValueError("Num of cuda/cudalike device is 0")
if world_size < ulysses_degree * ring_degree:
@@ -101,7 +101,7 @@ class RayInitializer:
gpu_actors = []
for local_rank in range(world_size):
gpu_actors.append(
- gpu_actor.options(num_gpus=1, name=f"RayWorker:{local_rank}").remote(
+ gpu_actor.options(name=f"RayWorker:{local_rank}").remote(
local_rank=local_rank,
world_size=world_size,
device_id=0,
