# llm-scaler-omni

---

## Table of Contents

1. [Getting Started with Omni Docker Image](#getting-started-with-omni-docker-image)
2. [ComfyUI](#comfyui)
3. [XInference](#xinference)
4. [Stand-alone Examples](#stand-alone-examples)

---

## Getting Started with Omni Docker Image

Build the Docker image:

```bash
bash build.sh
```

Run the Docker image:

```bash
export DOCKER_IMAGE=intel/llm-scaler-omni:0.1-b1
export CONTAINER_NAME=comfyui
export MODEL_DIR=<your_model_dir>
export COMFYUI_MODEL_DIR=<your_comfyui_model_dir>
sudo docker run -itd \
    --privileged \
    --net=host \
    --device=/dev/dri \
    -e no_proxy=localhost,127.0.0.1 \
    --name=$CONTAINER_NAME \
    -v $MODEL_DIR:/llm/models/ \
    -v $COMFYUI_MODEL_DIR:/llm/ComfyUI/models \
    --shm-size="64g" \
    --entrypoint=/bin/bash \
    $DOCKER_IMAGE

sudo docker exec -it $CONTAINER_NAME bash
```

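Once inside the container, you can optionally confirm that the GPUs are visible. This is a minimal sanity check; the `torch.xpu` call assumes the image ships an XPU-enabled PyTorch build (adjust if your environment differs):

```bash
# List the DRM render nodes passed through from the host.
ls /dev/dri

# Check that PyTorch can see the XPU devices (assumes an XPU-enabled
# PyTorch build is installed in the container's Python environment).
python3 -c "import torch; print(torch.xpu.device_count())"
```
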
## ComfyUI

```bash
cd /llm/ComfyUI

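# Note: the `docker run` command above already mounts $COMFYUI_MODEL_DIR at
# /llm/ComfyUI/models. The rm/ln steps below are only needed if your ComfyUI
# models live somewhere else inside the container.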
MODEL_PATH=<your_comfyui_models_path>
rm -rf /llm/ComfyUI/models
ln -s $MODEL_PATH /llm/ComfyUI/models
echo "Symbolic link created from $MODEL_PATH to /llm/ComfyUI/models"

export http_proxy=<your_proxy>
export https_proxy=<your_proxy>
export no_proxy=localhost,127.0.0.1

python3 main.py
```

Then you can access the web UI at `http://<your_local_ip>:8188/`.

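As an optional sanity check, you can query ComfyUI's `/system_stats` endpoint from another shell (this assumes the default port 8188 and that the server is reachable from where you run `curl`):

```bash
curl http://<your_local_ip>:8188/system_stats
```
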
### ComfyUI workflows

Currently, the following workflows are supported on B60:
- Qwen-Image (refer to https://raw.githubusercontent.com/Comfy-Org/example_workflows/main/image/qwen/image_qwen_image_distill.json)
- Qwen-Image-Edit (refer to https://raw.githubusercontent.com/Comfy-Org/workflow_templates/refs/heads/main/templates/image_qwen_image_edit.json)
- Wan2.2-TI2V-5B (refer to https://raw.githubusercontent.com/Comfy-Org/workflow_templates/refs/heads/main/templates/video_wan2_2_5B_ti2v.json)
- Wan2.2-T2V-14B with raylight (refer to https://github.com/komikndr/raylight/blob/main/example_workflows/WanT2V_Raylight.json)
- Flux.1 Kontext Dev (Basic) workflow in ComfyUI examples (refer to https://docs.comfy.org/tutorials/flux/flux-1-kontext-dev)
- SD3.5 Simple in ComfyUI examples (refer to https://comfyanonymous.github.io/ComfyUI_examples/sd3/)

#### Qwen-Image

ComfyUI tutorial for qwen-image: https://docs.comfy.org/tutorials/image/qwen/qwen-image

Only the `Qwen-Image Native Workflow Example` part is validated, and there are known issues when using LoRA. It is recommended to run the distilled version for better performance.

#### Qwen-Image-Edit

ComfyUI tutorial for qwen-image-edit: https://docs.comfy.org/tutorials/image/qwen/qwen-image-edit

#### Wan2.2-TI2V-5B

ComfyUI tutorial for wan2.2: https://docs.comfy.org/tutorials/video/wan/wan2_2

Due to the memory limit of a single device, only the `Wan2.2 TI2V 5B Hybrid Version Workflow Example` is validated.

#### Wan2.2-T2V-14B with raylight

This workflow currently uses [WAN2.2-14B-Rapid-AllInOne](https://huggingface.co/Phr00t/WAN2.2-14B-Rapid-AllInOne) and [raylight](https://github.com/komikndr/raylight) as a faster solution with multi-XPU support. The model weights can be downloaded from [here](https://modelscope.cn/models/Phr00t/WAN2.2-14B-Rapid-AllInOne/files), and you may need to extract the UNet and VAE parts separately with `tools/extract.py` (see the sketch below for where the extracted files typically go).

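A hedged sketch of fetching the checkpoint and placing the split parts. The `huggingface-cli` availability, the output file names, and the exact ComfyUI subfolders are assumptions; adapt them to the files you actually extract:

```bash
# Download the all-in-one checkpoint (file names on the hub may differ).
huggingface-cli download Phr00t/WAN2.2-14B-Rapid-AllInOne \
  --local-dir /llm/models/WAN2.2-14B-Rapid-AllInOne

# After splitting with tools/extract.py, place the parts where the ComfyUI
# loader nodes typically look (hypothetical output names shown).
cp wan2.2-rapid-unet.safetensors /llm/ComfyUI/models/diffusion_models/
cp wan2.2-rapid-vae.safetensors  /llm/ComfyUI/models/vae/
```
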
##### Follow the Steps to Complete the Workflow

1. Model Loading

- Ensure the `Load Diffusion Model (Ray)` node loads the diffusion model part from WAN2.2-14B-Rapid-AllInOne.
- Ensure the `Load VAE` node loads the VAE part from WAN2.2-14B-Rapid-AllInOne.
- Ensure the `Load CLIP` node loads `umt5_xxl_fp8_e4m3fn_scaled.safetensors`.

2. Ray configuration

Set `GPU` and `ulysses_degree` in the `Ray Init Actor` node to the number of GPUs you want to use (see the note after these steps for selecting specific devices).

3. Click the `Run` button or use the shortcut `Ctrl(cmd) + Enter` to run the workflow.

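If you want to control which XPUs the ComfyUI/raylight process can see, one option (a hedged suggestion, not a documented raylight requirement) is to set a Level Zero affinity mask before launching ComfyUI and match the device count in the `Ray Init Actor` node:

```bash
# Expose only XPU 0 and 1 to this process, then set GPU=2 and
# ulysses_degree=2 in the Ray Init Actor node to match.
export ZE_AFFINITY_MASK=0,1
python3 main.py
```
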
## XInference

```bash
export ZE_AFFINITY_MASK=0  # In a multi-XPU environment, explicitly select the GPU index to avoid issues.
xinference-local --host 0.0.0.0 --port 9997
```
Supported models:
- Stable Diffusion 3.5 Medium
- Kokoro 82M
- whisper large v3

### WebUI Usage

#### 1. Access Xinference Web UI

#### 2. Select model and configure `model_path`

#### 3. Find running model and launch Gradio UI for this model

#### 4. Generate within Gradio UI

### OpenAI API Usage

> Visit http://127.0.0.1:9997/docs to inspect the API docs.

#### 1. Launch API service

You can select a model and launch the service via the WebUI (refer to [here](#1-access-xinference-web-ui)) or from the command line:

```bash
export ZE_AFFINITY_MASK=0  # In a multi-XPU environment, explicitly select the GPU index to avoid issues.
xinference-local --host 0.0.0.0 --port 9997

xinference launch --model-name sd3.5-medium --model-type image --model-path /llm/models/stable-diffusion-3.5-medium/
```

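To confirm the model is running before sending requests, you can list the launched models (this assumes the default local endpoint used above):

```bash
xinference list --endpoint http://127.0.0.1:9997
```
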
#### 2. Post request in OpenAI API format

For TTS models (e.g. `Kokoro 82M`):
```bash
curl http://localhost:9997/v1/audio/speech -H "Content-Type: application/json" -d '{
    "model": "Kokoro-82M",
    "input": "kokoro, hello, I am kokoro."
  }' --output output.wav
```

For STT models (e.g. `whisper large v3`):
```bash
AUDIO_FILE_PATH=<your_audio_file_path>

curl -X 'POST' \
  "http://localhost:9997/v1/audio/translations" \
  -H 'accept: application/json' \
  -F "model=whisper-large-v3" \
  -F "file=@${AUDIO_FILE_PATH}"

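# Example response: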
{"text":" Cacaro's hello, I am Cacaro."}
```

For text-to-image models (e.g. `Stable Diffusion 3.5 Medium`):
```bash
curl http://localhost:9997/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sd3.5-medium",
    "prompt": "A Shiba Inu chasing butterflies on a sunny grassy field, cartoon style, with vibrant colors.",
    "n": 1,
    "size": "1024x1024",
    "quality": "standard",
    "response_format": "url"
  }'
```
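To save the generated image straight to a file, one option (a sketch that assumes the server honors the OpenAI-style `b64_json` response format and that `jq` is installed) is to request base64 output and decode it:

```bash
# Request base64-encoded output, extract it with jq, and decode it to a PNG.
curl -s http://localhost:9997/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sd3.5-medium",
    "prompt": "A Shiba Inu chasing butterflies on a sunny grassy field, cartoon style, with vibrant colors.",
    "n": 1,
    "size": "1024x1024",
    "response_format": "b64_json"
  }' | jq -r '.data[0].b64_json' | base64 -d > output.png
```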

## Stand-alone Examples

> Note: the stand-alone examples are not included in the `intel/llm-scaler-omni` image.

Supported models:
- Hunyuan3D 2.1
- Qwen Image
- Wan 2.1 / 2.2
