# Getting Started: VAE Encode with Hybrid Inference

VAE encode is used for training, image-to-image, and image-to-video, turning images or videos into latent representations.

## Memory

These tables show the VRAM requirements for VAE encode with SD v1 and SD XL on different GPUs.

For most of these GPUs, the memory usage percentage dictates that other models (text encoders, UNet/Transformer) must be offloaded, or that tiled encoding must be used, which increases the time taken and impacts quality.

<details><summary>SD v1.5</summary>

| GPU | Resolution | Time (seconds) | Memory Consumed (%) | Tiled Time (seconds) | Tiled Memory (%) |
|:------------------------------|:-------------|-----------------:|----------------------:|-----------------------:|-------------------:|
| NVIDIA GeForce RTX 4090 | 512x512 | 0.015 | 3.51901 | 0.015 | 3.51901 |
| NVIDIA GeForce RTX 4090 | 256x256 | 0.004 | 1.3154 | 0.005 | 1.3154 |
| NVIDIA GeForce RTX 4090 | 2048x2048 | 0.402 | 47.1852 | 0.496 | 3.51901 |
| NVIDIA GeForce RTX 4090 | 1024x1024 | 0.078 | 12.2658 | 0.094 | 3.51901 |
| NVIDIA GeForce RTX 4080 SUPER | 512x512 | 0.023 | 5.30105 | 0.023 | 5.30105 |
| NVIDIA GeForce RTX 4080 SUPER | 256x256 | 0.006 | 1.98152 | 0.006 | 1.98152 |
| NVIDIA GeForce RTX 4080 SUPER | 2048x2048 | 0.574 | 71.08 | 0.656 | 5.30105 |
| NVIDIA GeForce RTX 4080 SUPER | 1024x1024 | 0.111 | 18.4772 | 0.14 | 5.30105 |
| NVIDIA GeForce RTX 3090 | 512x512 | 0.032 | 3.52782 | 0.032 | 3.52782 |
| NVIDIA GeForce RTX 3090 | 256x256 | 0.01 | 1.31869 | 0.009 | 1.31869 |
| NVIDIA GeForce RTX 3090 | 2048x2048 | 0.742 | 47.3033 | 0.954 | 3.52782 |
| NVIDIA GeForce RTX 3090 | 1024x1024 | 0.136 | 12.2965 | 0.207 | 3.52782 |
| NVIDIA GeForce RTX 3080 | 512x512 | 0.036 | 8.51761 | 0.036 | 8.51761 |
| NVIDIA GeForce RTX 3080 | 256x256 | 0.01 | 3.18387 | 0.01 | 3.18387 |
| NVIDIA GeForce RTX 3080 | 2048x2048 | 0.863 | 86.7424 | 1.191 | 8.51761 |
| NVIDIA GeForce RTX 3080 | 1024x1024 | 0.157 | 29.6888 | 0.227 | 8.51761 |
| NVIDIA GeForce RTX 3070 | 512x512 | 0.051 | 10.6941 | 0.051 | 10.6941 |
| NVIDIA GeForce RTX 3070 | 256x256 | 0.015 | 3.99743 | 0.015 | 3.99743 |
| NVIDIA GeForce RTX 3070 | 2048x2048 | 1.217 | 96.054 | 1.482 | 10.6941 |
| NVIDIA GeForce RTX 3070 | 1024x1024 | 0.223 | 37.2751 | 0.327 | 10.6941 |

</details>

<details><summary>SDXL</summary>

| GPU | Resolution | Time (seconds) | Memory Consumed (%) | Tiled Time (seconds) | Tiled Memory (%) |
|:------------------------------|:-------------|-----------------:|----------------------:|-----------------------:|-------------------:|
| NVIDIA GeForce RTX 4090 | 512x512 | 0.029 | 4.95707 | 0.029 | 4.95707 |
| NVIDIA GeForce RTX 4090 | 256x256 | 0.007 | 2.29666 | 0.007 | 2.29666 |
| NVIDIA GeForce RTX 4090 | 2048x2048 | 0.873 | 66.3452 | 0.863 | 15.5649 |
| NVIDIA GeForce RTX 4090 | 1024x1024 | 0.142 | 15.5479 | 0.143 | 15.5479 |
| NVIDIA GeForce RTX 4080 SUPER | 512x512 | 0.044 | 7.46735 | 0.044 | 7.46735 |
| NVIDIA GeForce RTX 4080 SUPER | 256x256 | 0.01 | 3.4597 | 0.01 | 3.4597 |
| NVIDIA GeForce RTX 4080 SUPER | 2048x2048 | 1.317 | 87.1615 | 1.291 | 23.447 |
| NVIDIA GeForce RTX 4080 SUPER | 1024x1024 | 0.213 | 23.4215 | 0.214 | 23.4215 |
| NVIDIA GeForce RTX 3090 | 512x512 | 0.058 | 5.65638 | 0.058 | 5.65638 |
| NVIDIA GeForce RTX 3090 | 256x256 | 0.016 | 2.45081 | 0.016 | 2.45081 |
| NVIDIA GeForce RTX 3090 | 2048x2048 | 1.755 | 77.8239 | 1.614 | 18.4193 |
| NVIDIA GeForce RTX 3090 | 1024x1024 | 0.265 | 18.4023 | 0.265 | 18.4023 |
| NVIDIA GeForce RTX 3080 | 512x512 | 0.064 | 13.6568 | 0.064 | 13.6568 |
| NVIDIA GeForce RTX 3080 | 256x256 | 0.018 | 5.91728 | 0.018 | 5.91728 |
| NVIDIA GeForce RTX 3080 | 2048x2048 | OOM | OOM | 1.866 | 44.4717 |
| NVIDIA GeForce RTX 3080 | 1024x1024 | 0.302 | 44.4308 | 0.302 | 44.4308 |
| NVIDIA GeForce RTX 3070 | 512x512 | 0.093 | 17.1465 | 0.093 | 17.1465 |
| NVIDIA GeForce RTX 3070 | 256x256 | 0.025 | 7.42931 | 0.026 | 7.42931 |
| NVIDIA GeForce RTX 3070 | 2048x2048 | OOM | OOM | 2.674 | 55.8355 |
| NVIDIA GeForce RTX 3070 | 1024x1024 | 0.443 | 55.7841 | 0.443 | 55.7841 |

</details>
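The tiled numbers above come from encoding the image in tiles instead of in a single pass, which caps peak VRAM at roughly the cost of one tile. As a rough local illustration of that trade-off, here is a minimal sketch, assuming a CUDA GPU and the SD v1 VAE listed in the next section; it is not the benchmark script behind these tables.

```python
import torch
from diffusers import AutoencoderKL

# Minimal sketch of regular vs. tiled VAE encode (assumes a CUDA GPU and the
# SD v1 VAE from the "Available VAEs" table below; not the benchmark script).
vae = AutoencoderKL.from_pretrained(
    "stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16
).to("cuda")

# Dummy 2048x2048 input in [-1, 1]; a real image would be preprocessed to this range.
pixels = torch.rand(1, 3, 2048, 2048, dtype=torch.float16, device="cuda") * 2 - 1

with torch.no_grad():
    latent = vae.encode(pixels).latent_dist.sample()  # single-pass encode, highest peak VRAM

vae.enable_tiling()  # encode tile by tile: much lower peak memory, somewhat slower at high resolution
with torch.no_grad():
    latent_tiled = vae.encode(pixels).latent_dist.sample()

print(latent.shape)  # (1, 4, 256, 256): the SD v1 VAE downsamples by 8x into 4 latent channels
```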

## Available VAEs

|   | **Endpoint** | **Model** |
|:-:|:-----------:|:--------:|
| **Stable Diffusion v1** | [https://qc6479g0aac6qwy9.us-east-1.aws.endpoints.huggingface.cloud](https://qc6479g0aac6qwy9.us-east-1.aws.endpoints.huggingface.cloud) | [`stabilityai/sd-vae-ft-mse`](https://hf.co/stabilityai/sd-vae-ft-mse) |
| **Stable Diffusion XL** | [https://xjqqhmyn62rog84g.us-east-1.aws.endpoints.huggingface.cloud](https://xjqqhmyn62rog84g.us-east-1.aws.endpoints.huggingface.cloud) | [`madebyollin/sdxl-vae-fp16-fix`](https://hf.co/madebyollin/sdxl-vae-fp16-fix) |
| **Flux** | [https://ptccx55jz97f9zgo.us-east-1.aws.endpoints.huggingface.cloud](https://ptccx55jz97f9zgo.us-east-1.aws.endpoints.huggingface.cloud) | [`black-forest-labs/FLUX.1-schnell`](https://hf.co/black-forest-labs/FLUX.1-schnell) |
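
If you use more than one of these, it can help to keep the endpoints in one place. This is only a hypothetical convenience, not part of the `diffusers` API; the URLs are copied verbatim from the table:

```python
# Hypothetical convenience mapping; URLs copied from the table above.
VAE_ENCODE_ENDPOINTS = {
    "stable-diffusion-v1": "https://qc6479g0aac6qwy9.us-east-1.aws.endpoints.huggingface.cloud",
    "stable-diffusion-xl": "https://xjqqhmyn62rog84g.us-east-1.aws.endpoints.huggingface.cloud",
    "flux": "https://ptccx55jz97f9zgo.us-east-1.aws.endpoints.huggingface.cloud",
}
```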

> [!TIP]
> Model support can be requested [here](https://github.com/huggingface/diffusers/issues/new?template=remote-vae-pilot-feedback.yml).

## Code

> [!TIP]
> Install `diffusers` from `main` to run the code: `pip install git+https://github.com/huggingface/diffusers@main`

A helper method simplifies interacting with Hybrid Inference.

```python
from diffusers.utils.remote_utils import remote_encode
```

### Basic example

Let's encode an image, then decode it to demonstrate.

<figure class="image flex flex-col items-center justify-center text-center m-0 w-full">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg"/>
</figure>

<details><summary>Code</summary>

```python
from diffusers.utils import load_image
from diffusers.utils.remote_utils import remote_decode, remote_encode

image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg?download=true")

latent = remote_encode(
    endpoint="https://ptccx55jz97f9zgo.us-east-1.aws.endpoints.huggingface.cloud/",
    image=image,
    scaling_factor=0.3611,
    shift_factor=0.1159,
)

decoded = remote_decode(
    endpoint="https://whhx50ex1aryqvw6.us-east-1.aws.endpoints.huggingface.cloud/",
    tensor=latent,
    scaling_factor=0.3611,
    shift_factor=0.1159,
)
```

</details>

<figure class="image flex flex-col items-center justify-center text-center m-0 w-full">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/remote_vae/decoded.png"/>
</figure>
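
The object returned by `remote_decode` here is a plain PIL image (the generation example below saves its output the same way), so keeping the round-tripped result for comparison is just:

```python
# Save the round-tripped image for side-by-side comparison with the input.
decoded.save("decoded.png")
```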

### Generation

Now let's look at a generation example: we encode an image, generate, then remote decode. The pipeline is loaded with `vae=None` and called with `output_type="latent"`, so no VAE is loaded locally and the latents go straight to the remote endpoint.

<details><summary>Code</summary>

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image
from diffusers.utils.remote_utils import remote_decode, remote_encode

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    variant="fp16",
    vae=None,
).to("cuda")

init_image = load_image(
    "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
)
init_image = init_image.resize((768, 512))

init_latent = remote_encode(
    endpoint="https://qc6479g0aac6qwy9.us-east-1.aws.endpoints.huggingface.cloud/",
    image=init_image,
    scaling_factor=0.18215,
)

prompt = "A fantasy landscape, trending on artstation"
latent = pipe(
    prompt=prompt,
    image=init_latent,
    strength=0.75,
    output_type="latent",
).images

image = remote_decode(
    endpoint="https://q1bj3bpq6kzilnsu.us-east-1.aws.endpoints.huggingface.cloud/",
    tensor=latent,
    scaling_factor=0.18215,
)
image.save("fantasy_landscape.jpg")
```

</details>

<figure class="image flex flex-col items-center justify-center text-center m-0 w-full">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/remote_vae/fantasy_landscape.png"/>
</figure>
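
The same pattern works with the SDXL encode endpoint from the table above; only the endpoint and the VAE's scaling factor change. Here is a sketch of the encode step only, assuming the usual SDXL VAE scaling factor (verify against the VAE's config if unsure):

```python
# Hypothetical SDXL variant of the encode step above.
# 0.13025 is the usual SDXL VAE scaling factor (assumption - check vae.config.scaling_factor).
init_latent = remote_encode(
    endpoint="https://xjqqhmyn62rog84g.us-east-1.aws.endpoints.huggingface.cloud/",
    image=init_image,
    scaling_factor=0.13025,
)
```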

## Integrations

* **[SD.Next](https://github.com/vladmandic/sdnext):** an all-in-one UI with built-in support for Hybrid Inference.
* **[ComfyUI-HFRemoteVae](https://github.com/kijai/ComfyUI-HFRemoteVae):** a ComfyUI node for Hybrid Inference.