diff --git a/README-chinese.md b/README-chinese.md
new file mode 100644
index 0000000..44a8e73
--- /dev/null
+++ b/README-chinese.md
@@ -0,0 +1,60 @@
+## Many thanks to the DeepFloyd team for open-sourcing the IF model. IF-easy-webui is a simple, easy-to-use web UI built on the IF model.
+## It mainly fixes issues with IF such as dependency installation errors, model files not being found after download, and inpainting type-conversion errors,
+## making it easy for everyone to quickly try out the IF model.
+
+# User Guide
+## 1. Download the source code
+```bash
+root@xxxx:~# git clone https://github.com/amazed6666/IF-easy-webui.git
+```
+
+## 2. Enter the IF-easy-webui directory and install the dependencies: pip install -r requirements.txt
+```bash
+root@xxxx:~# cd IF-easy-webui
+root@xxxx:~/IF-easy-webui# pip install -r requirements.txt
+```
+
+## 3. Install CLIP
+```bash
+root@xxxx:~/IF-easy-webui# pip install git+https://github.com/openai/CLIP.git --no-deps
+```
+
+## 4. Log in to huggingface_hub with a Hugging Face Access Token
+```bash
+root@xxxx:~/IF-easy-webui# git config --global credential.helper store
+root@xxxx:~/IF-easy-webui# python
+Python 3.8.10 (default, Jun 4 2021, 15:09:15)
+[GCC 7.5.0] :: Anaconda, Inc. on linux
+Type "help", "copyright", "credits" or "license" for more information.
+>>> from huggingface_hub import login
+>>>
+>>> login()
+
+ _| _| _| _| _|_|_| _|_|_| _|_|_| _| _| _|_|_| _|_|_|_| _|_| _|_|_| _|_|_|_|
+ _| _| _| _| _| _| _| _|_| _| _| _| _| _| _| _|
+ _|_|_|_| _| _| _| _|_| _| _|_| _| _| _| _| _| _|_| _|_|_| _|_|_|_| _| _|_|_|
+ _| _| _| _| _| _| _| _| _| _| _|_| _| _| _| _| _| _| _|
+ _| _| _|_| _|_|_| _|_|_| _|_|_| _| _| _|_|_| _| _| _| _|_|_| _|_|_|_|
+
+Enter your token (input will not be visible):
+Add token as git credential? (Y/n) Y
+>>> exit()
+```
+
+## 5. After running webui.py, wait a while for the model files to load. Once loading completes, the message below is displayed; then open a browser at http://127.0.0.1:6006 to reach the IF-easy-webui interface and start using the IF model
+```bash
+root@xxxx:~/IF-easy-webui# python webui.py
+FORCE_MEM_EFFICIENT_ATTN= 0 @UNET:QKVATTENTION
+/root/miniconda3/lib/python3.8/site-packages/huggingface_hub/file_download.py:791: FutureWarning: The `force_filename` parameter is deprecated as a new caching system, which keeps the filenames as they are on the Hub, is now in place.
+ warnings.warn(
+Loading pipeline components...: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 8.59it/s]
+Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:19<00:00, 9.86s/it]
+Running on local URL: http://127.0.0.1:6006
+
+To create a public link, set `share=True` in `launch()`.
+```
+
+
+## 6. Tip: to obtain a Hugging Face Access Token, first log in at https://huggingface.co, then click your avatar in the top-right corner, select Access Tokens from the dropdown menu, and on that page generate the Access Token needed for the huggingface_hub login (see the screenshot)
+
+
diff --git a/README-deepfloyd.md b/README-deepfloyd.md
new file mode 100644
index 0000000..68b6d2b
--- /dev/null
+++ b/README-deepfloyd.md
@@ -0,0 +1,354 @@
+[](LICENSE)
+[](LICENSE-MODEL)
+[](https://pepy.tech/project/deepfloyd_if)
+[](https://discord.gg/umz62Mgr)
+[](https://twitter.com/deepfloydai)
+[](http://linktr.ee/deepfloyd)
+
+# IF by [DeepFloyd Lab](https://deepfloyd.ai) at [StabilityAI](https://stability.ai/)
+
+
+
+
+
+We introduce DeepFloyd IF, a novel state-of-the-art open-source text-to-image model with a high degree of photorealism and language understanding. DeepFloyd IF is a modular model composed of a frozen text encoder and three cascaded pixel diffusion modules: a base model that generates a 64x64 px image from a text prompt, and two super-resolution models, each designed to generate images of increasing resolution: 256x256 px and 1024x1024 px. All stages of the model utilize a frozen text encoder based on the T5 transformer to extract text embeddings, which are then fed into a UNet architecture enhanced with cross-attention and attention pooling. The result is a highly efficient model that outperforms current state-of-the-art models, achieving a zero-shot FID score of 6.66 on the COCO dataset. Our work underscores the potential of larger UNet architectures in the first stage of cascaded diffusion models and depicts a promising future for text-to-image synthesis.
+
+
+
+
+
+*Inspired by* [*Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding*](https://arxiv.org/pdf/2205.11487.pdf)
+
+## Minimum requirements to use all IF models:
+- 16GB vRAM for IF-I-XL (4.3B text to 64x64 base module) & IF-II-L (1.2B to 256x256 upscaler module)
+- 24GB vRAM for IF-I-XL (4.3B text to 64x64 base module) & IF-II-L (1.2B to 256x256 upscaler module) & Stable x4 (to 1024x1024 upscaler)
+- `xformers` installed, with the env variable `FORCE_MEM_EFFICIENT_ATTN=1` set (see the snippet below)
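+
+For example (a minimal sketch, assuming `xformers` is already installed), export the variable before launching:
+
+```shell
+export FORCE_MEM_EFFICIENT_ATTN=1
+```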
+
+
+## Quick Start
+[](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/deepfloyd_if_free_tier_google_colab.ipynb)
+[](https://huggingface.co/spaces/DeepFloyd/IF)
+
+```shell
+pip install deepfloyd_if==1.0.2rc0
+pip install xformers==0.0.16
+pip install git+https://github.com/openai/CLIP.git --no-deps
+```
+
+## Local notebooks
+[](https://huggingface.co/DeepFloyd/IF-notebooks/blob/main/pipes-DeepFloyd-IF-v1.0.ipynb)
+[](https://www.kaggle.com/code/shonenkov/deepfloyd-if-4-3b-generator-of-pictures)
+
+The Dream, Style Transfer, Super Resolution or Inpainting modes are available in a Jupyter Notebook [here](https://huggingface.co/DeepFloyd/IF-notebooks/blob/main/pipes-DeepFloyd-IF-v1.0.ipynb).
+
+
+
+## Integration with 🤗 Diffusers
+
+IF is also integrated with the 🤗 Hugging Face [Diffusers library](https://github.com/huggingface/diffusers/).
+
+Diffusers runs each stage individually, allowing the user to customize the image generation process and to easily inspect intermediate results.
+
+### Example
+
+Before you can use IF, you need to accept its usage conditions. To do so:
+1. Make sure to have a [Hugging Face account](https://huggingface.co/join) and be logged in
+2. Accept the license on the model card of [DeepFloyd/IF-I-XL-v1.0](https://huggingface.co/DeepFloyd/IF-I-XL-v1.0)
+3. Make sure to log in locally. Install `huggingface_hub`:
+```sh
+pip install huggingface_hub --upgrade
+```
+
+run the login function in a Python shell
+
+```py
+from huggingface_hub import login
+
+login()
+```
+
+and enter your [Hugging Face Hub access token](https://huggingface.co/docs/hub/security-tokens#what-are-user-access-tokens).
+
+Next we install `diffusers` and dependencies:
+
+```sh
+pip install diffusers accelerate transformers safetensors
+```
+
+And we can now run the model locally.
+
+By default `diffusers` makes use of [model cpu offloading](https://huggingface.co/docs/diffusers/optimization/fp16#model-offloading-for-fast-inference-and-memory-savings) to run the whole IF pipeline with as little as 14 GB of VRAM.
+
+If you are using `torch>=2.0.0`, make sure to **delete all** `enable_xformers_memory_efficient_attention()`
+functions.
+
+```py
+from diffusers import DiffusionPipeline
+from diffusers.utils import pt_to_pil
+import torch
+
+# stage 1
+stage_1 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
+stage_1.enable_xformers_memory_efficient_attention() # remove line if torch.__version__ >= 2.0.0
+stage_1.enable_model_cpu_offload()
+
+# stage 2
+stage_2 = DiffusionPipeline.from_pretrained(
+ "DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16
+)
+stage_2.enable_xformers_memory_efficient_attention() # remove line if torch.__version__ >= 2.0.0
+stage_2.enable_model_cpu_offload()
+
+# stage 3
+safety_modules = {"feature_extractor": stage_1.feature_extractor, "safety_checker": stage_1.safety_checker, "watermarker": stage_1.watermarker}
+stage_3 = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-x4-upscaler", **safety_modules, torch_dtype=torch.float16)
+stage_3.enable_xformers_memory_efficient_attention() # remove line if torch.__version__ >= 2.0.0
+stage_3.enable_model_cpu_offload()
+
+prompt = 'a photo of a kangaroo wearing an orange hoodie and blue sunglasses standing in front of the eiffel tower holding a sign that says "very deep learning"'
+
+# text embeds
+prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)
+
+generator = torch.manual_seed(0)
+
+# stage 1
+image = stage_1(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt").images
+pt_to_pil(image)[0].save("./if_stage_I.png")
+
+# stage 2
+image = stage_2(
+ image=image, prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt"
+).images
+pt_to_pil(image)[0].save("./if_stage_II.png")
+
+# stage 3
+image = stage_3(prompt=prompt, image=image, generator=generator, noise_level=100).images
+image[0].save("./if_stage_III.png")
+```
+
+There are multiple ways to speed up inference and lower the memory consumption even further with `diffusers`; one such option is sketched below. To learn more, please have a look at the Diffusers docs:
+
+- 🚀 [Optimizing for inference time](https://huggingface.co/docs/diffusers/api/pipelines/if#optimizing-for-speed)
+- ⚙️ [Optimizing for low memory during inference](https://huggingface.co/docs/diffusers/api/pipelines/if#optimizing-for-memory)
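+
+For instance (a sketch only; the linked docs cover the full set of options), sequential CPU offload trades inference speed for an even lower memory footprint:
+
+```py
+# assumes the `stage_1` pipeline from the example above; submodules are
+# moved to the GPU one at a time instead of keeping whole models resident
+stage_1.enable_sequential_cpu_offload()
+```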
+
+For more in-detail information about how to use IF, please have a look at [the IF blog post](https://huggingface.co/blog/if) and [the documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/if) 📖.
+
+The Diffusers DreamBooth scripts also support fine-tuning 🎨 [IF](https://huggingface.co/docs/diffusers/main/en/training/dreambooth#if).
+With parameter-efficient fine-tuning, you can add new concepts to IF with a single GPU and ~28 GB of VRAM; a hypothetical invocation is sketched below.
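+
+As an illustration (a hypothetical invocation: the paths, prompt, and hyperparameters below are placeholders, and the authoritative flags are in the linked docs), a LoRA DreamBooth run on the stage-I model might look like:
+
+```sh
+accelerate launch train_dreambooth_lora.py \
+  --pretrained_model_name_or_path="DeepFloyd/IF-I-XL-v1.0" \
+  --instance_data_dir="./my_concept_images" \
+  --instance_prompt="a photo of sks toy" \
+  --output_dir="if-dreambooth-lora" \
+  --resolution=64 \
+  --train_batch_size=4 \
+  --learning_rate=5e-6 \
+  --max_train_steps=1200
+```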
+
+## Run the code locally
+
+### Loading the models into VRAM
+
+```python
+from deepfloyd_if.modules import IFStageI, IFStageII, StableStageIII
+from deepfloyd_if.modules.t5 import T5Embedder
+
+device = 'cuda:0'
+if_I = IFStageI('IF-I-XL-v1.0', device=device)
+if_II = IFStageII('IF-II-L-v1.0', device=device)
+if_III = StableStageIII('stable-diffusion-x4-upscaler', device=device)
+t5 = T5Embedder(device="cpu")
+```
+
+### I. Dream
+Dream is the text-to-image mode of the IF model
+
+```python
+from deepfloyd_if.pipelines import dream
+
+prompt = 'ultra close-up color photo portrait of rainbow owl with deer horns in the woods'
+count = 4
+
+result = dream(
+ t5=t5, if_I=if_I, if_II=if_II, if_III=if_III,
+ prompt=[prompt]*count,
+ seed=42,
+ if_I_kwargs={
+ "guidance_scale": 7.0,
+ "sample_timestep_respacing": "smart100",
+ },
+ if_II_kwargs={
+ "guidance_scale": 4.0,
+ "sample_timestep_respacing": "smart50",
+ },
+ if_III_kwargs={
+ "guidance_scale": 9.0,
+ "noise_level": 20,
+ "sample_timestep_respacing": "75",
+ },
+)
+
+if_III.show(result['III'], size=14)
+```
+
+
+## II. Zero-shot Image-to-Image Translation
+
+
+
+In Style Transfer mode, the output of your prompt comes out in the style of the `support_pil_img`:
+```python
+from deepfloyd_if.pipelines import style_transfer
+
+result = style_transfer(
+ t5=t5, if_I=if_I, if_II=if_II,
+ support_pil_img=raw_pil_image,
+ style_prompt=[
+ 'in style of professional origami',
+ 'in style of oil art, Tate modern',
+ 'in style of plastic building bricks',
+ 'in style of classic anime from 1990',
+ ],
+ seed=42,
+ if_I_kwargs={
+ "guidance_scale": 10.0,
+ "sample_timestep_respacing": "10,10,10,10,10,10,10,10,0,0",
+ 'support_noise_less_qsample_steps': 5,
+ },
+ if_II_kwargs={
+ "guidance_scale": 4.0,
+ "sample_timestep_respacing": 'smart50',
+ "support_noise_less_qsample_steps": 5,
+ },
+)
+if_I.show(result['II'], 1, 20)
+```
+
+
+
+
+## III. Super Resolution
+For super-resolution, users can run `IF-II` and `IF-III` or 'Stable x4' on an image that was not necessarily generated by IF (two cascades):
+
+```python
+from deepfloyd_if.pipelines import super_resolution
+
+middle_res = super_resolution(
+ t5,
+ if_III=if_II,
+    prompt=['woman with a blue headscarf and a blue sweater, detailed picture, 4k dslr, best quality'],
+ support_pil_img=raw_pil_image,
+ img_scale=4.,
+ img_size=64,
+ if_III_kwargs={
+ 'sample_timestep_respacing': 'smart100',
+ 'aug_level': 0.5,
+ 'guidance_scale': 6.0,
+ },
+)
+high_res = super_resolution(
+ t5,
+ if_III=if_III,
+ prompt=[''],
+ support_pil_img=middle_res['III'][0],
+ img_scale=4.,
+ img_size=256,
+ if_III_kwargs={
+ "guidance_scale": 9.0,
+ "noise_level": 20,
+ "sample_timestep_respacing": "75",
+ },
+)
+show_superres(raw_pil_image, high_res['III'][0])
+```
+
+
+
+
+### IV. Zero-shot Inpainting
+
+```python
+from deepfloyd_if.pipelines import inpainting
+
+result = inpainting(
+ t5=t5, if_I=if_I,
+ if_II=if_II,
+ if_III=if_III,
+ support_pil_img=raw_pil_image,
+ inpainting_mask=inpainting_mask,
+ prompt=[
+ 'oil art, a man in a hat',
+ ],
+ seed=42,
+ if_I_kwargs={
+ "guidance_scale": 7.0,
+ "sample_timestep_respacing": "10,10,10,10,10,0,0,0,0,0",
+ 'support_noise_less_qsample_steps': 0,
+ },
+ if_II_kwargs={
+ "guidance_scale": 4.0,
+ 'aug_level': 0.0,
+ "sample_timestep_respacing": '100',
+ },
+ if_III_kwargs={
+ "guidance_scale": 9.0,
+ "noise_level": 20,
+ "sample_timestep_respacing": "75",
+ },
+)
+if_I.show(result['I'], 2, 3)
+if_I.show(result['II'], 2, 6)
+if_I.show(result['III'], 2, 14)
+```
+
+
+### 🤗 Model Zoo 🤗
+Links to download the weights, as well as the model cards, will soon be available for each model in the model zoo.
+
+#### Original
+
+| Name | Cascade | Params | FID | Batch size | Steps |
+|:----------------------------------------------------------|:-------:|:------:|:----:|:----------:|:-----:|
+| [IF-I-M](https://huggingface.co/DeepFloyd/IF-I-M-v1.0) | I | 400M | 8.86 | 3072 | 2.5M |
+| [IF-I-L](https://huggingface.co/DeepFloyd/IF-I-L-v1.0) | I | 900M | 8.06 | 3200 | 3.0M |
+| [IF-I-XL](https://huggingface.co/DeepFloyd/IF-I-XL-v1.0)* | I | 4.3B | 6.66 | 3072 | 2.42M |
+| [IF-II-M](https://huggingface.co/DeepFloyd/IF-II-M-v1.0) | II | 450M | - | 1536 | 2.5M |
+| [IF-II-L](https://huggingface.co/DeepFloyd/IF-II-L-v1.0)* | II | 1.2B | - | 1536 | 2.5M |
+| IF-III-L* _(soon)_ | III | 700M | - | 3072 | 1.25M |
+
+ *best modules
+
+### Quantitative Evaluation
+
+`FID = 6.66`
+
+
+
+## License
+
+The code in this repository is released under the bespoke license (see added [point two](https://github.com/deep-floyd/IF/blob/main/LICENSE#L13)).
+
+The weights will be available soon via [the DeepFloyd organization at Hugging Face](https://huggingface.co/DeepFloyd) and have their own LICENSE.
+
+**Disclaimer:** *The initial release of the IF model is under a restricted research-purposes-only license temporarily to gather feedback, and after that we intend to release a fully open-source model in line with other Stability AI models.*
+
+## Limitations and Biases
+
+The models available in this codebase have known limitations and biases. Please refer to [the model card](https://huggingface.co/DeepFloyd/IF-I-L-v1.0) for more information.
+
+
+## 🎓 DeepFloyd IF creators:
+
+- Alex Shonenkov [GitHub](https://github.com/shonenkov) | [Linktr](https://linktr.ee/shonenkovAI)
+- Misha Konstantinov [GitHub](https://github.com/zeroshot-ai) | [Twitter](https://twitter.com/_bra_ket)
+- Daria Bakshandaeva [GitHub](https://github.com/Gugutse) | [Twitter](https://twitter.com/_gugutse_)
+- Christoph Schuhmann [GitHub](https://github.com/christophschuhmann) | [Twitter](https://twitter.com/laion_ai)
+- Ksenia Ivanova [GitHub](https://github.com/ivksu) | [Twitter](https://twitter.com/susiaiv)
+- Nadiia Klokova [GitHub](https://github.com/vauimpuls) | [Twitter](https://twitter.com/vauimpuls)
+
+
+## 📄 Research Paper (Soon)
+
+## Acknowledgements
+
+Special thanks to [StabilityAI](http://stability.ai) and its CEO [Emad Mostaque](https://twitter.com/emostaque) for invaluable support, providing GPU compute and infrastructure to train the models (our gratitude goes to [Richard Vencu](https://github.com/rvencu)); thanks to [LAION](https://laion.ai) and [Christoph Schuhmann](https://github.com/christophschuhmann) in particular for contribution to the project and well-prepared datasets; thanks to [Huggingface](https://huggingface.co) teams for optimizing models' speed and memory consumption during inference, creating demos and giving cool advice!
+
+## 🚀 External Contributors 🚀
+- The biggest thanks to [@Apolinário](https://github.com/apolinario) for ideas, consultations, help and support at all stages of making IF available in open source; for writing a lot of documentation and instructions; and for creating a friendly atmosphere in difficult moments 🦉;
+- Thanks, [@patrickvonplaten](https://github.com/patrickvonplaten), for improving the loading time of UNet models by 80% and for integrating Stable-Diffusion-x4 as a native pipeline 💪;
+- Thanks, [@williamberman](https://github.com/williamberman) and [@patrickvonplaten](https://github.com/patrickvonplaten) for diffusers integration 🙌;
+- Thanks, [@hysts](https://github.com/hysts) and [@Apolinário](https://github.com/apolinario) for creating [the best gradio demo with IF](https://huggingface.co/spaces/DeepFloyd/IF) 🚀;
+- Thanks, [@Dango233](https://github.com/Dango233), for adapting IF with xformers memory efficient attention 💪;
diff --git a/README.md b/README.md
index 68b6d2b..3bd4c91 100644
--- a/README.md
+++ b/README.md
@@ -1,354 +1,73 @@
-[](LICENSE)
-[](LICENSE-MODEL)
-[](https://pepy.tech/project/deepfloyd_if)
-[](https://discord.gg/umz62Mgr)
-[](https://twitter.com/deepfloydai)
-[](http://linktr.ee/deepfloyd)
+## Thanks to the DeepFloyd team for open-sourcing the IF model. IF-easy-webui is a simple, user-friendly web interface built on the IF model.
+## It primarily addresses issues such as dependency installation errors, model files not being found after download, and inpainting type-conversion errors,
+## making it easier for everyone to quickly try out the IF model.
-# IF by [DeepFloyd Lab](https://deepfloyd.ai) at [StabilityAI](https://stability.ai/)
+## Click here to view the Chinese README: [README-chinese.md](README-chinese.md)
-
-
-
-
-We introduce DeepFloyd IF, a novel state-of-the-art open-source text-to-image model with a high degree of photorealism and language understanding. DeepFloyd IF is a modular composed of a frozen text encoder and three cascaded pixel diffusion modules: a base model that generates 64x64 px image based on text prompt and two super-resolution models, each designed to generate images of increasing resolution: 256x256 px and 1024x1024 px. All stages of the model utilize a frozen text encoder based on the T5 transformer to extract text embeddings, which are then fed into a UNet architecture enhanced with cross-attention and attention pooling. The result is a highly efficient model that outperforms current state-of-the-art models, achieving a zero-shot FID score of 6.66 on the COCO dataset. Our work underscores the potential of larger UNet architectures in the first stage of cascaded diffusion models and depicts a promising future for text-to-image synthesis.
-
-
-
-
-
-*Inspired by* [*Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding*](https://arxiv.org/pdf/2205.11487.pdf)
-
-## Minimum requirements to use all IF models:
-- 16GB vRAM for IF-I-XL (4.3B text to 64x64 base module) & IF-II-L (1.2B to 256x256 upscaler module)
-- 24GB vRAM for IF-I-XL (4.3B text to 64x64 base module) & IF-II-L (1.2B to 256x256 upscaler module) & Stable x4 (to 1024x1024 upscaler)
-- `xformers` and set env variable `FORCE_MEM_EFFICIENT_ATTN=1`
-
-
-## Quick Start
-[](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/deepfloyd_if_free_tier_google_colab.ipynb)
-[](https://huggingface.co/spaces/DeepFloyd/IF)
-
-```shell
-pip install deepfloyd_if==1.0.2rc0
-pip install xformers==0.0.16
-pip install git+https://github.com/openai/CLIP.git --no-deps
+# User Guide
+## 1. Download the Source Code
+```bash
+root@xxxx:~# git clone https://github.com/amazed6666/IF-easy-webui.git
```
-## Local notebooks
-[](https://huggingface.co/DeepFloyd/IF-notebooks/blob/main/pipes-DeepFloyd-IF-v1.0.ipynb)
-[](https://www.kaggle.com/code/shonenkov/deepfloyd-if-4-3b-generator-of-pictures)
-
-The Dream, Style Transfer, Super Resolution or Inpainting modes are avaliable in a Jupyter Notebook [here](https://huggingface.co/DeepFloyd/IF-notebooks/blob/main/pipes-DeepFloyd-IF-v1.0.ipynb).
-
-
-
-## Integration with 🤗 Diffusers
-
-IF is also integrated with the 🤗 Hugging Face [Diffusers library](https://github.com/huggingface/diffusers/).
-
-Diffusers runs each stage individually allowing the user to customize the image generation process as well as allowing to inspect intermediate results easily.
-
-### Example
-
-Before you can use IF, you need to accept its usage conditions. To do so:
-1. Make sure to have a [Hugging Face account](https://huggingface.co/join) and be loggin in
-2. Accept the license on the model card of [DeepFloyd/IF-I-XL-v1.0](https://huggingface.co/DeepFloyd/IF-I-XL-v1.0)
-3. Make sure to login locally. Install `huggingface_hub`
-```sh
-pip install huggingface_hub --upgrade
+## 2. Navigate to the IF-easy-webui directory and install the required dependencies
+```bash
+root@xxxx:~# cd IF-easy-webui
+root@xxxx:~/IF-easy-webui# pip install -r requirements.txt
```
-run the login function in a Python shell
-
-```py
-from huggingface_hub import login
-
-login()
+## 3. Install CLIP:
+```bash
+root@xxxx:~/IF-easy-webui# pip install git+https://github.com/openai/CLIP.git --no-deps
```
-and enter your [Hugging Face Hub access token](https://huggingface.co/docs/hub/security-tokens#what-are-user-access-tokens).
-
-Next we install `diffusers` and dependencies:
-
-```sh
-pip install diffusers accelerate transformers safetensors
+## 4. Log in to huggingface_hub Using an Access Token: Configure Git to store credentials and log in using Python:
+```bash
+root@xxxx:~/IF-easy-webui# git config --global credential.helper store
+root@xxxx:~/IF-easy-webui# python
+Python 3.8.10 (default, Jun 4 2021, 15:09:15)
+[GCC 7.5.0] :: Anaconda, Inc. on linux
+Type "help", "copyright", "credits" or "license" for more information.
+>>> from huggingface_hub import login
+>>>
+>>> login()
+
+ _| _| _| _| _|_|_| _|_|_| _|_|_| _| _| _|_|_| _|_|_|_| _|_| _|_|_| _|_|_|_|
+ _| _| _| _| _| _| _| _|_| _| _| _| _| _| _| _|
+ _|_|_|_| _| _| _| _|_| _| _|_| _| _| _| _| _| _|_| _|_|_| _|_|_|_| _| _|_|_|
+ _| _| _| _| _| _| _| _| _| _| _|_| _| _| _| _| _| _| _|
+ _| _| _|_| _|_|_| _|_|_| _|_|_| _| _| _|_|_| _| _| _| _|_|_| _|_|_|_|
+
+Enter your token (input will not be visible):
+Add token as git credential? (Y/n) Y
+>>> exit()
```
-And we can now run the model locally.
-
-By default `diffusers` makes use of [model cpu offloading](https://huggingface.co/docs/diffusers/optimization/fp16#model-offloading-for-fast-inference-and-memory-savings) to run the whole IF pipeline with as little as 14 GB of VRAM.
-
-If you are using `torch>=2.0.0`, make sure to **delete all** `enable_xformers_memory_efficient_attention()`
-functions.
-
-```py
-from diffusers import DiffusionPipeline
-from diffusers.utils import pt_to_pil
-import torch
-
-# stage 1
-stage_1 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
-stage_1.enable_xformers_memory_efficient_attention() # remove line if torch.__version__ >= 2.0.0
-stage_1.enable_model_cpu_offload()
-
-# stage 2
-stage_2 = DiffusionPipeline.from_pretrained(
- "DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16
-)
-stage_2.enable_xformers_memory_efficient_attention() # remove line if torch.__version__ >= 2.0.0
-stage_2.enable_model_cpu_offload()
-
-# stage 3
-safety_modules = {"feature_extractor": stage_1.feature_extractor, "safety_checker": stage_1.safety_checker, "watermarker": stage_1.watermarker}
-stage_3 = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-x4-upscaler", **safety_modules, torch_dtype=torch.float16)
-stage_3.enable_xformers_memory_efficient_attention() # remove line if torch.__version__ >= 2.0.0
-stage_3.enable_model_cpu_offload()
-
-prompt = 'a photo of a kangaroo wearing an orange hoodie and blue sunglasses standing in front of the eiffel tower holding a sign that says "very deep learning"'
-
-# text embeds
-prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)
-
-generator = torch.manual_seed(0)
-
-# stage 1
-image = stage_1(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt").images
-pt_to_pil(image)[0].save("./if_stage_I.png")
-
-# stage 2
-image = stage_2(
- image=image, prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt"
-).images
-pt_to_pil(image)[0].save("./if_stage_II.png")
-
-# stage 3
-image = stage_3(prompt=prompt, image=image, generator=generator, noise_level=100).images
-image[0].save("./if_stage_III.png")
-```
-
- There are multiple ways to speed up the inference time and lower the memory consumption even more with `diffusers`. To do so, please have a look at the Diffusers docs:
-
-- 🚀 [Optimizing for inference time](https://huggingface.co/docs/diffusers/api/pipelines/if#optimizing-for-speed)
-- ⚙️ [Optimizing for low memory during inference](https://huggingface.co/docs/diffusers/api/pipelines/if#optimizing-for-memory)
-
-For more in-detail information about how to use IF, please have a look at [the IF blog post](https://huggingface.co/blog/if) and [the documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/if) 📖.
-
-Diffusers dreambooth scripts also supports fine-tuning 🎨 [IF](https://huggingface.co/docs/diffusers/main/en/training/dreambooth#if).
-With parameter efficient finetuning, you can add new concepts to IF with a single GPU and ~28 GB VRAM.
-
-## Run the code locally
-
-### Loading the models into VRAM
-
-```python
-from deepfloyd_if.modules import IFStageI, IFStageII, StableStageIII
-from deepfloyd_if.modules.t5 import T5Embedder
-
-device = 'cuda:0'
-if_I = IFStageI('IF-I-XL-v1.0', device=device)
-if_II = IFStageII('IF-II-L-v1.0', device=device)
-if_III = StableStageIII('stable-diffusion-x4-upscaler', device=device)
-t5 = T5Embedder(device="cpu")
-```
-
-### I. Dream
-Dream is the text-to-image mode of the IF model
-
-```python
-from deepfloyd_if.pipelines import dream
-
-prompt = 'ultra close-up color photo portrait of rainbow owl with deer horns in the woods'
-count = 4
-
-result = dream(
- t5=t5, if_I=if_I, if_II=if_II, if_III=if_III,
- prompt=[prompt]*count,
- seed=42,
- if_I_kwargs={
- "guidance_scale": 7.0,
- "sample_timestep_respacing": "smart100",
- },
- if_II_kwargs={
- "guidance_scale": 4.0,
- "sample_timestep_respacing": "smart50",
- },
- if_III_kwargs={
- "guidance_scale": 9.0,
- "noise_level": 20,
- "sample_timestep_respacing": "75",
- },
-)
-
-if_III.show(result['III'], size=14)
-```
-
-
-## II. Zero-shot Image-to-Image Translation
-
-
-
-In Style Transfer mode, the output of your prompt comes out at the style of the `support_pil_img`
-```python
-from deepfloyd_if.pipelines import style_transfer
-
-result = style_transfer(
- t5=t5, if_I=if_I, if_II=if_II,
- support_pil_img=raw_pil_image,
- style_prompt=[
- 'in style of professional origami',
- 'in style of oil art, Tate modern',
- 'in style of plastic building bricks',
- 'in style of classic anime from 1990',
- ],
- seed=42,
- if_I_kwargs={
- "guidance_scale": 10.0,
- "sample_timestep_respacing": "10,10,10,10,10,10,10,10,0,0",
- 'support_noise_less_qsample_steps': 5,
- },
- if_II_kwargs={
- "guidance_scale": 4.0,
- "sample_timestep_respacing": 'smart50',
- "support_noise_less_qsample_steps": 5,
- },
-)
-if_I.show(result['II'], 1, 20)
-```
-
-
-
-
-## III. Super Resolution
-For super-resolution, users can run `IF-II` and `IF-III` or 'Stable x4' on an image that was not necessarely generated by IF (two cascades):
-
-```python
-from deepfloyd_if.pipelines import super_resolution
-
-middle_res = super_resolution(
- t5,
- if_III=if_II,
- prompt=['woman with a blue headscarf and a blue sweaterp, detailed picture, 4k dslr, best quality'],
- support_pil_img=raw_pil_image,
- img_scale=4.,
- img_size=64,
- if_III_kwargs={
- 'sample_timestep_respacing': 'smart100',
- 'aug_level': 0.5,
- 'guidance_scale': 6.0,
- },
-)
-high_res = super_resolution(
- t5,
- if_III=if_III,
- prompt=[''],
- support_pil_img=middle_res['III'][0],
- img_scale=4.,
- img_size=256,
- if_III_kwargs={
- "guidance_scale": 9.0,
- "noise_level": 20,
- "sample_timestep_respacing": "75",
- },
-)
-show_superres(raw_pil_image, high_res['III'][0])
-```
-
-
-
-
-### IV. Zero-shot Inpainting
-
-```python
-from deepfloyd_if.pipelines import inpainting
-
-result = inpainting(
- t5=t5, if_I=if_I,
- if_II=if_II,
- if_III=if_III,
- support_pil_img=raw_pil_image,
- inpainting_mask=inpainting_mask,
- prompt=[
- 'oil art, a man in a hat',
- ],
- seed=42,
- if_I_kwargs={
- "guidance_scale": 7.0,
- "sample_timestep_respacing": "10,10,10,10,10,0,0,0,0,0",
- 'support_noise_less_qsample_steps': 0,
- },
- if_II_kwargs={
- "guidance_scale": 4.0,
- 'aug_level': 0.0,
- "sample_timestep_respacing": '100',
- },
- if_III_kwargs={
- "guidance_scale": 9.0,
- "noise_level": 20,
- "sample_timestep_respacing": "75",
- },
-)
-if_I.show(result['I'], 2, 3)
-if_I.show(result['II'], 2, 6)
-if_I.show(result['III'], 2, 14)
-```
-
-
-### 🤗 Model Zoo 🤗
-The link to download the weights as well as the model cards will be available soon on each model of the model zoo
-
-#### Original
-
-| Name | Cascade | Params | FID | Batch size | Steps |
-|:----------------------------------------------------------|:-------:|:------:|:----:|:----------:|:-----:|
-| [IF-I-M](https://huggingface.co/DeepFloyd/IF-I-M-v1.0) | I | 400M | 8.86 | 3072 | 2.5M |
-| [IF-I-L](https://huggingface.co/DeepFloyd/IF-I-L-v1.0) | I | 900M | 8.06 | 3200 | 3.0M |
-| [IF-I-XL](https://huggingface.co/DeepFloyd/IF-I-XL-v1.0)* | I | 4.3B | 6.66 | 3072 | 2.42M |
-| [IF-II-M](https://huggingface.co/DeepFloyd/IF-II-M-v1.0) | II | 450M | - | 1536 | 2.5M |
-| [IF-II-L](https://huggingface.co/DeepFloyd/IF-II-L-v1.0)* | II | 1.2B | - | 1536 | 2.5M |
-| IF-III-L* _(soon)_ | III | 700M | - | 3072 | 1.25M |
-
- *best modules
-
-### Quantitative Evaluation
-
-`FID = 6.66`
-
-
-
-## License
-
-The code in this repository is released under the bespoke license (see added [point two](https://github.com/deep-floyd/IF/blob/main/LICENSE#L13)).
-
-The weights will be available soon via [the DeepFloyd organization at Hugging Face](https://huggingface.co/DeepFloyd) and have their own LICENSE.
-
-**Disclaimer:** *The initial release of the IF model is under a restricted research-purposes-only license temporarily to gather feedback, and after that we intend to release a fully open-source model in line with other Stability AI models.*
-
-## Limitations and Biases
-
-The models available in this codebase have known limitations and biases. Please refer to [the model card](https://huggingface.co/DeepFloyd/IF-I-L-v1.0) for more information.
+## 5. After running webui.py, wait while the model files load. Once loading is complete, the following message will be displayed; then open your browser at http://127.0.0.1:6006 to access the IF-easy-webui interface and start using the IF model
+```bash
+root@xxxx:~/IF-easy-webui# python webui.py
+FORCE_MEM_EFFICIENT_ATTN= 0 @UNET:QKVATTENTION
+/root/miniconda3/lib/python3.8/site-packages/huggingface_hub/file_download.py:791: FutureWarning: The `force_filename` parameter is deprecated as a new caching system, which keeps the filenames as they are on the Hub, is now in place.
+ warnings.warn(
+Loading pipeline components...: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 8.59it/s]
+Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:19<00:00, 9.86s/it]
+Running on local URL: http://127.0.0.1:6006
+To create a public link, set `share=True` in `launch()`.
+```
+
-## 🎓 DeepFloyd IF creators:
+## 6. Friendly Reminder: How to Obtain a Hugging Face Access Token
+To get a Hugging Face Access Token, follow these steps:
-- Alex Shonenkov [GitHub](https://github.com/shonenkov) | [Linktr](https://linktr.ee/shonenkovAI)
-- Misha Konstantinov [GitHub](https://github.com/zeroshot-ai) | [Twitter](https://twitter.com/_bra_ket)
-- Daria Bakshandaeva [GitHub](https://github.com/Gugutse) | [Twitter](https://twitter.com/_gugutse_)
-- Christoph Schuhmann [GitHub](https://github.com/christophschuhmann) | [Twitter](https://twitter.com/laion_ai)
-- Ksenia Ivanova [GitHub](https://github.com/ivksu) | [Twitter](https://twitter.com/susiaiv)
-- Nadiia Klokova [GitHub](https://github.com/vauimpuls) | [Twitter](https://twitter.com/vauimpuls)
+1. Log in to Hugging Face at https://huggingface.co/
+2. Click on your profile picture in the top-right corner.
-## 📄 Research Paper (Soon)
+3. From the dropdown menu, select Access Tokens.
-## Acknowledgements
+4. On the Access Tokens page, generate the token required for logging in to huggingface_hub (refer to the screenshot).
+
-Special thanks to [StabilityAI](http://stability.ai) and its CEO [Emad Mostaque](https://twitter.com/emostaque) for invaluable support, providing GPU compute and infrastructure to train the models (our gratitude goes to [Richard Vencu](https://github.com/rvencu)); thanks to [LAION](https://laion.ai) and [Christoph Schuhmann](https://github.com/christophschuhmann) in particular for contribution to the project and well-prepared datasets; thanks to [Huggingface](https://huggingface.co) teams for optimizing models' speed and memory consumption during inference, creating demos and giving cool advice!
+## Click here to view DeepFloyd's original README: [README-deepfloyd.md](README-deepfloyd.md)
-## 🚀 External Contributors 🚀
-- The Biggest Thanks [@Apolinário](https://github.com/apolinario), for ideas, consultations, help and support on all stages to make IF available in open-source; for writing a lot of documentation and instructions; for creating a friendly atmosphere in difficult moments 🦉;
-- Thanks, [@patrickvonplaten](https://github.com/patrickvonplaten), for improving loading time of unet models by 80%;
-for integration Stable-Diffusion-x4 as native pipeline 💪;
-- Thanks, [@williamberman](https://github.com/williamberman) and [@patrickvonplaten](https://github.com/patrickvonplaten) for diffusers integration 🙌;
-- Thanks, [@hysts](https://github.com/hysts) and [@Apolinário](https://github.com/apolinario) for creating [the best gradio demo with IF](https://huggingface.co/spaces/DeepFloyd/IF) 🚀;
-- Thanks, [@Dango233](https://github.com/Dango233), for adapting IF with xformers memory efficient attention 💪;
diff --git a/deepfloyd_if/modules/base.py b/deepfloyd_if/modules/base.py
index c808a3c..defb105 100644
--- a/deepfloyd_if/modules/base.py
+++ b/deepfloyd_if/modules/base.py
@@ -247,9 +247,12 @@ def load_checkpoint(self, model, dir_or_name, filename='pytorch_model.bin'):
def _get_path_or_download_file_from_hf(self, dir_or_name, filename):
if dir_or_name in self.available_models:
cache_dir = os.path.join(self.cache_dir, dir_or_name)
- hf_hub_download(repo_id=f'DeepFloyd/{dir_or_name}', filename=filename, cache_dir=cache_dir,
+ # 20241208: fixed the bug of "No such file or directory: '/root/.cache/IF_/IF-I-XL-v1.0/config.yml'"
+ hf_download_path = hf_hub_download(repo_id=f'DeepFloyd/{dir_or_name}', filename=filename, cache_dir=cache_dir,
force_filename=filename, token=self.hf_token)
- return os.path.join(cache_dir, filename)
+ #return os.path.join(cache_dir, filename)
+
+ return hf_download_path
else:
return os.path.join(dir_or_name, filename)
diff --git a/deepfloyd_if/modules/t5.py b/deepfloyd_if/modules/t5.py
index 0b61529..1ae3dd1 100644
--- a/deepfloyd_if/modules/t5.py
+++ b/deepfloyd_if/modules/t5.py
@@ -65,13 +65,19 @@ def __init__(self, device, dir_or_name='t5-v1_1-xxl', *, cache_dir=None, hf_toke
tokenizer_path, path = dir_or_name, dir_or_name
if dir_or_name in self.available_models:
cache_dir = os.path.join(self.cache_dir, dir_or_name)
+ # 20241209:fixed the bug of "OSError: Can't load tokenizer for '/root/autodl-tmp/transformers-cache/t5-v1_1-xxl'"
+ hf_download_path = ''
for filename in [
'config.json', 'special_tokens_map.json', 'spiece.model', 'tokenizer_config.json',
'pytorch_model.bin.index.json', 'pytorch_model-00001-of-00002.bin', 'pytorch_model-00002-of-00002.bin'
]:
- hf_hub_download(repo_id=f'DeepFloyd/{dir_or_name}', filename=filename, cache_dir=cache_dir,
+ hf_download_path = hf_hub_download(repo_id=f'DeepFloyd/{dir_or_name}', filename=filename, cache_dir=cache_dir,
force_filename=filename, token=self.hf_token)
- tokenizer_path, path = cache_dir, cache_dir
+
+ hf_download_path = os.path.dirname(hf_download_path)
+ #tokenizer_path, path = cache_dir, cache_dir
+
+ tokenizer_path, path = hf_download_path, hf_download_path
else:
cache_dir = os.path.join(self.cache_dir, 't5-v1_1-xxl')
for filename in [
@@ -81,7 +87,7 @@ def __init__(self, device, dir_or_name='t5-v1_1-xxl', *, cache_dir=None, hf_toke
force_filename=filename, token=self.hf_token)
tokenizer_path = cache_dir
- self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)
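+        # legacy=False selects the updated T5 tokenizer behavior and avoids the transformers legacy warning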
+ self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_path, legacy=False)
self.model = T5EncoderModel.from_pretrained(path, **t5_model_kwargs).eval()
def get_text_embeddings(self, texts):
diff --git a/deepfloyd_if/pipelines/inpainting.py b/deepfloyd_if/pipelines/inpainting.py
index d0a4c78..277a285 100644
--- a/deepfloyd_if/pipelines/inpainting.py
+++ b/deepfloyd_if/pipelines/inpainting.py
@@ -58,7 +58,8 @@ def inpainting(
if_I_kwargs['support_noise'] = low_res
- inpainting_mask_I = img_as_bool(resize(inpainting_mask[0].cpu(), (3, image_h, image_w)))
+ # 20241207 fixed the error 1/3:TypeError: Cannot interpret 'torch.uint8' as a data type
+ inpainting_mask_I = img_as_bool(resize(inpainting_mask[0].cpu().numpy(), (3, image_h, image_w)))
inpainting_mask_I = torch.from_numpy(inpainting_mask_I).unsqueeze(0).to(if_I.device)
if_I_kwargs['inpainting_mask'] = inpainting_mask_I
@@ -81,7 +82,8 @@ def inpainting(
if_II_kwargs['support_noise'] = mid_res
if 'inpainting_mask' not in if_II_kwargs:
- inpainting_mask_II = img_as_bool(resize(inpainting_mask[0].cpu(), (3, image_h, image_w)))
+ # 20241207 fixed the error 2/3:TypeError: Cannot interpret 'torch.uint8' as a data type
+ inpainting_mask_II = img_as_bool(resize(inpainting_mask[0].cpu().numpy(), (3, image_h, image_w)))
inpainting_mask_II = torch.from_numpy(inpainting_mask_II).unsqueeze(0).to(if_II.device)
if_II_kwargs['inpainting_mask'] = inpainting_mask_II
@@ -110,7 +112,8 @@ def inpainting(
if_III_kwargs['support_noise'] = high_res
if 'inpainting_mask' not in if_III_kwargs:
- inpainting_mask_III = img_as_bool(resize(inpainting_mask[0].cpu(), (3, image_h, image_w)))
+ # 20241207 fixed the error 3/3:TypeError: Cannot interpret 'torch.uint8' as a data type
+ inpainting_mask_III = img_as_bool(resize(inpainting_mask[0].cpu().numpy(), (3, image_h, image_w)))
inpainting_mask_III = torch.from_numpy(inpainting_mask_III).unsqueeze(0).to(if_III.device)
if_III_kwargs['inpainting_mask'] = inpainting_mask_III
diff --git a/examples/01.dream-usage.py b/examples/01.dream-usage.py
new file mode 100644
index 0000000..6cdcff3
--- /dev/null
+++ b/examples/01.dream-usage.py
@@ -0,0 +1,46 @@
+import sys
+import os
+sys.path.append(os.getcwd())
+from deepfloyd_if.modules import IFStageI, IFStageII, StableStageIII
+from deepfloyd_if.modules.t5 import T5Embedder
+
+device = 'cuda:0'
+# If the default space is insufficient, set a custom cache location for the model
+#config_path = '/root/autodl-tmp/transformers-cache/'
+config_path = None
+
+# Initialize the model
+if_I = IFStageI('IF-I-XL-v1.0', device=device, cache_dir=config_path)
+if_II = IFStageII('IF-II-L-v1.0', device=device, cache_dir=config_path)
+if_III = StableStageIII('stable-diffusion-x4-upscaler', device=device, cache_dir=config_path)
+t5 = T5Embedder(device="cpu", cache_dir=config_path)
+
+from deepfloyd_if.pipelines import dream
+
+prompt = 'ultra close-up color photo portrait of rainbow owl with deer horns in the woods'
+count = 4
+
+result = dream(
+ t5=t5, if_I=if_I, if_II=if_II, #if_III=if_III,
+ prompt=[prompt]*count,
+ seed=42,
+ if_I_kwargs={
+ "guidance_scale": 7.0,
+ "sample_timestep_respacing": "smart100",
+ },
+ if_II_kwargs={
+ "guidance_scale": 4.0,
+ "sample_timestep_respacing": "smart50",
+ },
+# if_III_kwargs={
+# "guidance_scale": 9.0,
+# "noise_level": 20,
+# "sample_timestep_respacing": "75",
+# },
+)
+
+# if_III.show(result['III'], size=14)
+# save the generated images as png files
+for stage, images in result.items():
+ for i, image in enumerate(images):
+ image.save(f'generate-imgs/01.dream-usage_{stage}_{i}.png')
\ No newline at end of file
diff --git a/examples/02.zero-short-image-to-image.py b/examples/02.zero-short-image-to-image.py
new file mode 100644
index 0000000..f3d65d6
--- /dev/null
+++ b/examples/02.zero-short-image-to-image.py
@@ -0,0 +1,55 @@
+import sys
+import os
+sys.path.append(os.getcwd())
+from deepfloyd_if.modules import IFStageI, IFStageII, StableStageIII
+from deepfloyd_if.modules.t5 import T5Embedder
+from PIL import Image
+
+# If the default space is insufficient, set a custom cache location for the model
+#config_path = '/root/autodl-tmp/transformers-cache/'
+config_path = None
+
+device = 'cuda:0'
+
+# Initialize the model
+if_I = IFStageI('IF-I-XL-v1.0', device=device, cache_dir=config_path)
+if_II = IFStageII('IF-II-L-v1.0', device=device, cache_dir=config_path)
+if_III = StableStageIII('stable-diffusion-x4-upscaler', device=device, cache_dir=config_path)
+t5 = T5Embedder(device="cpu", cache_dir=config_path)
+
+from deepfloyd_if.pipelines import style_transfer
+
+# Load the original image
+raw_pil_image = Image.open('raw-image/raw-image-01.png')
+result = style_transfer(
+ t5=t5, if_I=if_I, if_II=if_II,
+ support_pil_img=raw_pil_image,
+ style_prompt=[
+ 'in style of professional origami',
+ 'in style of oil art, Tate modern',
+ 'in style of plastic building bricks',
+ 'in style of classic anime from 1990',
+ ],
+ seed=42,
+ if_I_kwargs={
+ "guidance_scale": 10.0,
+ "sample_timestep_respacing": "10,10,10,10,10,10,10,10,0,0",
+ 'support_noise_less_qsample_steps': 5,
+ },
+ if_II_kwargs={
+ "guidance_scale": 4.0,
+ "sample_timestep_respacing": 'smart50',
+ "support_noise_less_qsample_steps": 5,
+ },
+)
+
+#print(result)
+
+# save the generated images as png files
+for stage, images in result.items():
+ for i, img in enumerate(images):
+ img.save(f'generate-imgs/02.zero-short-image2image_tran_{stage}_{i}_2.png')
+
+
+
\ No newline at end of file
diff --git a/examples/03.super-resolution.py b/examples/03.super-resolution.py
new file mode 100644
index 0000000..2a90132
--- /dev/null
+++ b/examples/03.super-resolution.py
@@ -0,0 +1,67 @@
+import sys
+import os
+sys.path.append(os.getcwd())
+from deepfloyd_if.modules import IFStageI, IFStageII, StableStageIII
+from deepfloyd_if.modules.t5 import T5Embedder
+from PIL import Image
+
+# If the default space is insufficient, set a custom cache location for the model
+#config_path = '/root/autodl-tmp/transformers-cache/'
+config_path = None
+
+device = 'cuda:0'
+
+# Initialize the model
+if_I = IFStageI('IF-I-XL-v1.0', device=device, cache_dir=config_path)
+if_II = IFStageII('IF-II-L-v1.0', device=device, cache_dir=config_path)
+if_III = StableStageIII('stable-diffusion-x4-upscaler', device=device, cache_dir=config_path)
+t5 = T5Embedder(device="cpu", cache_dir=config_path)
+
+from deepfloyd_if.pipelines import super_resolution
+
+# Load the original image
+raw_pil_image = Image.open('raw-image/raw-image-02.png')
+
+
+middle_res = super_resolution(
+ t5,
+ if_III=if_II,
+    prompt=['woman with a blue headscarf and a blue sweater, detailed picture, 4k dslr, best quality'],
+ support_pil_img=raw_pil_image,
+ img_scale=4.,
+ img_size=64,
+ if_III_kwargs={
+ 'sample_timestep_respacing': 'smart100',
+ 'aug_level': 0.5,
+ 'guidance_scale': 6.0,
+ },
+)
+high_res = super_resolution(
+ t5,
+ if_III=if_III,
+ prompt=[''],
+ support_pil_img=middle_res['III'][0],
+ img_scale=4.,
+ img_size=256,
+ if_III_kwargs={
+ "guidance_scale": 9.0,
+ "noise_level": 20,
+ "sample_timestep_respacing": "75",
+ },
+)
+#show_superres(raw_pil_image, high_res['III'][0])
+
+#print(result)
+
+# save the generated images as png files
+for stage, images in middle_res.items():
+ for i, img in enumerate(images):
+        img.save(f'generate-imgs/03.super_middle_resolution_{stage}_{i}-3.png')
+
+for stage, images in high_res.items():
+ for i, img in enumerate(images):
+ img.save(f'generate-imgs/03.super_high_resolution_{stage}_{i}-3.png')
+
+
+
\ No newline at end of file
diff --git a/examples/04.zero-short-inpainting.py b/examples/04.zero-short-inpainting.py
new file mode 100644
index 0000000..be2c395
--- /dev/null
+++ b/examples/04.zero-short-inpainting.py
@@ -0,0 +1,69 @@
+import sys
+import os
+sys.path.append(os.getcwd())
+from deepfloyd_if.modules import IFStageI, IFStageII, StableStageIII
+from deepfloyd_if.modules.t5 import T5Embedder
+from deepfloyd_if.pipelines import inpainting
+from PIL import Image
+import numpy as np
+import torch
+
+# If the default space is insufficient, set a custom cache location for the model
+#config_path = '/root/autodl-tmp/transformers-cache/'
+config_path = None
+
+device = 'cuda:0'
+
+# Initialize the model
+if_I = IFStageI('IF-I-XL-v1.0', device=device, cache_dir=config_path)
+if_II = IFStageII('IF-II-L-v1.0', device=device, cache_dir=config_path)
+if_III = StableStageIII('stable-diffusion-x4-upscaler', device=device, cache_dir=config_path)
+t5 = T5Embedder(device="cpu", cache_dir=config_path)
+
+# Load the original image
+raw_pil_image = Image.open('raw-image/raw-image-03.png').convert('RGB')
+
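+# Convert the PIL image to a CHW float tensor in [-1, 1] with a leading batch dimension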
+img = np.array(raw_pil_image)
+img = img.astype(np.float32) / 127.5 - 1
+img = np.transpose(img, [2, 0, 1])
+img = torch.from_numpy(img).unsqueeze(0)
+
+# Create the inpainting mask (1 marks the region to be repainted)
+inpainting_mask = torch.zeros_like(img[0], device='cpu')
+inpainting_mask[:, 0:210, 210:660] = 1
+inpainting_mask = inpainting_mask.unsqueeze(0)
+
+if_I.to_images((1-inpainting_mask)*img)[0].save(f'generate-imgs/04.oil-man_inpainting-mask-4.png')
+print("inpainting_mask.shape:", inpainting_mask.shape)
+
+result = inpainting(
+ t5=t5, if_I=if_I,
+ if_II=if_II,
+ if_III=if_III,
+ support_pil_img=raw_pil_image,
+ inpainting_mask=inpainting_mask,
+ prompt=[
+ 'detailed picture, 4k dslr, best quality, a man in a black hat',
+ ],
+ seed=42,
+ if_I_kwargs={
+ "guidance_scale": 7.0,
+ "sample_timestep_respacing": "10,10,10,10,10,0,0,0,0,0",
+ 'support_noise_less_qsample_steps': 0,
+ },
+ if_II_kwargs={
+ "guidance_scale": 4.0,
+ 'aug_level': 0.0,
+ "sample_timestep_respacing": '100',
+ },
+ if_III_kwargs={
+ "guidance_scale": 9.0,
+ "noise_level": 20,
+ "sample_timestep_respacing": "75",
+ },
+)
+
+# save the generated images as png files
+for stage, images in result.items():
+ for i, img in enumerate(images):
+ img.save(f'generate-imgs/04.zero-short-inpainting_{stage}_{i}-4.png')
\ No newline at end of file
diff --git a/examples/05.glass-inpainting.py b/examples/05.glass-inpainting.py
new file mode 100644
index 0000000..482ca9c
--- /dev/null
+++ b/examples/05.glass-inpainting.py
@@ -0,0 +1,76 @@
+import sys
+import os
+sys.path.append(os.getcwd())
+from deepfloyd_if.modules import IFStageI, IFStageII, StableStageIII
+from deepfloyd_if.modules.t5 import T5Embedder
+from deepfloyd_if.pipelines import inpainting
+from PIL import Image
+import numpy as np
+import torch
+
+# If the default space is insufficient, set a custom cache location for the model
+#config_path = '/root/autodl-tmp/transformers-cache/'
+config_path = None
+
+device = 'cuda:0'
+
+# Initialize the model
+if_I = IFStageI('IF-I-XL-v1.0', device=device, cache_dir=config_path)
+if_II = IFStageII('IF-II-L-v1.0', device=device, cache_dir=config_path)
+if_III = StableStageIII('stable-diffusion-x4-upscaler', device=device, cache_dir=config_path)
+t5 = T5Embedder(device="cpu", cache_dir=config_path)
+
+# Load the original image
+raw_pil_image = Image.open('raw-image/raw-image-04.png').convert('RGB').resize((1024, 1024))
+
+pil_image = raw_pil_image.resize(
+ (64, 64), resample=Image.Resampling.BICUBIC, reducing_gap=None
+)
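+# Convert the 64x64 PIL image to a CHW float tensor in [-1, 1] with a leading batch dimension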
+img = np.array(pil_image)
+img = img.astype(np.float32) / 127.5 - 1
+img = np.transpose(img, [2, 0, 1])
+img = torch.from_numpy(img).unsqueeze(0)
+
+if_I.to_images(img)[0].save(f'generate-imgs/05.glass_inpainting_64*64.png')
+
+inpainting_mask = torch.zeros_like(img[0], device='cpu')
+inpainting_mask[:, 26:36, 24:34] = 1
+#inpainting_mask[:, 29:33, 34:36] = 1
+#inpainting_mask[:, 26:36, 36:44] = 1
+inpainting_mask = inpainting_mask.unsqueeze(0)
+
+if_I.to_images((1-inpainting_mask)*img)[0].save(f'generate-imgs/05.glass_inpainting_64*64-mask-1.png')
+print("inpainting_mask.shape:", inpainting_mask.shape)
+
+# Convert the inpainting_mask to a NumPy array
+#inpainting_mask_np = inpainting_mask.cpu().numpy()
+
+result = inpainting(
+ t5=t5, if_I=if_I,
+ if_II=if_II,
+ if_III=if_III,
+ support_pil_img=raw_pil_image,
+ inpainting_mask=inpainting_mask,
+ prompt=[
+ 'blue sunglasses',
+ 'yellow sunglasses',
+ 'red sunglasses',
+ 'green sunglasses',
+ ],
+ seed=42,
+ if_I_kwargs={
+ "guidance_scale": 7.0,
+ "sample_timestep_respacing": "10,10,10,10,10,0,0,0,0,0",
+ 'support_noise_less_qsample_steps': 0,
+ },
+ if_II_kwargs={
+ "guidance_scale": 4.0,
+ 'aug_level': 0.0,
+ "sample_timestep_respacing": '100',
+ },
+)
+
+# save the generated images as png files
+for stage, images in result.items():
+ for i, img in enumerate(images):
+ img.save(f'generate-imgs/05.glass_inpainting_{stage}_{i}.png')
\ No newline at end of file
diff --git a/examples/06.style-transfer-double-prompt.py b/examples/06.style-transfer-double-prompt.py
new file mode 100644
index 0000000..ce8394d
--- /dev/null
+++ b/examples/06.style-transfer-double-prompt.py
@@ -0,0 +1,64 @@
+import sys
+import os
+sys.path.append(os.getcwd())
+from deepfloyd_if.modules import IFStageI, IFStageII, StableStageIII
+from deepfloyd_if.modules.t5 import T5Embedder
+from PIL import Image
+from deepfloyd_if.pipelines import style_transfer
+
+# If the default space is insufficient, set a custom cache location for the model
+#config_path = '/root/autodl-tmp/transformers-cache/'
+config_path = None
+
+device = 'cuda:0'
+
+# Initialize the model
+if_I = IFStageI('IF-I-XL-v1.0', device=device, cache_dir=config_path)
+if_II = IFStageII('IF-II-L-v1.0', device=device, cache_dir=config_path)
+if_III = StableStageIII('stable-diffusion-x4-upscaler', device=device, cache_dir=config_path)
+t5 = T5Embedder(device="cpu", cache_dir=config_path)
+
+# Load the original image
+raw_pil_image = Image.open('raw-image/raw-image-05.png').convert('RGB')
+
+
+count = 4
+prompt = 'white cat'
+
+result = style_transfer(
+ t5=t5, if_I=if_I, if_II=if_II, if_III=if_III,
+ support_pil_img=raw_pil_image,
+ prompt=[prompt]*count,
+ style_prompt=[
+ f'in style lego',
+ f'in style zombie',
+ f'in style origami',
+ f'in style anime',
+ ],
+ seed=42,
+ if_I_kwargs={
+ "guidance_scale": 10.0,
+ "sample_timestep_respacing": "10,10,10,10,10,0,0,0,0,0",
+ 'support_noise_less_qsample_steps': 5,
+ 'positive_mixer': 0.8,
+ },
+ if_II_kwargs={
+ "guidance_scale": 4.0,
+ "sample_timestep_respacing": 'smart50',
+ "support_noise_less_qsample_steps": 5,
+ 'positive_mixer': 1.0,
+ },
+)
+#if_I.show(result['III'], 2, 14)
+
+#print(result)
+
+# save the generated images as png files
+for stage, images in result.items():
+ for i, img in enumerate(images):
+ img.save(f'generate-imgs/06.style-transfer-double-prompt_{stage}_{i}.png')
+
+
+
+
\ No newline at end of file
diff --git a/generate-imgs/01.dream-usage_II_0.png b/generate-imgs/01.dream-usage_II_0.png
new file mode 100644
index 0000000..7e3e24e
Binary files /dev/null and b/generate-imgs/01.dream-usage_II_0.png differ
diff --git a/generate-imgs/01.dream-usage_II_1.png b/generate-imgs/01.dream-usage_II_1.png
new file mode 100644
index 0000000..0c4b4d5
Binary files /dev/null and b/generate-imgs/01.dream-usage_II_1.png differ
diff --git a/generate-imgs/01.dream-usage_II_2.png b/generate-imgs/01.dream-usage_II_2.png
new file mode 100644
index 0000000..f245b5f
Binary files /dev/null and b/generate-imgs/01.dream-usage_II_2.png differ
diff --git a/generate-imgs/01.dream-usage_II_3.png b/generate-imgs/01.dream-usage_II_3.png
new file mode 100644
index 0000000..600d401
Binary files /dev/null and b/generate-imgs/01.dream-usage_II_3.png differ
diff --git a/pics/huggingface-accesstokens.png b/pics/huggingface-accesstokens.png
new file mode 100644
index 0000000..88d3127
Binary files /dev/null and b/pics/huggingface-accesstokens.png differ
diff --git a/pics/webui.png b/pics/webui.png
new file mode 100644
index 0000000..f9d6c35
Binary files /dev/null and b/pics/webui.png differ
diff --git a/raw-image/raw-image-00.png b/raw-image/raw-image-00.png
new file mode 100644
index 0000000..a03529a
Binary files /dev/null and b/raw-image/raw-image-00.png differ
diff --git a/raw-image/raw-image-01.png b/raw-image/raw-image-01.png
new file mode 100644
index 0000000..f2685a7
Binary files /dev/null and b/raw-image/raw-image-01.png differ
diff --git a/raw-image/raw-image-02.png b/raw-image/raw-image-02.png
new file mode 100644
index 0000000..dac93d0
Binary files /dev/null and b/raw-image/raw-image-02.png differ
diff --git a/raw-image/raw-image-03.png b/raw-image/raw-image-03.png
new file mode 100644
index 0000000..9fc688a
Binary files /dev/null and b/raw-image/raw-image-03.png differ
diff --git a/raw-image/raw-image-04.png b/raw-image/raw-image-04.png
new file mode 100644
index 0000000..82e8322
Binary files /dev/null and b/raw-image/raw-image-04.png differ
diff --git a/raw-image/raw-image-05.png b/raw-image/raw-image-05.png
new file mode 100644
index 0000000..971a59e
Binary files /dev/null and b/raw-image/raw-image-05.png differ
diff --git a/raw-image/raw-image-06.png b/raw-image/raw-image-06.png
new file mode 100644
index 0000000..8bdb093
Binary files /dev/null and b/raw-image/raw-image-06.png differ
diff --git a/requirements.txt b/requirements.txt
index 8fd0cbb..c023545 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,15 +1,17 @@
-tqdm
-numpy
-torch<2.0.0
-torchvision
-omegaconf
-matplotlib
-Pillow>=9.2.0
-huggingface_hub>=0.13.2
-transformers~=4.25.1
-accelerate~=0.15.0
-diffusers~=0.16.0
-tokenizers~=0.13.2
-sentencepiece~=0.1.97
-ftfy~=6.1.1
-beautifulsoup4~=4.11.1
+tqdm==4.61.2
+numpy==1.24.2
+torch==2.4.1
+torchvision==0.19.1
+omegaconf==2.3.0
+matplotlib==3.7.1
+Pillow==9.4.0
+huggingface_hub==0.26.5
+transformers==4.46.3
+accelerate==1.0.1
+diffusers==0.31.0
+tokenizers==0.20.3
+sentencepiece==0.1.99
+ftfy==6.1.3
+beautifulsoup4==4.11.2
+protobuf==3.19.6
+gradio==4.44.1
\ No newline at end of file
diff --git a/webui.py b/webui.py
new file mode 100644
index 0000000..d3b7ab8
--- /dev/null
+++ b/webui.py
@@ -0,0 +1,63 @@
+import gradio as gr
+from deepfloyd_if.modules import IFStageI, IFStageII, StableStageIII
+from deepfloyd_if.modules.t5 import T5Embedder
+from deepfloyd_if.pipelines import dream
+
+
+# If the default space is insufficient, set a custom cache location for the model
+#config_path = '/root/autodl-tmp/transformers-cache/'
+config_path = None
+device = 'cuda:0'
+
+# Initialize the model
+if_I = IFStageI('IF-I-XL-v1.0', device=device, cache_dir=config_path)
+if_II = IFStageII('IF-II-L-v1.0', device=device, cache_dir=config_path)
+if_III = StableStageIII('stable-diffusion-x4-upscaler', device=device, cache_dir=config_path)
+t5 = T5Embedder(device="cpu", cache_dir=config_path)
+
+# Define a function to generate images
+def generate_images(prompt, count, seed):
+    # Cast defensively: Gradio may deliver Slider/Number values as floats
+    count = int(count)
+    seed = int(seed)
+    result = dream(
+        t5=t5, if_I=if_I, if_II=if_II, if_III=if_III,
+        prompt=[prompt] * count,
+        seed=seed,
+ if_I_kwargs={
+ "guidance_scale": 7.0,
+ "sample_timestep_respacing": "smart100",
+ },
+ if_II_kwargs={
+ "guidance_scale": 4.0,
+ "sample_timestep_respacing": "smart50",
+ },
+ if_III_kwargs={
+ "guidance_scale": 9.0,
+ "noise_level": 20,
+ "sample_timestep_respacing": "75",
+ },
+ )
+
+ # Return the generated image object directly
+ images = []
+ for stage, stage_images in result.items():
+ #if stage != 'II':
+ # continue
+ for image in stage_images:
+ images.append(image)
+
+ return images
+
+# Create a Gradio interface
+iface = gr.Interface(
+ fn=generate_images,
+ inputs=[
+ gr.Textbox(label="Prompt", placeholder="Enter your prompt here"),
+        gr.Slider(label="Count", minimum=1, maximum=10, value=4, step=1),
+ gr.Number(label="Seed", value=42)
+ ],
+ outputs=gr.Gallery(label="Generated Images"),
+ title="DeepFloyd IF Image Generator",
+ description="Generate images based on your prompt using DeepFloyd IF model.",
+)
+
+# Specify the IP and port for startup
+iface.launch(server_name="127.0.0.1", server_port=6006)
\ No newline at end of file