This repo lets you perform image editing via text prompts, and also supports user-defined masking areas and generation prompts.
Tip: modify the paths in VLinference.py, masking_inpainting.py, and app.py to point to your local model directories.
- Text-guided image inpainting
- User-defined masking area
- Complete WebUI support
- Train the CLIP model for mask prediction
- Design a metric for mask prediction quality (script finished)
- Do an ablation study (script finished)
- Finish the report (basic structure given)
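As one possible illustration of a mask-prediction quality metric (the finished script may use a different formulation), intersection-over-union between the predicted and ground-truth masks is a common choice; `mask_iou` below is a hypothetical helper, not code from this repo:

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return float(np.logical_and(pred, gt).sum() / union)
```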
You may download the MagicBrush dataset using the code in dataset/download.py, and extract it using dataset/extract.py. The dataset will be formatted as:
```
MagicBrush/
├── train
│   ├── 00000
│   │   ├── instructions.txt (with inpaint instructions)
│   │   ├── mask_img.png (img with masked-out areas)
│   │   ├── source_img.png (original image)
│   │   └── target_img.png (image with masked areas filled)
│   └── ...
├── test
│   └── ...
```
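A sample in this layout can be read with a small helper; this is a sketch assuming the exact file names shown above (`load_sample` itself is not part of the repo):

```python
from pathlib import Path
from PIL import Image

def load_sample(root: str, split: str, idx: str) -> dict:
    """Load one MagicBrush sample following the directory layout above."""
    d = Path(root) / split / idx
    return {
        "instructions": (d / "instructions.txt").read_text().strip(),
        "source": Image.open(d / "source_img.png"),
        "mask": Image.open(d / "mask_img.png"),
        "target": Image.open(d / "target_img.png"),
    }

# Example: sample = load_sample("MagicBrush", "train", "00000")
```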
First of all, set up a new conda environment:
```shell
conda create -n paint python=3.10 -y
conda activate paint
```
Then install the required packages:
```shell
pip install torch==2.6.0
pip install torchvision==0.21.0
pip install -r requirements.txt
```
I advise also installing flash-attn to speed up inference. If you installed all prior dependencies following this guide, you can install from the prebuilt wheel:
```shell
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.1.post4/flash_attn-2.7.1.post4+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.7.1.post4+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
```
Otherwise, you can install via pip, which may be slow and cause trouble:
```shell
pip install flash-attn
```
If you still have trouble installing flash-attn, simply disable it in VLinference.py.
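A common way to make flash-attn optional is a guarded import; this is a sketch of the pattern, not necessarily how VLinference.py handles it (`ATTN_IMPL` is a hypothetical name):

```python
# Fall back gracefully when flash-attn is not installed.
try:
    import flash_attn  # noqa: F401
    ATTN_IMPL = "flash_attention_2"
except ImportError:
    ATTN_IMPL = "sdpa"  # PyTorch's built-in scaled dot-product attention

# Pass ATTN_IMPL as attn_implementation=... when loading the model.
```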
Then install the CLIP module:
```shell
cd CLIP
pip install -e .
cd ..
```
In the current implementation, I used the stable-diffusion-2-inpainting model, together with ViT-B/16 and the weights for CLIPSeg. For the large models, I recommend downloading via modelscope:
```shell
pip install modelscope # If you do not have modelscope installed
modelscope download --model stabilityai/stable-diffusion-2-inpainting --local_dir <your local dir>
modelscope download --model Qwen/Qwen2.5-VL-7B-Instruct --local_dir <your local dir>
```
To download the weights for CLIPSeg, you can use the following commands:
```shell
cd clipseg
wget https://owncloud.gwdg.de/index.php/s/ioHbRzFx6th32hn/download -O weights.zip
unzip -d weights -j weights.zip
cd ..
```
Then, in masking_inpainting.py, set the diffusion model load path on line 126, where I defined:
```python
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    '/nvme0n1/xmy/stable-diffusion-2-inpainting', # Change this to your own local dir
    revision='fp16',
    torch_dtype=torch.float16,
    use_safetensors=True
).to(device)
```
Also, in VLinference.py, set the model path on line 6:
```python
model_path="/nvme0n1/xmy/Qwen2.5-VL-7B-Instruct" # Change this to your own local dir
```
The small ViT-B/16 model will be downloaded automatically from HuggingFace when you run the code. If you get an error connecting to HuggingFace, try this:
```shell
export HF_ENDPOINT=https://hf-mirror.com
python masking_inpainting.py
```
You can modify inference_painting.py to set your own image path, mask path, and text prompt. See the main function for details. Happy trying!
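Whatever paths you pass, the image and mask must be preprocessed to match the pipeline's expected input. This sketch shows typical preprocessing for the inpainting pipeline above (`prepare_inputs` is a hypothetical helper, not a function from the repo; white regions of the mask mark the area to repaint):

```python
from PIL import Image

def prepare_inputs(image_path: str, mask_path: str, size=(512, 512)):
    """Resize and convert an image/mask pair for an inpainting pipeline.

    White (255) regions of the mask are the areas to be repainted.
    """
    image = Image.open(image_path).convert("RGB").resize(size)
    mask = Image.open(mask_path).convert("L").resize(size)
    return image, mask
```

The resulting pair is then passed to the pipeline shown above, e.g. `pipe(prompt=..., image=image, mask_image=mask).images[0]`.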
To reproduce the ablation experiments, use ablation_study.py. The script
generates results for the full pipeline and the three ablated variants
described in the paper:
```shell
python ablation_study.py --image <img> --mask_prompt "object" --inpaint_prompt "replacement"
```
Results are written to ablation_results/ by default, and PSNR scores relative to the baseline are printed.
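PSNR here refers to the standard peak-signal-to-noise ratio. For reference, it can be computed like this (a sketch; the actual script may differ in details such as the peak value):

```python
import numpy as np

def psnr(a: np.ndarray, b: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio between two same-shaped images, in dB."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```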
Check out the instruct-pix2pix folder for newly trained inpainting models!
