EnderXie23/Paint

Text-Guided Image Inpainting

This repo lets you perform image editing via text prompts, and also supports user-defined masking areas and generation prompts.

Visual Demo

Tip: modify the paths in VLinference.py, masking_inpainting.py, and app.py to point to your local model directories.

Task List

  • Text-guided image inpainting
  • User-defined masking area
  • Complete WebUI support
  • Train the CLIP model for mask prediction
  • Design a metric for mask prediction quality (script finished)
  • Do an ablation study (script finished)
  • Finish the report (basic structure in place)

You may download the MagicBrush dataset using the code in dataset/download.py, and extract it using dataset/extract.py. The dataset will be formatted as:

MagicBrush/
├── train
│   ├── 00000
│   │   ├── instructions.txt (with inpaint instructions)
│   │   ├── mask_img.png (img with masked out areas)
│   │   ├── source_img.png (original image)
│   │   └── target_img.png (image with masked areas filled)
│   └── ...
├── test
│   └── ...

Environment Setup

First of all, set up a new conda environment:

conda create -n paint python=3.10 -y
conda activate paint

Then install the required packages:

pip install torch==2.6.0
pip install torchvision==0.21.0
pip install -r requirements.txt

We also recommend installing flash-attn to speed up inference. If you installed all prior dependencies following this guide, you can install a prebuilt wheel:

wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.1.post4/flash_attn-2.7.1.post4+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.7.1.post4+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

Otherwise, you can install via pip, which may be slow and prone to build failures:

pip install flash-attn

If you still have trouble installing flash-attn, simply disable it in VLinference.py.
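Since flash-attn is optional, one way to toggle it is to probe for the package at load time. A minimal sketch (the `pick_attn_implementation` helper is hypothetical; VLinference.py may toggle flash-attn differently):

```python
import importlib.util

def pick_attn_implementation():
    """Return "flash_attention_2" when flash-attn is importable,
    otherwise fall back to PyTorch's built-in SDPA attention.

    The returned string can be passed as `attn_implementation=` to
    `from_pretrained` in recent versions of transformers.
    """
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"

impl = pick_attn_implementation()
```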

Then install the CLIP module:

cd CLIP
pip install -e .
cd ..

Model and Weights

The current implementation uses the stable-diffusion-2-inpainting model, together with a ViT-B/16 backbone and pretrained weights for CLIPSeg. For the large models, we recommend downloading via modelscope:

pip install modelscope # If you do not have modelscope installed
modelscope download --model stabilityai/stable-diffusion-2-inpainting --local_dir <your local dir>
modelscope download --model Qwen/Qwen2.5-VL-7B-Instruct --local_dir <your local dir>

To download the CLIPSeg weights, use the following commands:

cd clipseg
wget https://owncloud.gwdg.de/index.php/s/ioHbRzFx6th32hn/download -O weights.zip
unzip -d weights -j weights.zip
cd ..

Then, in masking_inpainting.py, set the diffusion model load path on line 126, where the pipeline is defined:

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    '/nvme0n1/xmy/stable-diffusion-2-inpainting', # Change this to your own local dir
    revision='fp16', 
    torch_dtype=torch.float16,
    use_safetensors=True
).to(device)

Similarly, in VLinference.py, set the model path on line 6:

model_path="/nvme0n1/xmy/Qwen2.5-VL-7B-Instruct" # Change this to your own local dir

The small ViT-B/16 model will be downloaded automatically from Hugging Face when you run the code. If you cannot connect to Hugging Face, try a mirror:

export HF_ENDPOINT=https://hf-mirror.com
python masking_inpainting.py
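Alternatively, the mirror can be set from within Python, as long as it happens before any library that reads HF_ENDPOINT is imported. A small sketch:

```python
import os

# Point huggingface_hub at a mirror *before* importing libraries that
# read HF_ENDPOINT at import time (transformers, huggingface_hub, ...).
# setdefault keeps any value already exported in the shell.
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")
```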

Running

You can modify inference_painting.py to set your own image path, mask path, and text prompt; see the main function for details. Happy experimenting!
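When supplying your own mask, note that stable-diffusion inpainting pipelines conventionally treat white (nonzero) pixels as the region to repaint. The sketch below shows that convention by graying out the masked region of an image, similar to the dataset's mask_img.png previews (the `apply_mask` helper is illustrative, not part of the repo):

```python
import numpy as np

def apply_mask(image, mask, fill=127):
    """Gray out the masked region of an image for preview.

    `image` is an (H, W, 3) uint8 array; `mask` is an (H, W) array where
    nonzero marks the area to be inpainted (white = repaint).
    """
    out = image.copy()          # leave the input image untouched
    out[mask > 0] = fill        # flatten the masked region to mid-gray
    return out

img = np.zeros((4, 4, 3), dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[:2, :2] = 255              # mark the top-left quadrant for repainting
preview = apply_mask(img, mask)
```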

Ablation Study

To reproduce the ablation experiments, use ablation_study.py. The script generates results for the full pipeline and the three ablated variants described in the paper:

python ablation_study.py --image <img> --mask_prompt "object" --inpaint_prompt "replacement"

Results are written to ablation_results/ by default and PSNR scores relative to the baseline are printed.
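For reference, PSNR between two uint8 images follows the standard definition below; ablation_study.py may compute it with its own helper or a library, so this is just a sketch of the metric itself:

```python
import numpy as np

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio: 10 * log10(peak^2 / MSE)."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    mse = np.mean((a - b) ** 2)
    if mse == 0:
        return float("inf")     # identical images
    return 10.0 * np.log10(peak * peak / mse)

x = np.zeros((8, 8), dtype=np.uint8)
y = np.full((8, 8), 16, dtype=np.uint8)
score = psnr(x, y)              # MSE = 256, so roughly 24 dB
```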

More

Check out the instruct-pix2pix folder for newly trained inpainting models!

About

Multi-Modal ML course project: An image inpainting pipeline that supports text prompts for image editing!
