This repo lets you perform image editing via text prompts, and also supports user-defined masking areas and generation prompts.
Tip: modify the paths in VLinference.py, masking_inpainting.py, and app.py to point to your local model directories.
- Text-guided image inpainting
- User-defined masking area
- Complete WebUI support
- Train the CLIP model for mask prediction
- Design a metric for mask prediction quality (script finished)
- Do an ablation study (script finished)
- Finish the report (basic structure given)
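As one possible illustration of a mask-prediction quality metric (the finished script may use a different formulation), intersection-over-union between the predicted and ground-truth masks is a common choice; `mask_iou` below is a hypothetical helper, not code from this repo:

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return float(np.logical_and(pred, gt).sum() / union)
```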
You may download the MagicBrush dataset using the code in dataset/download.py, and extract it using dataset/extract.py. The dataset will be formatted as:
```
MagicBrush/
├── train
│   ├── 00000
│   │   ├── instructions.txt (with inpaint instructions)
│   │   ├── mask_img.png (img with masked-out areas)
│   │   ├── source_img.png (original image)
│   │   └── target_img.png (image with masked areas filled)
│   └── ...
├── test
│   └── ...
```
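A sample in this layout can be read with a small helper; this is a sketch assuming the exact file names shown above (`load_sample` itself is not part of the repo):

```python
from pathlib import Path
from PIL import Image

def load_sample(root: str, split: str, idx: str) -> dict:
    """Load one MagicBrush sample following the directory layout above."""
    d = Path(root) / split / idx
    return {
        "instructions": (d / "instructions.txt").read_text().strip(),
        "source": Image.open(d / "source_img.png"),
        "mask": Image.open(d / "mask_img.png"),
        "target": Image.open(d / "target_img.png"),
    }

# Example: sample = load_sample("MagicBrush", "train", "00000")
```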
First of all, set up a new conda environment:
```shell
conda create -n paint python=3.10 -y
conda activate paint
```
Then install the required packages:
```shell
pip install torch==2.6.0
pip install torchvision==0.21.0
pip install -r requirements.txt
```
I advise also installing flash-attn to speed up inference. If you installed all prior dependencies following this guide, you can install from the prebuilt wheel:
```shell
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.1.post4/flash_attn-2.7.1.post4+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.7.1.post4+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
```
Otherwise, you can install via pip, which may be slow and cause trouble:
```shell
pip install flash-attn
```
If you still have trouble installing flash-attn, simply disable it in VLinference.py.
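A common way to make flash-attn optional is a guarded import; this is a sketch of the pattern, not necessarily how VLinference.py handles it (`ATTN_IMPL` is a hypothetical name):

```python
# Fall back gracefully when flash-attn is not installed.
try:
    import flash_attn  # noqa: F401
    ATTN_IMPL = "flash_attention_2"
except ImportError:
    ATTN_IMPL = "sdpa"  # PyTorch's built-in scaled dot-product attention

# Pass ATTN_IMPL as attn_implementation=... when loading the model.
```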
Then install the CLIP module:
```shell
cd CLIP
pip install -e .
cd ..
```
In the current implementation, I used the stable-diffusion-2-inpainting model, together with ViT-B/16 and the weights for CLIPSeg. For the large models, I recommend downloading via modelscope:
```shell
pip install modelscope # If you do not have modelscope installed
modelscope download --model stabilityai/stable-diffusion-2-inpainting --local_dir <your local dir>
modelscope download --model Qwen/Qwen2.5-VL-7B-Instruct --local_dir <your local dir>
```
To download the weights for CLIPSeg, you can use the following commands:
```shell
cd clipseg
wget https://owncloud.gwdg.de/index.php/s/ioHbRzFx6th32hn/download -O weights.zip
unzip -d weights -j weights.zip
cd ..
```
Then, in masking_inpainting.py, set the diffusion model load path on line 126, where I defined:
```python
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    '/nvme0n1/xmy/stable-diffusion-2-inpainting', # Change this to your own local dir
    revision='fp16',
    torch_dtype=torch.float16,
    use_safetensors=True
).to(device)
```
Also, in VLinference.py, set the model path on line 6:
```python
model_path="/nvme0n1/xmy/Qwen2.5-VL-7B-Instruct" # Change this to your own local dir
```
The small ViT-B/16 model will be downloaded automatically from HuggingFace when you run the code. If you get an error connecting to HuggingFace, try this:
```shell
export HF_ENDPOINT=https://hf-mirror.com
python masking_inpainting.py
```
You can modify inference_painting.py to set your own image path, mask path, and text prompt. See the main function for details. Happy trying!
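Whatever paths you pass, the image and mask must be preprocessed to match the pipeline's expected input. This sketch shows typical preprocessing for the inpainting pipeline above (`prepare_inputs` is a hypothetical helper, not a function from the repo; white regions of the mask mark the area to repaint):

```python
from PIL import Image

def prepare_inputs(image_path: str, mask_path: str, size=(512, 512)):
    """Resize and convert an image/mask pair for an inpainting pipeline.

    White (255) regions of the mask are the areas to be repainted.
    """
    image = Image.open(image_path).convert("RGB").resize(size)
    mask = Image.open(mask_path).convert("L").resize(size)
    return image, mask
```

The resulting pair is then passed to the pipeline shown above, e.g. `pipe(prompt=..., image=image, mask_image=mask).images[0]`.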
To reproduce the ablation experiments, use ablation_study.py. The script
generates results for the full pipeline and the three ablated variants
described in the paper:
```shell
python ablation_study.py --image <img> --mask_prompt "object" --inpaint_prompt "replacement"
```
Results are written to ablation_results/ by default, and PSNR scores relative to the baseline are printed.
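PSNR here refers to the standard peak-signal-to-noise ratio. For reference, it can be computed like this (a sketch; the actual script may differ in details such as the peak value):

```python
import numpy as np

def psnr(a: np.ndarray, b: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio between two same-shaped images, in dB."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```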
Check out the instruct-pix2pix folder for newly trained inpainting models!
