Dataset and source code for Detecting Human Artifacts from Text-to-Image Models.
We set up the environment following EVA-02-det.
conda create --name hadm python=3.8 -y
conda activate hadm
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
pip install cryptography
pip install -r requirements.txt
pip install -v -U git+https://github.com/facebookresearch/xformers.git@v0.0.18#egg=xformers
pip install mmcv==1.7.1 openmim
mim install mmcv-full
python -m pip install -e .

We provide the dataset used in the paper. The dataset is available at the following link: HADM Dataset.
The structure of the dataset should look like:
|-- annotations
| |-- train_ALL
| |-- val_ALL
| |-- val_dalle2
| |-- val_dalle3
| |-- val_mj
| `-- val_sdxl
|-- images
| |-- train_ALL
| |-- val_ALL
| |-- val_dalle2
| |-- val_dalle3
| |-- val_mj
| `-- val_sdxl
`-- info.pkl

Note that we provide a validation set for each domain for convenience of evaluation; val_ALL is the combination of all per-domain validation sets. The info.pkl file contains metadata for the dataset, including each image's filename and the prompt used to generate it.
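For reference, a minimal sketch for inspecting info.pkl (assuming the dataset sits at datasets/human_artifact_dataset/ and that the file unpickles into a mapping from image filename to prompt; adjust if the actual structure differs):

import pickle

# Load the dataset metadata (assumed here to be a filename -> prompt mapping).
with open("datasets/human_artifact_dataset/info.pkl", "rb") as f:
    info = pickle.load(f)

# Print a few entries to see image filenames and their generation prompts.
for i, (filename, prompt) in enumerate(info.items()):
    print(filename, "->", prompt)
    if i >= 4:
        break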
Finally, set the environment variable for the dataset path:
export DETECTRON2_DATASETS=datasets

After downloading our Human Artifact Dataset, place it under the datasets directory. Then download the training images from the following real datasets: LV-MHP-v1, OCHuman, CrowdHuman, HCD, and Facial Descriptors. We also filtered COCO with ViTPose to find images with human presence; the filtered COCO images are available here.
After downloading these datasets, please place them under the datasets/human_artifact_dataset/images directory. The structure of the dataset should look like:
datasets/human_artifact_dataset/images/
|-- coco_train2017_human
|-- CrowdHuman
|-- facial_descriptors_dataset_images
|-- HCDDataset_images
|-- LV-MHP-v1-images
|-- OCHuman
|-- train_ALL
|-- val_ALL
|-- val_dalle2
|-- val_dalle3
|-- val_mj
`-- val_sdxl

Also, generate the corresponding empty annotation files under datasets/human_artifact_dataset/annotations for the training images from the real datasets by running the following commands:
python datasets/generate_empty_anno.py --data_root datasets/human_artifact_dataset/images/coco_train2017_human
python datasets/generate_empty_anno.py --data_root datasets/human_artifact_dataset/images/CrowdHuman
python datasets/generate_empty_anno.py --data_root datasets/human_artifact_dataset/images/facial_descriptors_dataset_images
python datasets/generate_empty_anno.py --data_root datasets/human_artifact_dataset/images/HCDDataset_images
python datasets/generate_empty_anno.py --data_root datasets/human_artifact_dataset/images/LV-MHP-v1-images
python datasets/generate_empty_anno.py --data_root datasets/human_artifact_dataset/images/OCHuman

Make sure to download the pretrained weights for EVA-02-L from EVA-02-det and place them under the pretrained_models directory. The pretrained weights can be downloaded from here.
We provide pretrained weights for the Local Human Artifact Detection Model (HADM-L) and the Global Human Artifact Detection Model (HADM-G) to reproduce the results presented in the paper. The weights can be downloaded from the following links:
Also make sure to place the pretrained weights under the pretrained_models directory.
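With the EVA-02-L checkpoint and the HADM checkpoints downloaded, pretrained_models is expected to look roughly like the sketch below (the HADM filenames match the commands later in this README; the EVA-02-L filename depends on which checkpoint you download and is shown as a placeholder):

pretrained_models/
|-- HADM-L_0249999.pth
|-- HADM-G_0249999.pth
`-- <EVA-02-L checkpoint from EVA-02-det>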
Please note that our models take JPEG images as input, so convert images in other formats to JPEG before running inference.
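As an illustration, a minimal conversion sketch using Pillow (an assumption; any equivalent tool works), which rewrites non-JPEG images in a directory as JPEG files:

import os
from PIL import Image

input_dir = "demo/images"  # example path; point this at your own images

for name in os.listdir(input_dir):
    if name.lower().endswith((".jpg", ".jpeg")):
        continue  # already JPEG
    path = os.path.join(input_dir, name)
    try:
        img = Image.open(path).convert("RGB")  # drop alpha channel for JPEG
    except OSError:
        continue  # skip files Pillow cannot read
    img.save(os.path.splitext(path)[0] + ".jpg", "JPEG", quality=95)
    os.remove(path)  # optional: remove the original non-JPEG file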
Run inference with HADM-L on arbitrary input images under demo/images.
python tools/lazyconfig_train_net.py --num-gpus 1 --inference \
--config-file projects/ViTDet/configs/eva2_o365_to_coco/demo_local.py \
train.output_dir=./outputs/demo_local \
train.init_checkpoint=pretrained_models/HADM-L_0249999.pth \
dataloader.train.total_batch_size=1 \
train.model_ema.enabled=True \
train.model_ema.use_ema_weights_for_eval_only=True \
inference.input_dir=demo/images \
inference.output_dir=demo/outputs/result_local

Results will be saved under demo/outputs/result_local.
Run inference with HADM-G on arbitrary input images under demo/images.
python tools/lazyconfig_train_net.py --num-gpus 1 --inference \
--config-file projects/ViTDet/configs/eva2_o365_to_coco/demo_global.py \
train.output_dir=./outputs/demo_global \
train.init_checkpoint=pretrained_models/HADM-G_0249999.pth \
dataloader.train.total_batch_size=1 \
train.model_ema.enabled=True \
train.model_ema.use_ema_weights_for_eval_only=True \
inference.input_dir=demo/images \
inference.output_dir=demo/outputs/result_global

Results will be saved under demo/outputs/result_global.
Evaluate HADM-L on all domains (SDXL, DALLE-2, DALLE-3, Midjourney).
python tools/lazyconfig_train_net.py --num-gpus 1 --eval-only \
--config-file projects/ViTDet/configs/eva2_o365_to_coco/eva02_large_local.py \
train.output_dir=./outputs/eva02_large_local/250k_on_all_val \
train.init_checkpoint=pretrained_models/HADM-L_0249999.pth \
dataloader.evaluator.output_dir=cache/large_local_human_artifact_ALL_val/250k_on_all_val \
dataloader.evaluator.dataset_name=local_human_artifact_val_ALL \
dataloader.test.dataset.names=local_human_artifact_val_ALL \
dataloader.train.total_batch_size=1 \
train.model_ema.enabled=True \
train.model_ema.use_ema_weights_for_eval_only=True

Expected results:
Task: bbox
AP,AP50,AP75,APs,APm,APl
24.907,43.307,25.990,18.322,25.382,32.773

Evaluate HADM-L on a specific domain (SDXL in this example).
python tools/lazyconfig_train_net.py --num-gpus 1 --eval-only \
--config-file projects/ViTDet/configs/eva2_o365_to_coco/eva02_large_local.py \
train.output_dir=./outputs/eva02_large_local/250k_on_sdxl_val \
train.init_checkpoint=pretrained_models/HADM-L_0249999.pth \
dataloader.evaluator.output_dir=cache/large_local_human_artifact_sdxl_val/250k_on_sdxl_val \
dataloader.evaluator.dataset_name=local_human_artifact_val_sdxl \
dataloader.test.dataset.names=local_human_artifact_val_sdxl \
dataloader.train.total_batch_size=1 \
train.model_ema.enabled=True \
train.model_ema.use_ema_weights_for_eval_only=True

Expected results:
Task: bbox
AP,AP50,AP75,APs,APm,APl
21.141,39.529,21.372,17.813,22.557,26.149

To evaluate on other domains, replace dataloader.evaluator.dataset_name and dataloader.test.dataset.names with local_human_artifact_val_<DOMAIN> (e.g., val_sdxl, val_mj, val_dalle2, val_dalle3), as in the example below.
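For instance, a sketch of the command for the Midjourney domain (the output directories here are illustrative and can be named as you like):

python tools/lazyconfig_train_net.py --num-gpus 1 --eval-only \
--config-file projects/ViTDet/configs/eva2_o365_to_coco/eva02_large_local.py \
train.output_dir=./outputs/eva02_large_local/250k_on_mj_val \
train.init_checkpoint=pretrained_models/HADM-L_0249999.pth \
dataloader.evaluator.output_dir=cache/large_local_human_artifact_mj_val/250k_on_mj_val \
dataloader.evaluator.dataset_name=local_human_artifact_val_mj \
dataloader.test.dataset.names=local_human_artifact_val_mj \
dataloader.train.total_batch_size=1 \
train.model_ema.enabled=True \
train.model_ema.use_ema_weights_for_eval_only=True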
Evaluate HADM-G on all domains (SDXL, DALLE-2, DALLE-3, Midjourney).
python tools/lazyconfig_train_net.py --num-gpus 1 --eval-only \
--config-file projects/ViTDet/configs/eva2_o365_to_coco/eva02_large_global.py \
train.output_dir=./outputs/eva02_large_global/250k_on_all_val \
train.init_checkpoint=pretrained_models/HADM-G_0249999.pth \
dataloader.evaluator.output_dir=cache/large_global_human_artifact_ALL_val/250k_on_all_val \
dataloader.evaluator.dataset_name=global_human_artifact_val_ALL \
dataloader.test.dataset.names=global_human_artifact_val_ALL \
dataloader.train.total_batch_size=1 \
train.model_ema.enabled=True \
train.model_ema.use_ema_weights_for_eval_only=True

Expected results:
Task: bbox
AP,AP50,AP75,APs,APm,APl
22.083,25.539,23.993,nan,0.000,22.332

Evaluate HADM-G on a specific domain (SDXL in this example).
python tools/lazyconfig_train_net.py --num-gpus 1 --eval-only \
--config-file projects/ViTDet/configs/eva2_o365_to_coco/eva02_large_global.py \
train.output_dir=./outputs/eva02_large_global/250k_on_sdxl_val \
train.init_checkpoint=pretrained_models/HADM-G_0249999.pth \
dataloader.evaluator.output_dir=cache/large_global_human_artifact_sdxl_val/250k_on_sdxl_val \
dataloader.evaluator.dataset_name=global_human_artifact_val_sdxl \
dataloader.test.dataset.names=global_human_artifact_val_sdxl \
dataloader.train.total_batch_size=1 \
train.model_ema.enabled=True \
train.model_ema.use_ema_weights_for_eval_only=True

Expected results:
Task: bbox
AP,AP50,AP75,APs,APm,APl
23.674,27.393,25.681,nan,0.000,23.891

Similarly, to evaluate on other domains, replace dataloader.evaluator.dataset_name and dataloader.test.dataset.names with global_human_artifact_val_<DOMAIN> (e.g., val_sdxl, val_mj, val_dalle2, val_dalle3).
To train the Local Human Artifact Detection Model (HADM-L):
python tools/lazyconfig_train_net.py \
--config-file projects/ViTDet/configs/eva2_o365_to_coco/eva02_large_local.py \
--num-gpus=1 train.eval_period=10000 train.log_period=500 \
train.output_dir=./outputs/eva02_large_local \
dataloader.evaluator.output_dir=cache/large_local_human_artifact_ALL_val \
dataloader.train.total_batch_size=4

To train the Global Human Artifact Detection Model (HADM-G):
python tools/lazyconfig_train_net.py \
--config-file projects/ViTDet/configs/eva2_o365_to_coco/eva02_large_global.py \
--num-gpus=1 train.eval_period=10000 train.log_period=500 \
train.output_dir=./outputs/eva02_large_global \
dataloader.evaluator.output_dir=cache/large_global_human_artifact_ALL_val \
dataloader.train.total_batch_size=4

If you find this work useful, please consider citing:
@article{Wang2024HADM,
title={Detecting Human Artifacts from Text-to-Image Models},
author={Wang, Kaihong and Zhang, Lingzhi and Zhang, Jianming},
journal={arXiv preprint arXiv:2411.13842},
year={2024}
}

Our codebase builds heavily on EVA-02-det and Detectron2.
