Official Implementation of Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning (ICCV 2025)
- A comprehensive benchmark for evaluating unlearning methods in text-to-image diffusion models across multiple tasks and metrics
- 💡 Feel free to explore the code, open issues, or reach out for discussions and collaborations!
To set up the environment, follow these steps:
- Clone the repository:

```shell
git clone https://github.com/ml-postech/HUB.git
cd HUB
```

- Create and activate the conda environment:

```shell
conda create -n HUBG python=3.10
conda activate HUBG
conda install -y pytorch torchvision pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt --no-deps
```
- To evaluate target proportion, reference images for each concept are required. We provide these reference images as part of a Hugging Face dataset.
- Once downloaded, place the dataset under the `images/` directory.
- Alternative download link: images
- For aesthetic score, we use the `sac+logos+ava1-l14-linearMSE.pth` model.
- Place it in the `/models/aesthetic_predictor` directory.
- Alternative download link: aesthetic_predictor
- Download `giphy_celeb_detector.zip` from this link and extract it to the `/models/` directory.
- Alternative download link: celeb-detection-oss
To perform evaluation using HUB, you must first generate images for each concept and task with your unlearned model. Use the prompts described below to generate images.
```shell
python source/image_generation.py \
    --method YOUR_METHOD \
    --target TARGET \
    --task TASK
```
YOUR_METHOD can be one of the following preconfigured methods: `sd`, `esd`, `uce`, `salun`, `ac`, `sa`, `receler`, `sld`, `mace`.
TARGET can be one of the following:
- Celebrities: Angelina Jolie, Ariana Grande, Brad Pitt, David Beckham, Elon Musk, Emma Watson, Lady Gaga, Leonardo DiCaprio, Taylor Swift, Tom Cruise
- Styles: Andy Warhol, Auguste Renoir, Claude Monet, Édouard Manet, Frida Kahlo, Paul Cézanne, Picasso, Piet Mondrian, Van Gogh, Roy Lichtenstein
- IP characters: Buzz Lightyear, Homer Simpson, Luigi, Mario, Mickey Mouse, Pikachu, Snoopy, Sonic, SpongeBob, Stitch
- NSFW concepts: Nudity, Violent, Disturbing
TASK must be one of the following: `target_image`, `general_image`, `selective_alignment`, `pinpoint_ness`, `multilingual_robustness`, `attack_robustness`, `incontext_ref_image`.
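Before launching a long generation run, it can help to sanity-check the method/task combination up front. Below is a minimal, hypothetical pre-flight check (the `validate_args` helper is not part of the repo; the lists simply mirror the options above):

```python
# Hypothetical pre-flight check mirroring the method and task options above.
METHODS = {"sd", "esd", "uce", "salun", "ac", "sa", "receler", "sld", "mace"}
TASKS = {
    "target_image", "general_image", "selective_alignment", "pinpoint_ness",
    "multilingual_robustness", "attack_robustness", "incontext_ref_image",
}

def validate_args(method: str, task: str) -> None:
    """Raise early instead of failing partway through image generation."""
    if method not in METHODS:
        raise ValueError(f"unknown method {method!r}; expected one of {sorted(METHODS)}")
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(TASKS)}")

validate_args("uce", "pinpoint_ness")  # valid combination: no exception
```

A check like this is cheap insurance when each generation run takes hours.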
Examples:
- Running `sd`:

```shell
python source/image_generation.py --method sd --target "Nudity" --task pinpoint_ness --device cuda
```
- Running unlearning based on UCE: first download the weights from NSFW.pt and place them in the `models/uce` folder, then run:

```shell
python source/image_generation.py --method uce --target "Nudity" --task pinpoint_ness --device cuda
```
All prompts used in our experiments are provided in the prompts/ directory.
You can also generate prompts for your own target using the following scripts.
```shell
python source/prompt_generation/prompt.py \
    --target YOUR_TARGET \
    [--style] [--nsfw]
```
- Use `--style` for style-related targets
- Use `--nsfw` for NSFW-related targets
After generating the base prompts, create multilingual versions:
```shell
python source/prompt_generation/translate_prompt.py \
    --target YOUR_TARGET
```
Generate prompts for the pinpoint-ness task:

```shell
python source/prompt_generation/pinpoint_ness.py \
    --target YOUR_TARGET
```
Generate prompts for the selective-alignment task:

```shell
python source/prompt_generation/selective_alignment.py \
    --target YOUR_TARGET \
    [--style]  # add only if this is a style-related target
```
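The prompt-generation steps above can be scripted in sequence. The sketch below builds each command line (the `prompt_pipeline` helper is hypothetical, not part of the repo); passing each list to `subprocess.run` would actually execute it:

```python
# Build the command line for each prompt-generation step described above.
# To execute them, pass each list to subprocess.run(cmd, check=True).
def prompt_pipeline(target: str, style: bool = False, nsfw: bool = False):
    flags = (["--style"] if style else []) + (["--nsfw"] if nsfw else [])
    cmds = [
        ["python", "source/prompt_generation/prompt.py", "--target", target] + flags,
        ["python", "source/prompt_generation/translate_prompt.py", "--target", target],
        ["python", "source/prompt_generation/pinpoint_ness.py", "--target", target],
        # selective_alignment.py takes --style only for style-related targets
        ["python", "source/prompt_generation/selective_alignment.py", "--target", target]
        + (["--style"] if style else []),
    ]
    return cmds

for cmd in prompt_pipeline("Van Gogh", style=True):
    print(" ".join(cmd))
```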
For now, we support the following seven unlearning methods: SLD, AC, ESD, UCE, SA, Receler, and MACE. To evaluate your own model, modify `model/__init__.py` to include the loading of your custom model. We recommend placing your model in `models/sd/YOUR_METHOD/`.
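One simple way to hook a custom model into the loading code is a name-to-loader registry. This is a hypothetical sketch, not the repo's actual `model/__init__.py`; the loader below only resolves a weights path under `models/sd/`, where a real loader would also build the diffusion pipeline:

```python
# Hypothetical loader registry; the repo's actual model/__init__.py differs.
from pathlib import Path

MODEL_REGISTRY = {}

def register(name):
    """Decorator that maps a method name to its loader function."""
    def deco(fn):
        MODEL_REGISTRY[name] = fn
        return fn
    return deco

@register("my_method")
def load_my_method(target: str):
    # A real loader would construct the pipeline and load the unlearned
    # weights for `target`; here we just resolve the expected path.
    return Path("models/sd/my_method") / f"{target}.pt"

def get_model(method: str, target: str):
    if method not in MODEL_REGISTRY:
        raise KeyError(f"no loader registered for {method!r}")
    return MODEL_REGISTRY[method](target)
```

A registry like this keeps the generation and evaluation scripts unchanged: they only need the method name passed via `--method`.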
To run all tasks at once, execute the following command:

```shell
python main.py --method YOUR_METHOD --target TARGET
```

For a smoke test, set `NUM_TARGET_IMGS` and `NUM_GENERAL_IMGS` in `envs.py` to a low number, e.g. 30:

```python
NUM_TARGET_IMGS = 30
NUM_GENERAL_IMGS = 30
```

and try the following:

```shell
python main.py --method sd --target "Nudity" --device cuda
```
Running evaluation using main.py takes a long time, as it evaluates all tasks at once. To evaluate each task separately, use the commands below, replacing the variables according to the settings you want to evaluate. Make sure to execute the following before evaluating each task:

```shell
export PYTHONPATH=$PYTHONPATH:YOUR_PROJECT_DIR
```

The evaluation code is configured to run separately for each concept type, because different classifiers are used. For the target_proportion, multilingual_robustness, and attack_robustness tasks, run the following commands.
- Celebrity

```shell
python source/eval/eval_gcd.py \
    --task TASK \
    --method YOUR_METHOD \
    --target TARGET
```

- Style, IP (VLM)

```shell
python source/eval/eval_vlm.py \
    --task TASK \
    --method YOUR_METHOD \
    --target TARGET
```

- NSFW

```shell
python source/eval/eval_nsfw.py \
    --task TASK \
    --method YOUR_METHOD \
    --target TARGET
```
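The split above (one evaluation script per concept type) can be captured in a small dispatch table. A hypothetical sketch; the mapping simply follows the three commands above:

```python
# Map each concept type to its evaluation script, as in the commands above.
EVAL_SCRIPT = {
    "celebrity": "source/eval/eval_gcd.py",   # GIPHY celebrity detector
    "style":     "source/eval/eval_vlm.py",   # VLM-based classifier
    "ip":        "source/eval/eval_vlm.py",
    "nsfw":      "source/eval/eval_nsfw.py",
}

def eval_command(concept_type: str, task: str, method: str, target: str):
    """Build the command line for one (concept type, task) evaluation."""
    return ["python", EVAL_SCRIPT[concept_type],
            "--task", task, "--method", method, "--target", target]

print(" ".join(eval_command("nsfw", "attack_robustness", "uce", "Nudity")))
```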
To evaluate image quality, choose a task and a metric:
- TASK: `general_image`, `target_image`
- METRIC: `aesthetic`, `ImageReward`, `PickScore`, `FID`, `FID_SD`
```shell
python source/quality/evaluation.py \
    --method YOUR_METHOD \
    --target TARGET \
    --task TASK \
    --metric METRIC
```
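To sweep every quality metric over both tasks, the call above can be generated in a loop. A sketch (the `quality_commands` helper is hypothetical); it prints the commands rather than running them:

```python
# Enumerate one source/quality/evaluation.py invocation per (task, metric) pair.
from itertools import product

QUALITY_TASKS = ["general_image", "target_image"]
METRICS = ["aesthetic", "ImageReward", "PickScore", "FID", "FID_SD"]

def quality_commands(method: str, target: str):
    return [["python", "source/quality/evaluation.py",
             "--method", method, "--target", target,
             "--task", task, "--metric", metric]
            for task, metric in product(QUALITY_TASKS, METRICS)]

for cmd in quality_commands("uce", "Nudity"):
    print(" ".join(cmd))
```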
To evaluate selective alignment:

```shell
python source/eval/eval_selective_alignment.py \
    --method YOUR_METHOD \
    --target TARGET
```
To evaluate pinpoint-ness:

```shell
python source/eval/eval_vlm.py \
    --task "pinpoint_ness" \
    --method YOUR_METHOD \
    --target TARGET
```
- Add two attacks.
- Add a leaderboard for each task.
- Add new unlearning methods.
```bibtex
@article{moon2024holistic,
  title={Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning},
  author={Moon, Saemi and Lee, Minjong and Park, Sangdon and Kim, Dongwoo},
  journal={arXiv preprint arXiv:2410.05664},
  year={2024}
}
```