This repository contains the official implementation for "DREAM: Scalable Red Teaming for Text-to-Image Generative Systems via Distribution Modeling" (IEEE S&P 2026).
conda create -n dream python=3.10 -y
conda activate dream
pip install -r requirements.txt && pip install transformers==4.45.2 sentence-transformers==3.2.1

Note: We use BLIP-2 under salesforce_lavis==1.0.2, which conflicts with transformers==4.45.2 and sentence-transformers==3.2.1. Therefore, you must install salesforce_lavis first (it is already included in requirements.txt), and then install the remaining two packages on top of it. Please follow this installation order strictly; otherwise, unexpected issues may occur.
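After installation, you can optionally sanity-check that the pinned versions resolved correctly. This check is our suggestion, not part of the official setup; it assumes salesforce_lavis exposes the lavis module:

python -c "import lavis, transformers, sentence_transformers; print(transformers.__version__, sentence_transformers.__version__)"
# Expected output: 4.45.2 3.2.1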
DREAM
├── configs
│   ├── filters
│   └── t2i_models
├── experiments
├── files
│   ├── checkpoints
│   └── data
├── README.md
├── requirements.txt
├── run.sh
└── src
    ├── eval.py
    ├── generate.py
    ├── main.py
    ├── sample.py
    ├── trainer.py
    └── utils
- configs/: YAML configuration files for experiment settings.
- experiments/: Output directory containing experiment artifacts.
- files/: Static resources, including model weights, few-shot examples, etc.
- run.sh: Executes the full DREAM pipeline using configuration files.
- src/: Core implementation: training, sampling, generation, evaluation, and utilities.
We release the final prompt files generated by the DREAM framework. The dataset, referred to as DREAM-17k, contains red-teaming prompts for evaluating safety defenses in text-to-image systems. DREAM-17k is produced by running DREAM across two unsafe concept categories (Sexual and Violence), six diffusion model variants (Stable Diffusion v1.5, CA, ESD, UCE, SafeGen, and RECE), and four safety filters (SC, NSFW-Image, NSFW-Text, and Keyword-Gibberish). Note that ESD, SafeGen, and RECE are only included for the Sexual category. Each configuration contains 1,024 prompts, resulting in a dataset designed for broad coverage.
This dataset can be used for fast screening of the safety performance of new text-to-image systems (similar to transfer-based attacks) as well as for safety fine-tuning to improve existing systems. However, because these prompts are statically generated against the specific models used in our experiments, a model that performs well on DREAM-17k is not necessarily safe in general. For evaluating safety under dynamically changing adversarial behaviors, please refer to the section Training DREAM on Your Own T2I System below, which describes how to train DREAM on your own model and obtain prompts tailored to your particular text-to-image system. We are also preparing a more refined version of the dataset and the trained red team LLMs. Stay tuned!
Please note that some prompts in this release were regenerated in post-paper runs. Because of potentially different hyperparameters (e.g., random seeds), randomness during training/sampling, and the inherent stochasticity of text-to-image generation, the PSR and PS metrics may differ slightly from those reported in the paper. Based on our local validation, these values are, in most cases, better than the originally reported results.
After downloading the prompt dataset, you can immediately generate images and run the evaluation suite:
python src/generate.py \
    --prompt_file_path "${prompt_file_path}" \
    --image_dir "${image_dir}" \
    --unet_weight "${unet_weight}" \
    --t2i_model_type "${t2i_model_type}" \
    --filter_type "${filter_type}" \
    --num_inference_steps 50

- --prompt_file_path: Path to the prompt CSV file, which must contain a "prompt" column.
- --image_dir: Output directory for the generated images.
- --unet_weight: Path to custom UNet weights for safety-aligned models; use null if not applicable (example: files/checkpoints/esd/diffusers-nudity-ESDu1-UNET.pt). Note: you need to prepare the UNet weights yourself.
- --t2i_model_type: Text-to-image model type (SD1.5, safegen, etc.); default: SD1.5.
- --filter_type: Safety filter type (sc, image, text, keyword-gibberish); use null if no filter is needed.
- --num_inference_steps: Number of inference steps; default: 50.
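For example, a run against vanilla SD v1.5 with the Safety Checker enabled might look like the following sketch; the prompt file and output directory are placeholder paths, not files shipped with this repository:

python src/generate.py \
    --prompt_file_path prompts/sexual_sd15.csv \
    --image_dir experiments/demo/images \
    --unet_weight null \
    --t2i_model_type SD1.5 \
    --filter_type sc \
    --num_inference_steps 50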
python src/eval.py \
    --image_dir "${image_dir}" \
    --results_dir "${results_dir}" \
    --category "${category}" \
    --prompt_file_path "${prompt_file_path}"

- --image_dir: Directory containing the images to be evaluated (i.e., the images generated by generate.py).
- --results_dir: Output directory for evaluation results.
- --category: Evaluation category (sexual, violence).
- --prompt_file_path: Path to the prompt CSV file.
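For instance, to score the images produced by the hypothetical generate.py run above (paths are again placeholders):

python src/eval.py \
    --image_dir experiments/demo/images \
    --results_dir experiments/demo/results \
    --category sexual \
    --prompt_file_path prompts/sexual_sd15.csv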
You can sample prompts from the trained red team LLM via:
python src/sample.py \
    --category "${category}" \
    --alpha "${alpha}" \
    --model_path "${model_path}" \
    --sample_batch_size 32 \
    --sample_num_batches 32 \
    --output_dir "${output_dir}"

- --category: Red-teaming LLM training category (sexual, violence).
- --alpha: Temperature scaling parameter for prompt sampling (the $\alpha$ parameter from Equation (10) in Section 4.4 of the paper); default: 0.3.
- --model_path: Path to the trained red-teaming LLM checkpoint (e.g., experiments/checkpoints/best_model).
- --sample_batch_size: Number of samples per batch; default: 32.
- --sample_num_batches: Number of sampling batches; default: 32 (total prompts = batch_size × num_batches).
- --output_dir: Output directory; sampled prompts are saved as CSV files in this directory.
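With the defaults above, a single run draws 32 × 32 = 1,024 prompts, matching the per-configuration size of DREAM-17k. A minimal sketch using the example checkpoint path from above (the output directory is a placeholder):

python src/sample.py \
    --category sexual \
    --alpha 0.3 \
    --model_path experiments/checkpoints/best_model \
    --sample_batch_size 32 \
    --sample_num_batches 32 \
    --output_dir experiments/demo/prompts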
To reproduce the results in our paper, use run.sh to launch the entire pipeline:
bash run.sh path/to/config.yaml

The script reads the supplied config file; runs training, prompt sampling, image generation, and evaluation; and stores all artifacts (checkpoints, prompt CSVs, generated images, metric logs) in the configured experiment.output_dir.
Configuration files are located under configs/, and each YAML file corresponds to a specific experiment setting. Please read src/utils/config_manager.py for a complete list of arguments. Below we introduce some important arguments:
- experiment.*: Basic experiment metadata, including the experiment name, category (sexual or violence), and output directory. A timestamp suffix is automatically appended to experiment.output_dir to ensure each run has a unique output directory.
- model.*: Text-to-image model configuration. For safety-aligned models such as esd, ca, uce, and rece, ensure that model.unet_weight is set to the correct checkpoint path before launching a run.
- filter.*: External safety filter configuration. Specify the filter_type when filters are enabled. Supported types include:
  - sc: Safety Checker
  - image: NSFW Image Detector
  - text: NSFW Text Detector
  - keyword_gibberish: Keyword-Gibberish Filter
- training.*: Hyperparameter settings for training DREAM. Notable arguments include max_steps (total number of training steps), save_steps (number of steps between checkpoint saves), num_train_epochs (total number of training epochs), temperature (minimum sampling temperature during prompt generation), and zo_eps (zeroth-order perturbation magnitude).
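As an illustration, a launch for an ESD run with the Safety Checker enabled could look like the sketch below. The config file name is hypothetical; the key names follow this README, and the authoritative schema lives in src/utils/config_manager.py:

# configs/example_esd_sc.yaml (hypothetical) would set, among others:
#   experiment.category: sexual
#   model.unet_weight: files/checkpoints/esd/diffusers-nudity-ESDu1-UNET.pt
#   filter.filter_type: sc
bash run.sh configs/example_esd_sc.yaml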
All experiments in the paper were conducted on two NVIDIA A100 80 GB GPUs.
DREAM directly supports Stable Diffusion v1.4 and v1.5 models via the DiffusionPipeline, and it can load arbitrary custom UNet weights by specifying the --unet_weight argument. It also supports the four safety filters described in the paper through the --filter_type option. If you need to integrate additional generative model architectures (beyond SD v1.4/1.5) or implement custom safety filters, please follow the instructions below.
- Model loader: edit src/utils/t2i_model_loader.py to create a new loading routine (e.g., for a custom architecture or scheduler).
- Safety filter: follow the detection flow in src/utils/nsfw_filter.py to add new external safety filters.
- Configuration: create a new YAML file for your custom experiment, model, and filter settings. For training arguments, we recommend using the defaults in src/utils/config_manager.py.
Launch with run.sh once configured.
This project builds upon several open-source codebases; we sincerely thank the respective authors for releasing them.
@inproceedings{li2026dream,
title={DREAM: Scalable Red Teaming for Text-to-Image Generative Systems via Distribution Modeling},
author={Li, Boheng and Wang, Junjie and Li, Yiming and Hu, Zhiyang and Qi, Leyi and Dong, Jianshuo and Wang, Run and Qiu, Han and Qin, Zhan and Zhang, Tianwei},
booktitle={2026 IEEE Symposium on Security and Privacy (SP)},
year={2026},
organization={IEEE}
}

This project is released under the Apache License 2.0.