This repository contains the official implementation for "DREAM: Scalable Red Teaming for Text-to-Image Generative Systems via Distribution Modeling" (IEEE S&P 2026).
conda create -n dream python=3.10 -y
conda activate dream
pip install -r requirements.txt && pip install transformers==4.45.2 sentence-transformers==3.2.1

Note: We use BLIP-2 under salesforce_lavis==1.0.2, which conflicts with transformers==4.45.2 and sentence-transformers==3.2.1. Therefore, you must install salesforce_lavis first (it is already included in requirements.txt), and then install the remaining two packages on top of it. Please follow this installation order strictly; otherwise, unexpected issues may occur.
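After installation, you can optionally sanity-check that the pinned versions resolved correctly. This check is our suggestion, not part of the official setup; it assumes salesforce_lavis exposes the lavis module:

python -c "import lavis, transformers, sentence_transformers; print(transformers.__version__, sentence_transformers.__version__)"
# Expected output: 4.45.2 3.2.1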
DREAM
├── configs
│   ├── filters
│   └── t2i_models
├── experiments
├── files
│   ├── checkpoints
│   └── data
├── README.md
├── requirements.txt
├── run.sh
└── src
    ├── eval.py
    ├── generate.py
    ├── main.py
    ├── sample.py
    ├── trainer.py
    └── utils
- configs/: YAML configuration files for experiment settings.
- experiments/: Output directory containing experiment artifacts.
- files/: Static resources, including model weights, few-shot examples, etc.
- run.sh: Executes the full DREAM pipeline using configuration files.
- src/: Core implementation: training, sampling, generation, evaluation, and utilities.
We release the final prompt files generated by the DREAM framework. The dataset, referred to as DREAM-17k, contains red-teaming prompts for evaluating safety defenses in text-to-image systems. DREAM-17k is produced by running DREAM across two unsafe concept categories (Sexual and Violence), six diffusion model variants (Stable Diffusion v1.5, CA, ESD, UCE, SafeGen, and RECE), and four safety filters (SC, NSFW-Image, NSFW-Text, and Keyword-Gibberish). Note that ESD, SafeGen, and RECE are only included for the Sexual category. Each configuration contains 1,024 prompts, resulting in a dataset designed for broad coverage.
This dataset can be used for fast screening of the safety performance of new text-to-image systems (similar to transfer-based attacks) as well as for safety fine-tuning to improve existing systems. However, because these prompts are statically generated against the specific models used in our experiments, a model that performs well on DREAM-17k is not necessarily safe in general. For evaluating safety under dynamically changing adversarial behaviors, please refer to the section Training DREAM on Your Own T2I System below, which describes how to train DREAM on your own model and obtain prompts tailored to your particular text-to-image system. We are also preparing a more refined version of the dataset and the trained red team LLMs. Stay tuned!
Please note that some prompts in this release were regenerated in post-paper runs. Because of potentially different hyperparameters (e.g., random seeds), randomness during training/sampling, and the inherent stochasticity of text-to-image generation, the PSR and PS metrics may differ slightly from those reported in the paper. Based on our local validation, these values are, in most cases, better than the originally reported results.
After downloading the prompt dataset, you can immediately generate images and run the evaluation suite:
python src/generate.py \
    --prompt_file_path "${prompt_file_path}" \
    --image_dir "${image_dir}" \
    --unet_weight "${unet_weight}" \
    --t2i_model_type "${t2i_model_type}" \
    --filter_type "${filter_type}" \
    --num_inference_steps 50

- --prompt_file_path: Path to the prompt CSV file, which must contain a "prompt" column.
- --image_dir: Output directory for the generated images.
- --unet_weight: Path to custom UNet weights for safety-aligned models; use null if not applicable (example: files/checkpoints/esd/diffusers-nudity-ESDu1-UNET.pt). Note: you need to prepare the UNet weights yourself.
- --t2i_model_type: Text-to-image model type (SD1.5, safegen, etc.); default: SD1.5.
- --filter_type: Safety filter type (sc, image, text, keyword-gibberish); use null if no filter is needed.
- --num_inference_steps: Number of inference steps; default: 50.
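For example, a run against vanilla SD v1.5 with the Safety Checker enabled might look like the following sketch; the prompt file and output directory are placeholder paths, not files shipped with this repository:

python src/generate.py \
    --prompt_file_path prompts/sexual_sd15.csv \
    --image_dir experiments/demo/images \
    --unet_weight null \
    --t2i_model_type SD1.5 \
    --filter_type sc \
    --num_inference_steps 50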
python src/eval.py \
    --image_dir "${image_dir}" \
    --results_dir "${results_dir}" \
    --category "${category}" \
    --prompt_file_path "${prompt_file_path}"

- --image_dir: Directory containing the images to be evaluated (i.e., the images generated by generate.py).
- --results_dir: Output directory for evaluation results.
- --category: Evaluation category (sexual, violence).
- --prompt_file_path: Path to the prompt CSV file.
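For instance, to score the images produced by the hypothetical generate.py run above (paths are again placeholders):

python src/eval.py \
    --image_dir experiments/demo/images \
    --results_dir experiments/demo/results \
    --category sexual \
    --prompt_file_path prompts/sexual_sd15.csv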
You can sample prompts from the trained red team LLM via:
python src/sample.py \
    --category "${category}" \
    --alpha "${alpha}" \
    --model_path "${model_path}" \
    --sample_batch_size 32 \
    --sample_num_batches 32 \
    --output_dir "${output_dir}"

- --category: Red-teaming LLM training category (sexual, violence).
- --alpha: Temperature scaling parameter for prompt sampling (the $\alpha$ parameter from Equation (10) in Section 4.4 of the paper); default: 0.3.
- --model_path: Path to the trained red-teaming LLM checkpoint (e.g., experiments/checkpoints/best_model).
- --sample_batch_size: Number of samples per batch; default: 32.
- --sample_num_batches: Number of sampling batches; default: 32 (total prompts = batch_size × num_batches).
- --output_dir: Output directory; sampled prompts are saved as CSV files in this directory.
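With the defaults above, a single run draws 32 × 32 = 1,024 prompts, matching the per-configuration size of DREAM-17k. A minimal sketch using the example checkpoint path from above (the output directory is a placeholder):

python src/sample.py \
    --category sexual \
    --alpha 0.3 \
    --model_path experiments/checkpoints/best_model \
    --sample_batch_size 32 \
    --sample_num_batches 32 \
    --output_dir experiments/demo/prompts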
To reproduce the results in our paper, use run.sh to launch the entire pipeline:
bash run.sh path/to/config.yaml

The script reads the supplied config file; runs training, prompt sampling, image generation, and evaluation; and stores all artifacts (checkpoints, prompt CSVs, generated images, metric logs) in the configured experiment.output_dir.
Configuration files are located under configs/, and each YAML file corresponds to a specific experiment setting. Please read src/utils/config_manager.py for a complete list of arguments. Below we introduce some important arguments:
- experiment.*: Basic experiment metadata, including the experiment name, category (sexual or violence), and output directory. A timestamp suffix is automatically appended to experiment.output_dir to ensure each run has a unique output directory.
- model.*: Text-to-image model configuration. For safety-aligned models such as esd, ca, uce, and rece, ensure that model.unet_weight is set to the correct checkpoint path before launching a run.
- filter.*: External safety filter configuration. Specify the filter_type when filters are enabled. Supported types include:
  - sc: Safety Checker
  - image: NSFW Image Detector
  - text: NSFW Text Detector
  - keyword_gibberish: Keyword-Gibberish Filter
- training.*: Hyperparameter settings for training DREAM. Notable arguments include max_steps (total number of training steps), save_steps (number of steps between checkpoint saves), num_train_epochs (total number of training epochs), temperature (minimum sampling temperature during prompt generation), and zo_eps (zeroth-order perturbation magnitude).
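As an illustration, a launch for an ESD run with the Safety Checker enabled could look like the sketch below. The config file name is hypothetical; the key names follow this README, and the authoritative schema lives in src/utils/config_manager.py:

# configs/example_esd_sc.yaml (hypothetical) would set, among others:
#   experiment.category: sexual
#   model.unet_weight: files/checkpoints/esd/diffusers-nudity-ESDu1-UNET.pt
#   filter.filter_type: sc
bash run.sh configs/example_esd_sc.yaml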
All experiments in the paper were conducted on two NVIDIA A100 80 GB GPUs.
DREAM directly supports Stable Diffusion v1.4 and v1.5 models via the DiffusionPipeline, and it can load arbitrary custom UNet weights by specifying the --unet_weight argument. It also supports the four safety filters described in the paper through the --filter_type option. If you need to integrate additional generative model architectures (beyond SD v1.4/1.5) or implement custom safety filters, please follow the instructions below.
- Model loader: edit src/utils/t2i_model_loader.py to create a new loading routine (e.g., for a custom architecture or scheduler).
- Safety filter: follow the detection flow in src/utils/nsfw_filter.py to add new external safety filters.
- Configuration: create a new YAML file for your custom experiment, model, and filter settings. For training arguments, we recommend using the defaults in src/utils/config_manager.py.
Launch with run.sh once configured.
This project builds upon several open-source codebases; we sincerely thank the respective authors for releasing them.
@inproceedings{li2026dream,
title={DREAM: Scalable Red Teaming for Text-to-Image Generative Systems via Distribution Modeling},
author={Li, Boheng and Wang, Junjie and Li, Yiming and Hu, Zhiyang and Qi, Leyi and Dong, Jianshuo and Wang, Run and Qiu, Han and Qin, Zhan and Zhang, Tianwei},
booktitle={2026 IEEE Symposium on Security and Privacy (SP)},
year={2026},
organization={IEEE}
}

This project is released under the Apache License 2.0.