IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026
Muhammad Atif Butt, Alexandra Gomez-Villa, Tao Wu, Javier Vazquez-Corral, Joost van de Weijer, and Kai Wang
GenColorBench is a comprehensive benchmark for evaluating color understanding capabilities in text-to-image (T2I) generation models. It systematically tests models across five tasks spanning explicit color naming, numerical color specifications, color-object associations, multi-object compositions, and implicit color understanding.
- 5 Evaluation Tasks: CNA, NCU, COA, MOC, ICA covering diverse color understanding capabilities
- 2 Color Systems: ISCC-NBS (L1/L2/L3 granularities) and CSS3/X11 (147 web colors)
- Automated Pipeline: GroundingDINO + SAM2 + OneHue + CIEDE2000 for objective evaluation
- Mini Benchmark: ~10K stratified prompts for compute-efficient community evaluation
- Full Benchmark: ~44K prompts for comprehensive analysis
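The Mini benchmark's stratified subsampling can be sketched as follows. This is a hedged illustration only: the function, the `"task"` field, and the grouping key are hypothetical, and the real split is produced by `python -m gencolorbench mini --seed 42`.

```python
# Hypothetical sketch of drawing a balanced "mini" split from the full
# prompt set, keeping every group (e.g. task) equally represented.
import random
from collections import defaultdict

def stratified_sample(prompts, n_total, key=lambda p: p["task"], seed=42):
    """Draw ~n_total prompts, balanced across the groups defined by `key`."""
    rng = random.Random(seed)            # fixed seed for a reproducible split
    groups = defaultdict(list)
    for p in prompts:
        groups[key(p)].append(p)
    per_group = n_total // len(groups)   # equal quota per group
    subset = []
    for name in sorted(groups):          # sorted for determinism
        bucket = list(groups[name])
        rng.shuffle(bucket)
        subset.extend(bucket[:per_group])
    return subset
```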
| Task | Abbrev. | Description | Example Prompt |
|---|---|---|---|
| Color Name Accuracy | CNA | Generate object in named color | "A vivid red car" |
| Numerical Color Understanding | NCU | Generate object from RGB/HEX spec | "An RGB(255,0,128) balloon" |
| Color-Object Association | COA | Color one object, leave another natural | "A blue apple next to a banana" |
| Multi-Object Composition | MOC | Assign distinct colors to multiple objects | "A red car and a green bicycle" |
| Implicit Color Association | ICA | Infer color from contextual reference | "A car the color of the sky" |
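The example prompts in the table above follow simple slot-filling patterns. A minimal sketch of such templating is below; the template strings are illustrative stand-ins, not the benchmark's exact wording (the real prompt CSVs come from `python -m gencolorbench`).

```python
# Illustrative prompt templates mirroring the five tasks; wording is
# hypothetical, not the benchmark's actual templates.
TEMPLATES = {
    "CNA": "a {color} {obj}",                          # Color Name Accuracy
    "NCU": "an RGB({r}, {g}, {b}) {obj}",              # Numerical Color Understanding
    "COA": "a {color} {obj} next to a {distractor}",   # Color-Object Association
    "MOC": "a {color1} {obj1} and a {color2} {obj2}",  # Multi-Object Composition
    "ICA": "a {obj} the color of {reference}",         # Implicit Color Association
}

def make_prompt(task: str, **slots: str) -> str:
    """Fill the template for `task` with color/object slot values."""
    return TEMPLATES[task].format(**slots)
```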
```bash
# Clone the repository
git clone https://github.com/moatifbutt/gen-color-bench.git
cd gen-color-bench

# Create and activate a conda environment
conda create -n gencolorbench python=3.10
conda activate gencolorbench

# Install dependencies
pip install -r requirements.txt

# Install SAM2 and GroundingDINO (required for evaluation)
pip install segment-anything-2
pip install groundingdino-py
```

```bash
# Generate the Mini benchmark (~10K prompts, recommended for initial evaluation)
python -m gencolorbench mini --output-dir ./mini_bench_prompt --seed 42

# Generate the Full benchmark (~44K prompts)
python -m gencolorbench full --output-dir ./full_bench_prompt --seed 42
```

```bash
# Generate images, e.g. with FLUX.1-dev
python -m gencolorbench.generation \
    --model flux-dev \
    --prompts-dir ./mini_bench_prompt \
    --output-dir ./generated_images/flux-dev \
    --images-per-prompt 4 \
    --device cuda:0
```

```bash
# Run the evaluation pipeline (from the gsam2/ directory)
cd gsam2/
python eval_pipeline.py \
    --prompts-dir ../mini_bench_prompt \
    --images-dir ../generated_images/flux-dev \
    --output-dir ../eval_results/ \
    --neg-csv ../gencolorbench/data/objects/obj_neg.csv \
    --colors-dir ../gencolorbench/data/neighborhoods \
    --color-tables-dir ../gencolorbench/data/color_systems \
    --sam2-checkpoint ./checkpoints/sam2.1_hiera_large.pt \
    --images-per-prompt 4 \
    --task all \
    --save-viz \
    --device cuda:0

# Aggregate per-task results into a summary
python aggregate_results.py \
    --results-dir ../eval_results/ \
    --neg-csv ../gencolorbench/data/objects/obj_neg.csv \
    --output ../eval_results/summary.json \
    --csv ../eval_results/summary.csv
```

Our automated evaluation pipeline proceeds in five stages:
- Object Detection: GroundingDINO localizes target objects
- Segmentation: SAM2 generates precise object masks
- Color Extraction: OneHue + PCA extracts dominant color in LAB space
- Color Matching: CIEDE2000 (ΔE) computes perceptual color distance
- Accuracy: Matches the extracted color against the ground-truth color, with a neighborhood tolerance around each target
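The color-matching stage can be sketched in isolation. Below is a plain-Python implementation of the CIEDE2000 (ΔE) distance between two CIELAB colors, following the standard formulation; the function name is illustrative and this is a sketch, not the pipeline's actual code.

```python
# Sketch: CIEDE2000 perceptual distance between two CIELAB colors.
import math

def ciede2000(lab1, lab2):
    """Return the CIEDE2000 color difference between two (L, a, b) triples."""
    L1, a1, b1 = lab1
    L2, a2, b2 = lab2

    # Chroma-dependent rescaling of a* (the G correction)
    C1, C2 = math.hypot(a1, b1), math.hypot(a2, b2)
    Cbar = (C1 + C2) / 2.0
    G = 0.5 * (1.0 - math.sqrt(Cbar**7 / (Cbar**7 + 25.0**7)))
    a1p, a2p = (1.0 + G) * a1, (1.0 + G) * a2
    C1p, C2p = math.hypot(a1p, b1), math.hypot(a2p, b2)

    # Hue angles in degrees, wrapped to [0, 360)
    h1p = math.degrees(math.atan2(b1, a1p)) % 360.0
    h2p = math.degrees(math.atan2(b2, a2p)) % 360.0

    # Lightness, chroma, and hue differences
    dLp, dCp = L2 - L1, C2p - C1p
    if C1p * C2p == 0.0:
        dhp = 0.0
    else:
        dhp = h2p - h1p
        if dhp > 180.0:
            dhp -= 360.0
        elif dhp < -180.0:
            dhp += 360.0
    dHp = 2.0 * math.sqrt(C1p * C2p) * math.sin(math.radians(dhp) / 2.0)

    # Means and the weighting functions SL, SC, SH
    Lbp, Cbp = (L1 + L2) / 2.0, (C1p + C2p) / 2.0
    if C1p * C2p == 0.0:
        hbp = h1p + h2p
    elif abs(h1p - h2p) <= 180.0:
        hbp = (h1p + h2p) / 2.0
    elif h1p + h2p < 360.0:
        hbp = (h1p + h2p) / 2.0 + 180.0
    else:
        hbp = (h1p + h2p) / 2.0 - 180.0
    T = (1.0 - 0.17 * math.cos(math.radians(hbp - 30.0))
         + 0.24 * math.cos(math.radians(2.0 * hbp))
         + 0.32 * math.cos(math.radians(3.0 * hbp + 6.0))
         - 0.20 * math.cos(math.radians(4.0 * hbp - 63.0)))
    SL = 1.0 + 0.015 * (Lbp - 50.0) ** 2 / math.sqrt(20.0 + (Lbp - 50.0) ** 2)
    SC = 1.0 + 0.045 * Cbp
    SH = 1.0 + 0.015 * Cbp * T

    # Rotation term that corrects the problematic blue region (~275 deg)
    dtheta = 30.0 * math.exp(-(((hbp - 275.0) / 25.0) ** 2))
    RC = 2.0 * math.sqrt(Cbp**7 / (Cbp**7 + 25.0**7))
    RT = -math.sin(math.radians(2.0 * dtheta)) * RC

    return math.sqrt((dLp / SL) ** 2 + (dCp / SC) ** 2 + (dHp / SH) ** 2
                     + RT * (dCp / SC) * (dHp / SH))
```

For example, the standard test pair (50, 2.6772, -79.7751) vs. (50, 0, -82.7485) yields a ΔE of about 2.04, while identical colors give exactly 0.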
```
gencolorbench/
├── gencolorbench/
│   ├── data/
│   │   ├── color_systems/    # ISCC-NBS L1/L2/L3, CSS3/X11 color tables
│   │   ├── neighborhoods/    # Color neighborhood definitions
│   │   ├── objects/          # Object list with categories and negative labels
│   │   └── templates/        # Task 5 prompt templates
│   ├── evaluation/           # CNA, NCU, COA, MOC, ICA evaluators
│   ├── generation/           # T2I model wrappers
│   └── color/                # Color space conversions, CIEDE2000
├── gsam2/                    # Evaluation entry point (run from here)
│   ├── eval_pipeline.py
│   ├── aggregate_results.py
│   └── checkpoints/          # SAM2 checkpoints
└── mini_bench_prompt/        # Generated prompt CSVs
```
| System | Granularity | Colors | Description |
|---|---|---|---|
| ISCC-NBS L1 | Coarse | 13 | Basic hue categories (Red, Orange, Yellow, ...) |
| ISCC-NBS L2 | Medium | 29 | Intermediate distinctions |
| ISCC-NBS L3 | Fine | 267 | Full ISCC-NBS specification with modifiers |
| CSS3/X11 | Web | 147 | Standard web color names |
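Matching a measured color against a table such as CSS3/X11 amounts to a nearest-neighbor lookup in a perceptual space. The sketch below converts sRGB to CIELAB and uses plain Euclidean Lab distance as a simple stand-in for the pipeline's CIEDE2000 matching; the function names are illustrative, and the four-entry table is only a sample of the 147 CSS3 colors.

```python
# Sketch: map an sRGB color to the nearest named color via CIELAB distance.
import math

def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIELAB (D65 white point)."""
    def lin(c):  # undo the sRGB gamma curve
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (lin(c) for c in rgb)
    # Linear RGB -> XYZ using the sRGB/D65 matrix
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    def f(t):
        return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

# A few real CSS3 entries (the full table has 147)
CSS3_SAMPLE = {"red": (255, 0, 0), "green": (0, 128, 0),
               "blue": (0, 0, 255), "gold": (255, 215, 0)}

def nearest_color_name(rgb, table=CSS3_SAMPLE):
    """Name of the table entry with the smallest Lab distance to `rgb`."""
    lab = srgb_to_lab(rgb)
    return min(table, key=lambda name: math.dist(lab, srgb_to_lab(table[name])))
```

For example, `nearest_color_name((250, 10, 10))` returns `"red"`.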
Full results available in the paper and on the project page.
If you find GenColorBench useful in your research, please cite:
```bibtex
@article{butt2025gencolorbench,
  title={GenColorBench: A Color Evaluation Benchmark for Text-to-Image Generation Models},
  author={Butt, Muhammad Atif and Gomez-Villa, Alexandra and Wu, Tao and Vazquez-Corral, Javier and van de Weijer, Joost and Wang, Kai},
  journal={arXiv preprint arXiv:2510.20586},
  year={2025}
}
```

This work was supported by the Computer Vision Center (CVC) at Universitat Autònoma de Barcelona. We thank the developers of SAM2, GroundingDINO, and the T2I models evaluated in this benchmark.
Questions? Open an issue or contact mabutt@cvc.uab.es

