TL;DR. CLIP often fails to understand negation in text. To address this, we propose data generation pipelines that produce negation-inclusive captions and validate their effectiveness on public benchmarks as well as our newly introduced NegRefCOCOg benchmark. The resulting NegationCLIP models show improved negation awareness and extend to diverse multimodal tasks.
- ✅ Preprint available on arXiv
- ✅ NegationCLIP checkpoints available on Hugging Face
- ✅ Data generation & fine-tuning scripts included
- ✅ NegRefCOCOg benchmark included
```bash
git clone https://github.com/jerryray/NegationCLIP.git
cd NegationCLIP
conda env create -f environment.yml
conda activate negationclip
```
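After activating the environment, a quick import check can confirm the core dependencies are in place. This is only a sanity-check sketch; it assumes the environment installs PyTorch and the OpenAI `clip` package (see `environment.yml` / `requirements.txt` for the authoritative dependency list).

```python
# Environment sanity check (assumes PyTorch and the OpenAI `clip` package
# are installed by environment.yml; adjust if the environment differs).
import torch
import clip

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("CLIP architectures:", clip.available_models())
```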
The script generates captions with explicit negation from COCO using LLaMA-3 and LLaVA-v1.6-Mistral-7B.

```bash
python src/data_generation.py \
    --caption_path /path/to/COCO/captions_train2014.json \
    --image_dir /path/to/COCO/train2014 \
    --output_dir ./output
```

**Options**
- `--use_random_object`: randomly select absent objects (instead of contextual ones)
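The repository also ships a pre-generated caption file under `annotations/`. A quick way to eyeball the output is to load it with `json` and print a few entries; the snippet below makes no assumption about the exact schema beyond the file being valid JSON.

```python
import json

# Peek at the negation-inclusive captions shipped with (or generated by) the repo.
# The exact schema is not documented here, so we only print raw entries.
with open("./annotations/negationclip_captions_train2014.json") as f:
    data = json.load(f)

sample = data[:3] if isinstance(data, list) else list(data.items())[:3]
for entry in sample:
    print(entry)
```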
Fine-tune the text encoder of CLIP on negation-inclusive captions:
```bash
python src/clip_finetune.py \
    --json_path ./annotations/negationclip_captions_train2014.json \
    --image_dir /path/to/train2014 \
    --output_dir ./checkpoints \
    --clip_model "ViT-B/32"
```

**Outputs**
- Best model automatically saved when validation loss improves.
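The saved checkpoint can then be loaded back into a standard CLIP model for inference. The sketch below is a hedged example: it assumes the checkpoint is a state dict compatible with the OpenAI `clip` package and uses placeholder file names (`best_model.pt`, `example.jpg`); adapt it to however `clip_finetune.py` actually serializes its checkpoints.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the base architecture, then overwrite weights with the fine-tuned
# checkpoint (assumed to be a plain state dict; file name is a placeholder).
model, preprocess = clip.load("ViT-B/32", device=device)
state = torch.load("./checkpoints/best_model.pt", map_location=device)
model.load_state_dict(state, strict=False)
model.eval()

# Score one image against an affirmative and a negated caption.
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
texts = clip.tokenize(["a photo with a dog", "a photo with no dog"]).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, texts)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print(probs)  # probability mass over the two candidate captions
```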
Evaluate NegationCLIP on the NegRefCOCOg benchmark:
```bash
cd NegRefCOCOg
python negrefcocog_eval.py \
    --arch "ViT-B/16" \
    --load_dir /path/to/checkpoint.pt \
    --device "cuda:1" \
    --annotation_file "NegRefCOCOg.json" \
    --image_dir /path/to/coco_images/train2014
```
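For a quick look at the benchmark data itself, `NegRefCOCOg.json` can be inspected directly. The snippet assumes nothing about the annotation schema; see `negrefcocog_eval.py` and `refer.py` for the actual parsing logic.

```python
import json

# Inspect the NegRefCOCOg annotation file; schema details are handled by
# negrefcocog_eval.py / refer.py, so we only report size and a raw sample.
with open("NegRefCOCOg/NegRefCOCOg.json") as f:
    anns = json.load(f)

print(type(anns).__name__, len(anns))
```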
Project structure:

```
negationclip/
├── src/
│   ├── clip_finetune.py
│   └── data_generation.py
├── annotations/
│   └── negationclip_captions_train2014.json
├── NegRefCOCOg/
│   ├── negrefcocog_eval.py
│   ├── refer.py
│   ├── NegRefCOCOg.json
│   └── external/
├── requirements.txt
├── environment.yml
├── README.md
└── LICENSE
```
- Hugging Face: jerryray/negationclip
- Model Type: Fine-tuned CLIP (ViT-B/32, ViT-B/16, ViT-L/14, ViT-L/14@336px)
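To try the released checkpoints without fine-tuning locally, one option is to download them from the Hugging Face repository and load them as in the inference sketch above. The file name below is a placeholder; check the `jerryray/negationclip` model card for the actual checkpoint names and formats.

```python
import torch
import clip
from huggingface_hub import hf_hub_download

device = "cuda" if torch.cuda.is_available() else "cpu"

# Download a checkpoint from the Hub. The filename is a placeholder --
# substitute one listed on the jerryray/negationclip model card.
ckpt_path = hf_hub_download(repo_id="jerryray/negationclip",
                            filename="negationclip_vitb32.pt")

model, preprocess = clip.load("ViT-B/32", device=device)
model.load_state_dict(torch.load(ckpt_path, map_location=device), strict=False)
model.eval()
```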