Detecting Generated Images by Fitting Natural Image Distributions

A Distribution-Based Approach for AI-Generated Image Detection (NeurIPS 2025 Spotlight)



A distribution-based framework for detecting AI-generated images by modeling natural image distributions in feature space

Abstract

This repository contains the official implementation of Consistency Verification (ConV), a novel framework for detecting generated images that exploits geometric differences between the data manifolds of natural and generated images. Unlike existing methods that rely heavily on training binary classifiers with numerous generated images, ConV depends solely on fitting the natural image distribution. The framework employs a pair of functions engineered to yield consistent outputs for natural images but divergent outputs for generated ones, leveraging the property that their gradients reside in mutually orthogonal subspaces. This design enables a training-free detection method: an image is identified as generated if a transformation along its data manifold induces a significant change in the loss value of a self-supervised model pre-trained on natural images. To address diminishing manifold disparities in advanced generative models, we leverage Normalizing Flow models to amplify detectable differences by extruding generated images away from the natural image manifold.

Motivation

The rapid advancement of generative AI technologies has led to increasingly sophisticated deepfakes and AI-generated images, posing significant challenges to image authenticity verification. Traditional detection methods often rely on model-specific artifacts, limiting their generalization capability when encountering novel generative models.

Framework Architecture

Architecture of the distribution-based detection framework

Current limitations in generated image detection include:

  • Heavy Dependence on Generated Data: Existing binary classification approaches require extensive collections of natural and generated images for training, so their performance hinges on the diversity of the generated data seen during training.

  • Generalization Challenges: It is unclear whether a classifier trained on images from certain generative models (e.g., particular diffusion models) can reliably distinguish images produced by unknown models, which limits practical applicability.

  • Costly Data Collection: Sustaining robust detection performance requires continually collecting images generated by the latest generative models, which can be expensive or infeasible when those models are inaccessible.

Our Approach

We propose Consistency Verification (ConV), a novel framework that detects generated images by exploiting geometric differences between the data manifolds of natural and generated images. Unlike existing methods that rely heavily on the distribution of generated images, our approach depends solely on fitting the natural image distribution, enabling robust generalization to unknown generative models.

  • Manifold Disparity Exploitation: The method leverages the fundamental observation that natural and generated images occupy distinct manifolds in the feature space. By visualizing feature representations extracted from DINOv2 (pre-trained solely on natural images), we observe that generated images exhibit distinct manifold structures compared to natural images, providing a geometric basis for detection.

  • Orthogonality Principle and Consistency Verification: We introduce a pair of functions designed to yield consistent outputs for natural images but divergent outputs for generated ones. The design is guided by an orthogonality principle: the gradients of these functions lie in mutually orthogonal subspaces. This enables a training-free detection approach: an image is identified as generated if a transformation along its data manifold induces a significant change in the loss value of a self-supervised model pre-trained on natural images.

  • Normalizing Flow for Manifold Extrusion: To address diminishing manifold disparities in advanced generative models, we employ Normalizing Flow models to explicitly extrude generated images away from the natural image manifold. By transforming the natural image manifold into a Gaussian distribution, the flow model enables precise separation and amplification of detectable differences between natural and generated images.

  • Natural Distribution Dependency: A key advantage of ConV is its reliance on fitting the natural data distribution rather than the distribution of generated images. This design principle addresses the generalization challenge faced by binary classifiers, whose performance typically depends on the diversity of generated data, making them vulnerable to unknown generative models.
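
To build intuition for the orthogonality principle above, the toy sketch below (an illustration only, not the repository's implementation) pairs a "natural manifold" (the unit circle) with a loss that is zero exactly on it. Stepping a natural sample along its manifold barely changes the loss, while stepping a generated sample along its own, different manifold shifts the loss noticeably:

import numpy as np

# A "model" fitted only to the natural manifold (the unit circle):
# its loss is zero exactly on the circle and grows away from it.
def natural_loss(p):
    return (np.linalg.norm(p) - 1.0) ** 2

def step_along(p, direction, eps=0.05):
    d = direction / np.linalg.norm(direction)
    return p + eps * d

# Natural sample on the circle; its manifold direction is the tangent.
x_nat = np.array([np.cos(0.3), np.sin(0.3)])
d_nat = np.array([-x_nat[1], x_nat[0]])

# "Generated" sample from a different manifold (the line y = 0.5);
# its manifold direction is horizontal.
x_gen = np.array([0.4, 0.5])
d_gen = np.array([1.0, 0.0])

for name, x, d in [("natural", x_nat, d_nat), ("generated", x_gen, d_gen)]:
    change = abs(natural_loss(step_along(x, d)) - natural_loss(x))
    print(f"{name}: loss change = {change:.6f}")
# The natural sample's loss barely moves; the generated sample's loss shifts
# by orders of magnitude more, which is the signal being thresholded.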

Installation

# Clone the repository
git clone https://github.com/tmlr-group/ConV.git
cd ConV

# Create conda environment (optional)
conda create -n conv python=3.10
conda activate conv

# Install dependencies
pip install -r requirements.txt

Usage

1. Dataset Structure

Training Dataset (Two Organization Formats)

Format 1: Separate Mode (--path_mode separate)

training_data/
├── 0_real/           # Natural images
└── 1_fake/           # Generated images

Format 2: Subdirectory Mode (--path_mode subdirs)

training_data/
├── subset1/
│   ├── 0_real/    # Natural images
│   └── 1_fake/    # Generated images
├── subset2/
│   ├── 0_real/
│   └── 1_fake/
└── ...

Test Dataset (Separate Mode Only)

Separate Mode (--path_mode separate)

test_data/
├── 0_real/           # Natural images
└── 1_fake/           # Generated images
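
A quick way to sanity-check your layout before running the scripts (a hypothetical helper, not part of the repository; adjust the extensions to your data):

from pathlib import Path

def check_layout(root):
    # Verify the separate-mode layout: <root>/0_real and <root>/1_fake.
    for sub in ("0_real", "1_fake"):
        d = Path(root) / sub
        if not d.is_dir():
            print(f"{d}: missing!")
            continue
        n = sum(1 for p in d.iterdir()
                if p.suffix.lower() in {".jpg", ".jpeg", ".png"})
        print(f"{d}: {n} images")

check_layout("test_data")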

2. Feature Extraction (for F-ConV Training)

F-ConV requires pre-extracted DINOv2 features for training. Use the provided script:

bash scripts/extract_features.sh

Configuration:

  1. Open scripts/extract_features.sh
  2. Set train_real_path and train_fake_path to your training data paths (for subdirs mode, you just need to set one path)
  3. Set train_path_mode to "separate" or "subdirs" based on your dataset structure
  4. Set train_output to specify output feature file path (e.g., features_train.pkl)
  5. Run the script

The script will automatically extract features according to your dataset structure.
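
For reference, the sketch below shows the kind of extraction extract_features.py performs: loading DINOv2 ViT-L/14 from torch hub and saving L2-normalized features to a pickle. The shell script above is the supported entry point; the repository's exact preprocessing, multi-crop handling, and pickle schema may differ, and example_real.jpg is a placeholder path:

import pickle
import torch
from PIL import Image
from torchvision import transforms

# Load DINOv2 ViT-L/14 from torch hub (downloads weights on first run).
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14")
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),  # 224 is divisible by the 14-pixel patch size
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406),
                         std=(0.229, 0.224, 0.225)),
])

@torch.no_grad()
def extract(path):
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    feat = model(x)  # (1, 1024) CLS embedding
    return torch.nn.functional.normalize(feat, dim=-1).squeeze(0)

features = {"real": [extract("example_real.jpg")]}  # placeholder structure
with open("features_train.pkl", "wb") as f:
    pickle.dump(features, f)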


3. Running ConV (Training-Free Detection)

Use the provided script for batch evaluation:

bash scripts/run_conv.sh

Configuration:

  1. Open scripts/run_conv.sh
  2. Modify the arrays:
    real_paths=("path/to/test/real1" "path/to/test/real2")
    fake_paths=("path/to/test/fake1" "path/to/test/fake2")
    dataset_names=("Dataset1" "Dataset2")
  3. Adjust parameters (batch size, crops_num, etc.)
  4. Run the script

Example output:

AUROC: 0.9523
AP: 0.9456
ACC: 0.9234
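
AUROC, AP, and ACC are the standard scikit-learn metrics computed over per-image scores and binary labels; a minimal reproduction of the reporting (with made-up scores) looks like:

import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score, accuracy_score

# Hypothetical per-image scores (higher = more likely generated) and labels.
labels = np.array([0, 0, 0, 1, 1, 1])  # 0 = real, 1 = fake
scores = np.array([0.1, 0.3, 0.2, 0.8, 0.6, 0.9])

print(f"AUROC: {roc_auc_score(labels, scores):.4f}")
print(f"AP: {average_precision_score(labels, scores):.4f}")
print(f"ACC: {accuracy_score(labels, scores > 0.5):.4f}")  # 0.5 threshold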

4. Running F-ConV (Training-Based Detection)

Use the provided script for batch evaluation with training:

bash scripts/run_fconv.sh

Configuration:

  1. Open scripts/run_fconv.sh
  2. Set training feature path:
    train_feature_path="features_train.pkl"  # From step 2
    # To load a pre-trained model for detection, leave train_feature_path empty: train_feature_path=""
  3. Set model save path:
    model_path="fconv_model.pth"
  4. Modify test dataset arrays:
    real_paths=("path/to/test/real1" "path/to/test/real2")
    fake_paths=("path/to/test/fake1" "path/to/test/fake2")
    dataset_names=("Dataset1" "Dataset2")
  5. Run the script

The script will:

  • Train the F-ConV model on extracted features
  • Test on all specified datasets
  • Save results to results/ directory
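
If you want to inspect the extracted features before training, the pickle from step 2 can be loaded directly (the schema is defined by extract_features.py, so explore the object rather than assuming a structure):

import pickle

with open("features_train.pkl", "rb") as f:
    features = pickle.load(f)

# Print the type and, if it is a dict, its keys to see how features
# and labels are organized.
print(type(features))
if isinstance(features, dict):
    print(list(features.keys()))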

Methodology

Core Principle

The ConV framework exploits geometric differences between the data manifolds of natural and generated images. The core principle is based on the observation that natural and generated images occupy distinct manifolds in the feature space, as visualized through DINOv2 embeddings. Our method leverages this manifold disparity through:

  1. Manifold Disparity Observation: Feature representations extracted by DINOv2 (pre-trained solely on natural images) reveal that generated images exhibit distinct manifold structures compared to natural images

  2. Orthogonality Principle: We design a pair of functions whose gradients lie in mutually orthogonal subspaces, ensuring consistent outputs for natural images but divergent outputs for generated ones

  3. Training-Free Detection: An image is identified as generated if a transformation along its data manifold induces a significant change in the loss value of a self-supervised model pre-trained on natural images

  4. Manifold Extrusion with Normalizing Flow: To amplify detectable differences, Normalizing Flow models are employed to explicitly extrude generated images away from the natural image manifold by transforming the natural manifold into a Gaussian distribution
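
At inference time, steps 2-3 reduce to a simple decision rule. The sketch below is schematic: ssl_loss and manifold_transform stand in for the pre-trained self-supervised model's loss and the repository's manifold-aligned transformation, and the threshold tau is a placeholder value:

import torch

def conv_score(x, ssl_loss, manifold_transform):
    # Loss change induced by a transformation along the image's data manifold.
    with torch.no_grad():
        return (ssl_loss(manifold_transform(x)) - ssl_loss(x)).abs()

def is_generated(x, ssl_loss, manifold_transform, tau=0.1):
    # Natural images keep the loss nearly constant; generated images do not.
    return conv_score(x, ssl_loss, manifold_transform) > tau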

Technical Details

DINOv2 Feature Extraction

  • Pre-trained DINOv2 ViT-L/14 model is employed
  • Normalized feature vectors are extracted
  • Multi-crop view feature extraction is supported

Normalizing Flow Model

  • Invertible neural network based on a flow architecture (built on the FrEIA framework)
  • Feature distribution is learned through coupling layers
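
For readers unfamiliar with coupling layers, the minimal RealNVP-style block below illustrates the mechanism: half of the feature vector parameterizes an invertible affine map of the other half, with a tractable log-determinant. The repository builds its flow with the FrEIA framework (see freia_funcs.py and config.py), so this is an illustration rather than the actual model:

import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    # Minimal RealNVP-style affine coupling block (illustration only).
    def __init__(self, dim, hidden=256):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=1)
        s = torch.tanh(s)                  # keep scales well-behaved
        z2 = x2 * torch.exp(s) + t         # affine map of the second half
        log_det = s.sum(dim=1)             # log |det J| of the coupling
        return torch.cat([x1, z2], dim=1), log_det

    def inverse(self, z):
        z1, z2 = z[:, :self.half], z[:, self.half:]
        s, t = self.net(z1).chunk(2, dim=1)
        s = torch.tanh(s)
        x2 = (z2 - t) * torch.exp(-s)
        return torch.cat([z1, x2], dim=1)

Stacking several such blocks (with permutations in between) yields the invertible map that transforms the natural feature manifold toward a Gaussian.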

Loss Functions

The F-ConV method employs the following loss functions:

  • Shape Loss: Distinguishes feature distributions between real and generated images
  • Consistency Loss: Ensures feature consistency across transformations
  • Total Loss: Combined loss for end-to-end training
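
The exact definitions live in utils.py. As a hedged sketch of how such terms typically combine, the flow's negative log-likelihood under a standard Gaussian can serve as a shape term and an L2 gap between features of two views as a consistency term, with lam a hypothetical weighting:

import torch

def shape_loss(z, log_det):
    # NLL of z under N(0, I), corrected by the flow's log-determinant.
    return (0.5 * z.pow(2).sum(dim=1) - log_det).mean()

def consistency_loss(f1, f2):
    # L2 gap between features of two transformed views of the same image.
    return (f1 - f2).pow(2).sum(dim=1).mean()

def total_loss(z, log_det, f1, f2, lam=1.0):
    return shape_loss(z, log_det) + lam * consistency_loss(f1, f2)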

Performance

The method demonstrates superior detection performance across multiple datasets, particularly in cross-model generalization scenarios.

Citation

If you find this work useful for your research, please cite:

@inproceedings{zhang2025detecting,
  title={Detecting Generated Images by Fitting Natural Image Distributions},
  author={Zhang, Yonggang and Nie, Jun and Tian, Xinmei and Gong, Mingming and Zhang, Kun and Han, Bo},
  booktitle={NeurIPS 2025},
  year={2025}
}

Project Structure

ConV/
├── srcs/                     # Source code directory
│   ├── ConV.py              # Basic ConV detection method (training-free)
│   ├── F_ConV.py            # F-ConV method (Flow-based with Normalizing Flow)
│   └── extract_features.py  # DINOv2 feature extraction script
├── scripts/                  # Shell scripts for batch processing
│   ├── run_conv.sh          # Batch evaluation script for ConV method
│   ├── run_fconv.sh         # Batch evaluation script for F-ConV method
│   └── extract_features.sh  # Feature extraction shell script
├── model.py                  # Normalizing Flow model definition
├── config.py                 # Model configuration (coupling blocks, hidden units, etc.)
├── freia_funcs.py           # FrEIA framework utility functions
├── utils.py                  # Utility functions and loss functions
├── augmentation.py           # Data augmentation for ConV method
├── augmentations_fconv.py    # Data augmentation for F-ConV method
├── my_transforms.py          # Custom image transformation functions
├── requirements.txt          # Python dependencies with CUDA support
├── framework.png             # Framework architecture diagram
├── LICENSE                   # MIT License
└── README.md                 # This documentation

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

For questions, technical support, or collaboration inquiries, please contact nj18@mail.ustc.edu.cn.
