Detecting Generated Images by Fitting Natural Image Distributions

A Distribution-Based Approach for AI-Generated Image Detection (NeurIPS 2025 Spotlight)



A distribution-based framework for detecting AI-generated images by modeling natural image distributions in feature space

Abstract

This repository contains the official implementation of Consistency Verification (ConV), a novel framework for detecting generated images that exploits geometric differences between the data manifolds of natural and generated images. Unlike existing methods that rely heavily on training binary classifiers with numerous generated images, ConV depends solely on fitting the natural image distribution. The framework employs a pair of functions engineered to yield consistent outputs for natural images but divergent outputs for generated ones, leveraging the property that their gradients reside in mutually orthogonal subspaces. This design enables a training-free detection method: an image is identified as generated if a transformation along its data manifold induces a significant change in the loss value of a self-supervised model pre-trained on natural images. To address diminishing manifold disparities in advanced generative models, we leverage Normalizing Flow models to amplify detectable differences by extruding generated images away from the natural image manifold.

Motivation

The rapid advancement of generative AI technologies has led to increasingly sophisticated deepfakes and AI-generated images, posing significant challenges to image authenticity verification. Traditional detection methods often rely on model-specific artifacts, limiting their generalization capability when encountering novel generative models.

Framework Architecture

Architecture of the distribution-based detection framework

Current limitations in generated image detection include:

  • Heavy Dependence on Generated Data: Existing binary classification approaches require extensive collections of natural and generated images for training, so their performance hinges on the diversity of the generated data seen during training.

  • Generalization Challenges: It is unclear whether a classifier trained on images from certain generative models (e.g., particular diffusion models) can reliably distinguish images produced by unknown models, which limits practical applicability.

  • Costly Data Collection: Sustaining robust detection performance requires continually collecting images generated by the latest generative models, which can be expensive or infeasible when those models are inaccessible.

Our Approach

We propose Consistency Verification (ConV), a novel framework that detects generated images by exploiting geometric differences between the data manifolds of natural and generated images. Unlike existing methods that rely heavily on the distribution of generated images, our approach depends solely on fitting the natural image distribution, enabling robust generalization to unknown generative models.

  • Manifold Disparity Exploitation: The method leverages the fundamental observation that natural and generated images occupy distinct manifolds in the feature space. By visualizing feature representations extracted from DINOv2 (pre-trained solely on natural images), we observe that generated images exhibit distinct manifold structures compared to natural images, providing a geometric basis for detection.

  • Orthogonality Principle and Consistency Verification: We introduce a pair of functions designed to yield consistent outputs for natural images but divergent outputs for generated ones. The design is guided by an orthogonality principle: the gradients of these functions lie in mutually orthogonal subspaces. This enables a training-free detection approach: an image is identified as generated if a transformation along its data manifold induces a significant change in the loss value of a self-supervised model pre-trained on natural images.

  • Normalizing Flow for Manifold Extrusion: To address diminishing manifold disparities in advanced generative models, we employ Normalizing Flow models to explicitly extrude generated images away from the natural image manifold. By transforming the natural image manifold into a Gaussian distribution, the flow model enables precise separation and amplification of detectable differences between natural and generated images.

  • Natural Distribution Dependency: A key advantage of ConV is its reliance on fitting the natural data distribution rather than the distribution of generated images. This design principle addresses the generalization challenge faced by binary classifiers, whose performance typically depends on the diversity of generated data, making them vulnerable to unknown generative models.
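
To build intuition for the orthogonality principle above, the toy sketch below (an illustration only, not the repository's implementation) pairs a "natural manifold" (the unit circle) with a loss that is zero exactly on it. Stepping a natural sample along its manifold barely changes the loss, while stepping a generated sample along its own, different manifold shifts the loss noticeably:

import numpy as np

# A "model" fitted only to the natural manifold (the unit circle):
# its loss is zero exactly on the circle and grows away from it.
def natural_loss(p):
    return (np.linalg.norm(p) - 1.0) ** 2

def step_along(p, direction, eps=0.05):
    d = direction / np.linalg.norm(direction)
    return p + eps * d

# Natural sample on the circle; its manifold direction is the tangent.
x_nat = np.array([np.cos(0.3), np.sin(0.3)])
d_nat = np.array([-x_nat[1], x_nat[0]])

# "Generated" sample from a different manifold (the line y = 0.5);
# its manifold direction is horizontal.
x_gen = np.array([0.4, 0.5])
d_gen = np.array([1.0, 0.0])

for name, x, d in [("natural", x_nat, d_nat), ("generated", x_gen, d_gen)]:
    change = abs(natural_loss(step_along(x, d)) - natural_loss(x))
    print(f"{name}: loss change = {change:.6f}")
# The natural sample's loss barely moves; the generated sample's loss shifts
# by orders of magnitude more, which is the signal being thresholded.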

Installation

# Clone the repository
git clone https://github.com/tmlr-group/ConV.git
cd ConV

# Create conda environment (optional)
conda create -n conv python=3.10
conda activate conv

# Install dependencies
pip install -r requirements.txt

Usage

1. Dataset Structure

Training Dataset (Two Organization Formats)

Format 1: Separate Mode (--path_mode separate)

training_data/
├── 0_real/           # Natural images
└── 1_fake/           # Generated images

Format 2: Subdirectory Mode (--path_mode subdirs)

training_data/
├── subset1/
│   ├── 0_real/    # Natural images
│   └── 1_fake/    # Generated images
├── subset2/
│   ├── 0_real/
│   └── 1_fake/
└── ...

Test Dataset (Separate Mode Only)

Separate Mode (--path_mode separate)

test_data/
├── 0_real/           # Natural images
└── 1_fake/           # Generated images
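
A quick way to sanity-check your layout before running the scripts (a hypothetical helper, not part of the repository; adjust the extensions to your data):

from pathlib import Path

def check_layout(root):
    # Verify the separate-mode layout: <root>/0_real and <root>/1_fake.
    for sub in ("0_real", "1_fake"):
        d = Path(root) / sub
        if not d.is_dir():
            print(f"{d}: missing!")
            continue
        n = sum(1 for p in d.iterdir()
                if p.suffix.lower() in {".jpg", ".jpeg", ".png"})
        print(f"{d}: {n} images")

check_layout("test_data")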

2. Feature Extraction (for F-ConV Training)

F-ConV requires pre-extracted DINOv2 features for training. Use the provided script:

bash scripts/extract_features.sh

Configuration:

  1. Open scripts/extract_features.sh
  2. Set train_real_path and train_fake_path to your training data paths (for subdirs mode, you just need to set one path)
  3. Set train_path_mode to "separate" or "subdirs" based on your dataset structure
  4. Set train_output to specify output feature file path (e.g., features_train.pkl)
  5. Run the script

The script will automatically extract features according to your dataset structure.
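
For reference, the sketch below shows the kind of extraction extract_features.py performs: loading DINOv2 ViT-L/14 from torch hub and saving L2-normalized features to a pickle. The shell script above is the supported entry point; the repository's exact preprocessing, multi-crop handling, and pickle schema may differ, and example_real.jpg is a placeholder path:

import pickle
import torch
from PIL import Image
from torchvision import transforms

# Load DINOv2 ViT-L/14 from torch hub (downloads weights on first run).
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14")
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),  # 224 is divisible by the 14-pixel patch size
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406),
                         std=(0.229, 0.224, 0.225)),
])

@torch.no_grad()
def extract(path):
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    feat = model(x)  # (1, 1024) CLS embedding
    return torch.nn.functional.normalize(feat, dim=-1).squeeze(0)

features = {"real": [extract("example_real.jpg")]}  # placeholder structure
with open("features_train.pkl", "wb") as f:
    pickle.dump(features, f)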


3. Running ConV (Training-Free Detection)

Use the provided script for batch evaluation:

bash scripts/run_conv.sh

Configuration:

  1. Open scripts/run_conv.sh
  2. Modify the arrays:
    real_paths=("path/to/test/real1" "path/to/test/real2")
    fake_paths=("path/to/test/fake1" "path/to/test/fake2")
    dataset_names=("Dataset1" "Dataset2")
  3. Adjust parameters (batch size, crops_num, etc.)
  4. Run the script

Example output:

AUROC: 0.9523
AP: 0.9456
ACC: 0.9234
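
AUROC, AP, and ACC are the standard scikit-learn metrics computed over per-image scores and binary labels; a minimal reproduction of the reporting (with made-up scores) looks like:

import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score, accuracy_score

# Hypothetical per-image scores (higher = more likely generated) and labels.
labels = np.array([0, 0, 0, 1, 1, 1])  # 0 = real, 1 = fake
scores = np.array([0.1, 0.3, 0.2, 0.8, 0.6, 0.9])

print(f"AUROC: {roc_auc_score(labels, scores):.4f}")
print(f"AP: {average_precision_score(labels, scores):.4f}")
print(f"ACC: {accuracy_score(labels, scores > 0.5):.4f}")  # 0.5 threshold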

4. Running F-ConV (Training-Based Detection)

Use the provided script for batch evaluation with training:

bash scripts/run_fconv.sh

Configuration:

  1. Open scripts/run_fconv.sh
  2. Set training feature path:
    train_feature_path="features_train.pkl"  # From step 2
    # To load a pre-trained model for detection, leave train_feature_path empty: train_feature_path=""
  3. Set model save path:
    model_path="fconv_model.pth"
  4. Modify test dataset arrays:
    real_paths=("path/to/test/real1" "path/to/test/real2")
    fake_paths=("path/to/test/fake1" "path/to/test/fake2")
    dataset_names=("Dataset1" "Dataset2")
  5. Run the script

The script will:

  • Train the F-ConV model on extracted features
  • Test on all specified datasets
  • Save results to results/ directory
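
If you want to inspect the extracted features before training, the pickle from step 2 can be loaded directly (the schema is defined by extract_features.py, so explore the object rather than assuming a structure):

import pickle

with open("features_train.pkl", "rb") as f:
    features = pickle.load(f)

# Print the type and, if it is a dict, its keys to see how features
# and labels are organized.
print(type(features))
if isinstance(features, dict):
    print(list(features.keys()))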

Methodology

Core Principle

The ConV framework exploits geometric differences between the data manifolds of natural and generated images. The core principle is based on the observation that natural and generated images occupy distinct manifolds in the feature space, as visualized through DINOv2 embeddings. Our method leverages this manifold disparity through:

  1. Manifold Disparity Observation: Feature representations extracted by DINOv2 (pre-trained solely on natural images) reveal that generated images exhibit distinct manifold structures compared to natural images

  2. Orthogonality Principle: We design a pair of functions whose gradients lie in mutually orthogonal subspaces, ensuring consistent outputs for natural images but divergent outputs for generated ones

  3. Training-Free Detection: An image is identified as generated if a transformation along its data manifold induces a significant change in the loss value of a self-supervised model pre-trained on natural images

  4. Manifold Extrusion with Normalizing Flow: To amplify detectable differences, Normalizing Flow models are employed to explicitly extrude generated images away from the natural image manifold by transforming the natural manifold into a Gaussian distribution
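
At inference time, steps 2-3 reduce to a simple decision rule. The sketch below is schematic: ssl_loss and manifold_transform stand in for the pre-trained self-supervised model's loss and the repository's manifold-aligned transformation, and the threshold tau is a placeholder value:

import torch

def conv_score(x, ssl_loss, manifold_transform):
    # Loss change induced by a transformation along the image's data manifold.
    with torch.no_grad():
        return (ssl_loss(manifold_transform(x)) - ssl_loss(x)).abs()

def is_generated(x, ssl_loss, manifold_transform, tau=0.1):
    # Natural images keep the loss nearly constant; generated images do not.
    return conv_score(x, ssl_loss, manifold_transform) > tau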

Technical Details

DINOv2 Feature Extraction

  • Pre-trained DINOv2 ViT-L/14 model is employed
  • Normalized feature vectors are extracted
  • Multi-crop view feature extraction is supported

Normalizing Flow Model

  • Invertible neural network based on a flow architecture (built on the FrEIA framework)
  • Feature distribution is learned through coupling layers
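
For readers unfamiliar with coupling layers, the minimal RealNVP-style block below illustrates the mechanism: half of the feature vector parameterizes an invertible affine map of the other half, with a tractable log-determinant. The repository builds its flow with the FrEIA framework (see freia_funcs.py and config.py), so this is an illustration rather than the actual model:

import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    # Minimal RealNVP-style affine coupling block (illustration only).
    def __init__(self, dim, hidden=256):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=1)
        s = torch.tanh(s)                  # keep scales well-behaved
        z2 = x2 * torch.exp(s) + t         # affine map of the second half
        log_det = s.sum(dim=1)             # log |det J| of the coupling
        return torch.cat([x1, z2], dim=1), log_det

    def inverse(self, z):
        z1, z2 = z[:, :self.half], z[:, self.half:]
        s, t = self.net(z1).chunk(2, dim=1)
        s = torch.tanh(s)
        x2 = (z2 - t) * torch.exp(-s)
        return torch.cat([z1, x2], dim=1)

Stacking several such blocks (with permutations in between) yields the invertible map that transforms the natural feature manifold toward a Gaussian.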

Loss Functions

The F-ConV method employs the following loss functions:

  • Shape Loss: Distinguishes feature distributions between real and generated images
  • Consistency Loss: Ensures feature consistency across transformations
  • Total Loss: Combined loss for end-to-end training
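
The exact definitions live in utils.py. As a hedged sketch of how such terms typically combine, the flow's negative log-likelihood under a standard Gaussian can serve as a shape term and an L2 gap between features of two views as a consistency term, with lam a hypothetical weighting:

import torch

def shape_loss(z, log_det):
    # NLL of z under N(0, I), corrected by the flow's log-determinant.
    return (0.5 * z.pow(2).sum(dim=1) - log_det).mean()

def consistency_loss(f1, f2):
    # L2 gap between features of two transformed views of the same image.
    return (f1 - f2).pow(2).sum(dim=1).mean()

def total_loss(z, log_det, f1, f2, lam=1.0):
    return shape_loss(z, log_det) + lam * consistency_loss(f1, f2)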

Performance

The method demonstrates superior detection performance across multiple datasets, particularly in cross-model generalization scenarios.

Citation

If you find this work useful for your research, please cite:

@inproceedings{zhang2025detecting,
  title={Detecting Generated Images by Fitting Natural Image Distributions},
  author={Zhang, Yonggang and Nie, Jun and Tian, Xinmei and Gong, Mingming and Zhang, Kun and Han, Bo},
  booktitle={NeurIPS 2025},
  year={2025}
}

Project Structure

ConV/
├── srcs/                     # Source code directory
│   ├── ConV.py              # Basic ConV detection method (training-free)
│   ├── F_ConV.py            # F-ConV method (Flow-based with Normalizing Flow)
│   └── extract_features.py  # DINOv2 feature extraction script
├── scripts/                  # Shell scripts for batch processing
│   ├── run_conv.sh          # Batch evaluation script for ConV method
│   ├── run_fconv.sh         # Batch evaluation script for F-ConV method
│   └── extract_features.sh  # Feature extraction shell script
├── model.py                  # Normalizing Flow model definition
├── config.py                 # Model configuration (coupling blocks, hidden units, etc.)
├── freia_funcs.py           # FrEIA framework utility functions
├── utils.py                  # Utility functions and loss functions
├── augmentation.py           # Data augmentation for ConV method
├── augmentations_fconv.py    # Data augmentation for F-ConV method
├── my_transforms.py          # Custom image transformation functions
├── requirements.txt          # Python dependencies with CUDA support
├── framework.png             # Framework architecture diagram
├── LICENSE                   # MIT License
└── README.md                 # This documentation

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

For questions, technical support, or collaboration inquiries, please contact nj18@mail.ustc.edu.cn.
