# A Detector Ensembling Swin-Transformer and CLIP
This project implements an ensemble model for detecting AI-generated images, combining a fine-tuned Swin-Transformer and a CLIP-based feature classifier. The Swin-Transformer is fine-tuned for image classification, while CLIP extracts robust features that are classified using a custom neural network. The final prediction is an ensemble of both models' outputs.
To run this project, install the following Python packages:
```bash
pip install torch torchvision timm
pip install git+https://github.com/openai/CLIP.git
```

Additional dependencies (automatically installed with the above): `numpy`, `scikit-learn`, `pillow`, `tqdm`.
Ensure you have a CUDA-enabled GPU for optimal performance, though the code supports CPU execution as well.
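Device selection can follow the usual PyTorch pattern, so the same code runs on GPU or CPU:

```python
import torch

# Use the GPU when available; otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```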
- **Dataset Preparation**: Modify the `dataset_path` variable in the code to point to your dataset directory:

  ```python
  dataset_path = "./AIGC-Detection-Dataset"
  ```

  The dataset should have the following structure:

  ```
  AIGC-Detection-Dataset/
  ├── train/
  │   ├── 0_real/
  │   └── 1_fake/
  └── val/
      ├── 0_real/
      └── 1_fake/
  ```

- **Pretrained Models**: The code uses pretrained weights for Swin-Transformer (`swinv2_small_window16_256`) and CLIP (`ViT-L/14@336px`), which are downloaded automatically via `timm` and `clip`.
The Swin-Transformer is fine-tuned on the dataset with specific layers unfrozen for training. Key configurations include:
- Model: `swinv2_small_window16_256`
- Batch Size: 64
- Epochs: 100
- Optimizer: AdamW (lr=1e-4, weight_decay=1e-4)
- Scheduler: CosineAnnealingWarmRestarts
- Loss: CrossEntropyLoss with label smoothing (0.05)
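The configuration above maps onto a standard PyTorch training setup. In the sketch below, a placeholder module stands in for the Swin backbone, and the scheduler's restart period `T_0` is an assumed value not given in this README:

```python
import torch
import torch.nn as nn

# Placeholder standing in for the timm Swin-Transformer backbone.
model = nn.Linear(256, 2)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)
# T_0 (epochs until the first warm restart) is an assumption.
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10)
criterion = nn.CrossEntropyLoss(label_smoothing=0.05)
```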
- Run the fine-tuning script (provided in the code).
- Models are saved per epoch in `Fine_Tuned_Swin_Models/`.
- The final model is selected based on the largest validation loss among epochs 50-100 with validation accuracy > 0.99. This is saved as `swin_model.pth`.
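The selection rule can be sketched in a few lines of plain Python; the `(epoch, val_accuracy, val_loss)` record format here is hypothetical:

```python
def select_checkpoint(history):
    """Among epochs 50-100 with validation accuracy > 0.99,
    return the epoch with the LARGEST validation loss."""
    candidates = [(epoch, acc, loss) for epoch, acc, loss in history
                  if 50 <= epoch <= 100 and acc > 0.99]
    if not candidates:
        return None
    return max(candidates, key=lambda record: record[2])[0]
```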
Validation accuracy reflects intra-domain performance, which may not generalize across domains. A model with slightly lower intra-domain accuracy (and higher loss) might generalize better in cross-domain scenarios.
CLIP (ViT-L/14@336px) extracts image features, which are then classified using a custom neural network.
- Input: Training images
- Augmentations:
  - Padding to 336x336
  - Center crop to 336x336
  - Horizontal flip
  - TenCrop (four corners, center, and their flipped versions)
- Output: Augmented images saved in `train_augmented/`
- Features are extracted using CLIP's `encode_image` method.
- Features are scaled using `StandardScaler` (saved as `trained_scaler.pkl`).
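Feature scaling and persistence might look like the sketch below; the random 768-dimensional array is a placeholder standing in for real CLIP embeddings:

```python
import pickle
import numpy as np
from sklearn.preprocessing import StandardScaler

# Placeholder features; in the real pipeline these come from CLIP's encode_image.
features = np.random.randn(100, 768).astype(np.float32)

scaler = StandardScaler().fit(features)
with open("trained_scaler.pkl", "wb") as f:
    pickle.dump(scaler, f)  # reused at inference time

scaled = scaler.transform(features)
```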
- Architecture: `ComplexClassifier(input_dim=768, hidden_dim=512, output_dim=1)`
- Optimizer: SGD (lr=0.9, momentum=0.99, weight_decay=1e-4)
- Loss: BCEWithLogitsLoss
- Epochs: 100
- Selection: Model with the highest validation accuracy is saved as `model.pth`.
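The README does not spell out the layers inside `ComplexClassifier`; the sketch below is one plausible MLP that matches the stated dimensions, optimizer, and loss:

```python
import torch
import torch.nn as nn

class ComplexClassifier(nn.Module):
    # Hypothetical layer layout; only the dimensions come from the README.
    def __init__(self, input_dim=768, hidden_dim=512, output_dim=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(hidden_dim, output_dim),
        )

    def forward(self, x):
        return self.net(x)

model = ComplexClassifier()
optimizer = torch.optim.SGD(model.parameters(), lr=0.9,
                            momentum=0.99, weight_decay=1e-4)
criterion = nn.BCEWithLogitsLoss()
```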
Since CLIP is not fine-tuned for this task, validation accuracy is assumed to correlate strongly with test set performance.
The final prediction combines outputs from both models:
- CLIP Probability: Weight = 0.489
- Swin Probability: Weight = 0.511
- Threshold: Combined score > 0.5 indicates an AI-generated image.
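The weighted vote can be expressed directly:

```python
# Combine the two models' fake-probabilities with the weights above.
def ensemble_predict(clip_prob, swin_prob,
                     w_clip=0.489, w_swin=0.511, threshold=0.5):
    score = w_clip * clip_prob + w_swin * swin_prob
    return score > threshold  # True -> AI-generated
```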
See the testing section for implementation details.
The provided test function evaluates the ensemble model:
```python
def test(model, swin_model, test_dataset_path):
    # Load models, dataset, and compute predictions
    # Returns accuracy
```

- `data_loader`: Custom dataset class to preprocess images and extract CLIP features.
- `ComplexClassifier`: Loads `model.pth`.
- Swin-Transformer: Loads `swin_model.pth`.
- Usage:

  ```python
  test_dataset_path = "./AIGC-Detection-Dataset/val"
  accuracy = test(model, swin_model, test_dataset_path)
  print(f"Test Accuracy: {accuracy:.4f}")
  ```
- Customize `data_loader` and `model` based on your dataset structure.
- Ensure `trained_scaler.pkl`, `model.pth`, and `swin_model.pth` are in the working directory.