The Forensic Media Detection (FMD) Tool is a comprehensive AI & ML-driven system designed to detect deepfake artifacts across multiple media modalities. The tool employs state-of-the-art deep learning techniques to analyze images, videos, and audio files for signs of manipulation or synthetic generation.
The FMD tool follows a modular architecture with the following key components:
```
fmd_tool/
├── image_forensics/
│   └── image_detector.py
├── video_forensics/
│   └── video_detector.py
├── audio_forensics/
│   └── audio_detector.py
├── multimodal_analysis/
│   └── multimodal_detector.py
├── cli/
│   └── fmd_cli.py
├── tests/
│   ├── test_image_detector.py
│   ├── test_video_detector.py
│   ├── test_audio_detector.py
│   └── test_multimodal_detector.py
├── data/
│   └── (test files and datasets)
└── documentation/
    ├── user_manual.md
    └── technical_documentation.md
```
The Image Forensics module implements three primary detection approaches, supported by additional analysis techniques:
- Vision Transformer (ViT) Classifier: A PyTorch-based ViT model for state-of-the-art generalization.
- XceptionNet-based Classifier: A TensorFlow-based Xception model fine-tuned for deepfake detection.
- Autoencoder-based Anomaly Detection: An autoencoder that learns to reconstruct authentic images and flags images it reconstructs poorly as anomalies.
- Pixel-level Inconsistency Detection: Uses gradient analysis to identify anomalous pixel patterns.
- Lighting Consistency Analysis: Analyzes the lighting distribution across image regions.
- Preprocessing Pipeline: Standardized image preprocessing for consistent model input.
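The preprocessing step can be sketched concretely. The snippet below is a minimal illustration rather than the tool's actual pipeline: it assumes the 256×256 RGB input expected by the XceptionNet configuration and scales pixels to [0, 1]; the `preprocess_image` helper name is hypothetical.

```python
import numpy as np
from PIL import Image

def preprocess_image(path: str, size: tuple = (256, 256)) -> np.ndarray:
    """Load an image, resize it to the model input size, and scale to [0, 1]."""
    img = Image.open(path).convert("RGB")       # force 3 channels
    img = img.resize(size)                      # match the (256, 256, 3) input shape
    arr = np.asarray(img, dtype=np.float32) / 255.0
    return arr[np.newaxis, ...]                 # add batch dimension -> (1, 256, 256, 3)
```

The batch dimension is added so a single image can be fed directly to a Keras-style `model.predict` call.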
XceptionNet Configuration:
- Input Shape: (256, 256, 3)
- Base Model: Xception (pre-trained on ImageNet)
- Fine-tuning: Last 20 layers unfrozen
- Output: Single sigmoid neuron for binary classification
Autoencoder Configuration:
- Encoder: 4 Conv2D layers with max pooling
- Decoder: 4 Conv2DTranspose layers with upsampling
- Loss Function: Mean Squared Error (MSE)
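The anomaly-detection logic can be illustrated independently of the network itself. The sketch below assumes a reconstruction already produced by a trained autoencoder; the function names and the threshold convention (e.g., a high percentile of errors on authentic training images) are illustrative assumptions, not the tool's actual API.

```python
import numpy as np

def anomaly_score(image: np.ndarray, reconstruction: np.ndarray) -> float:
    """Mean squared error between an image and its autoencoder reconstruction."""
    return float(np.mean((image - reconstruction) ** 2))

def is_anomalous(image: np.ndarray, reconstruction: np.ndarray,
                 threshold: float) -> bool:
    """Flag the image when its reconstruction error exceeds a threshold
    calibrated on authentic images (e.g., the 95th percentile of their errors)."""
    return anomaly_score(image, reconstruction) > threshold
```

Because the autoencoder is trained only on authentic images, manipulated or synthetic inputs tend to reconstruct poorly and therefore score above the calibrated threshold.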
The Video Forensics module employs a CNN-LSTM hybrid architecture for temporal anomaly detection:
- Frame Extraction: Systematic sampling of video frames
- CNN Feature Extraction: Spatial feature extraction from individual frames
- LSTM Temporal Modeling: Analysis of temporal patterns across frames
- Temporal Anomaly Detection: Identifies inconsistencies in frame sequences
- Lip-Sync Analysis: Detects audio-visual synchronization mismatches
- Frame-by-Frame Analysis: Detailed examination of individual frames
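The systematic frame sampling described above might, for example, select evenly spaced frame indices across the clip so the LSTM sees the whole video rather than just its start. A sketch (the helper name is hypothetical):

```python
import numpy as np

def sample_frame_indices(total_frames: int, sequence_length: int) -> list:
    """Pick `sequence_length` evenly spaced frame indices from a video."""
    if total_frames <= sequence_length:
        return list(range(total_frames))        # short clip: keep every frame
    # linspace gives systematic coverage from the first to the last frame
    return np.linspace(0, total_frames - 1, sequence_length, dtype=int).tolist()
```

The returned indices would then be used to seek and decode only the selected frames (e.g., via OpenCV) before CNN feature extraction.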
CNN-LSTM Configuration:
- Input Shape: (sequence_length, 256, 256, 3)
- CNN Layers: 3 Conv2D layers (64, 128, 256 filters)
- LSTM Layers: 2 LSTM layers (128, 64 units)
- Output: Single sigmoid neuron for binary classification
The Audio Forensics module implements an x-vector-inspired architecture for speaker verification and synthetic speech detection:
- Feature Extraction: MFCC (Mel-Frequency Cepstral Coefficients) extraction
- Frame-level Processing: Conv1D layers for local feature extraction
- Statistics Pooling: Mean and standard deviation pooling across time
- Segment-level Processing: Dense layers for final classification
- Spectral Anomaly Detection: Analysis of frequency domain characteristics
- Voice Synthesis Detection: Identification of synthetic speech artifacts
- Speaker Verification: Comparison against known speaker profiles
X-Vector Configuration:
- Input Shape: (None, 20) - Variable length MFCC sequences
- Frame-level: 3 Conv1D layers (512 filters each)
- Statistics Pooling: Mean and standard deviation concatenation
- Segment-level: 2 Dense layers (512 units each)
- Output: Single sigmoid neuron for binary classification
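The statistics-pooling step is what lets the network accept variable-length MFCC sequences: it collapses a (time, features) matrix into one fixed-size segment vector by concatenating the per-dimension mean and standard deviation over time. A NumPy sketch of that operation:

```python
import numpy as np

def statistics_pooling(frame_features: np.ndarray) -> np.ndarray:
    """Collapse a (time, features) matrix into a fixed (2 * features,) vector
    by concatenating the per-dimension mean and standard deviation over time."""
    mean = frame_features.mean(axis=0)
    std = frame_features.std(axis=0)
    return np.concatenate([mean, std])
```

Regardless of how many frames the utterance contains, the pooled vector always has the same length, so the segment-level dense layers can operate on it directly.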
The Multimodal Analysis module implements late fusion for combining predictions from individual modality detectors:
- Individual Modality Analysis: Separate analysis using specialized detectors
- Cross-Modal Consistency Check: Analysis of prediction consistency across modalities
- Ensemble Prediction: Weighted combination of individual predictions
- Late Fusion: Combines high-level predictions from individual modalities
- Consistency Analysis: Identifies outlier predictions that may indicate manipulation
- Confidence Scoring: Provides confidence estimates based on cross-modal agreement
Late Fusion Configuration:
- Inputs: 3 prediction scores (image, video, audio)
- Hidden Layers: 2 Dense layers (64, 32 units) with dropout
- Output: Single sigmoid neuron for final prediction
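Before the learned fusion network is trained, the late-fusion and consistency-check ideas can be sketched as a weighted average plus an outlier test over the per-modality scores. The equal default weights and the 0.3 tolerance below are illustrative assumptions, not the tool's calibrated values:

```python
import numpy as np

def fuse_predictions(scores: dict, weights: dict = None) -> float:
    """Weighted late fusion of per-modality deepfake scores in [0, 1]."""
    weights = weights or {m: 1.0 for m in scores}   # default: equal weights
    total = sum(weights[m] for m in scores)
    return sum(scores[m] * weights[m] for m in scores) / total

def consistency_flags(scores: dict, tolerance: float = 0.3) -> list:
    """Flag modalities whose score deviates strongly from the mean --
    a possible sign that only one modality was manipulated."""
    mean = float(np.mean(list(scores.values())))
    return [m for m, s in scores.items() if abs(s - mean) > tolerance]
```

For example, if image and video both score high but audio scores low, the audio modality is flagged as an outlier, which feeds into the cross-modal confidence estimate.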
The CLI is built using the Click library and provides a user-friendly interface for accessing all FMD functionalities:
- Modular Commands: Separate commands for each analysis type
- Flexible Options: Configurable models and output formats
- Error Handling: Comprehensive error messages and validation
- Click Framework: Provides command parsing and help generation
- JSON Output: Structured output format for programmatic access
- Progress Indicators: User feedback during analysis
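The shape of such a Click-based CLI can be sketched as follows. This is not the actual `fmd_cli.py`: the version string is a placeholder and the result dictionary is a stand-in for the real analysis call.

```python
import json
import click

@click.group()
@click.version_option("0.0.0")          # placeholder version, not the tool's
def fmd():
    """Forensic Media Detection tool."""

@fmd.command()
@click.option("--check", required=True, type=click.Path(exists=True),
              help="Path to the image file to analyze.")
@click.option("--model", default="xception",
              type=click.Choice(["xception", "autoencoder"]))
@click.option("--output", default=None, help="Write JSON results to this file.")
def image(check, model, output):
    """Analyze an image file for deepfake artifacts."""
    # Placeholder result; the real command would invoke the image detector here.
    result = {"file": check, "model": model, "deepfake_probability": None}
    if output:
        with open(output, "w") as f:
            json.dump(result, f, indent=2)
    click.echo(json.dumps(result))
```

In the packaged tool, the `fmd` group would be exposed as a console-script entry point; Click then provides `--help`, argument validation, and error messages for free.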
Each module includes comprehensive unit tests covering:
- Model initialization and building
- Feature extraction and preprocessing
- Prediction functionality
- Error handling
End-to-end testing of the complete pipeline:
- CLI command execution
- File I/O operations
- Cross-module interactions
Performance optimizations:
- Model Quantization: Reduced precision for faster inference
- Batch Processing: Efficient handling of multiple files
- Memory Management: Careful handling of large media files

Extensibility features:
- Modular Design: Easy addition of new detection methods
- Configurable Models: Support for different model architectures
- Extensible Framework: Plugin architecture for custom detectors
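One common way to realize such a plugin architecture is a registering base class: every detector subclass announces itself to a shared registry, so new detection methods can be added without touching the dispatch code. The class and method names below are hypothetical, not the tool's actual interface.

```python
from abc import ABC, abstractmethod

class BaseDetector(ABC):
    """Hypothetical plugin interface: each detection method implements this."""

    registry = {}  # maps detector name -> detector class

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        BaseDetector.registry[cls.name] = cls   # auto-register on definition

    name: str = "base"

    @abstractmethod
    def analyze(self, path: str) -> dict:
        """Return a result dict with at least a 'deepfake_probability' key."""

class DummyDetector(BaseDetector):
    """Trivial example plugin used only to illustrate registration."""
    name = "dummy"

    def analyze(self, path: str) -> dict:
        return {"file": path, "deepfake_probability": 0.5}
```

A CLI or fusion layer can then look up detectors by name (`BaseDetector.registry["dummy"]`) without importing each implementation explicitly.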
You can use the following pretrained models with the FMD tool for each modality:
- Vision Transformer (ViT) (recommended for generalization):
  - Recent research shows ViT models offer superior cross-dataset generalization.
  - Uses a pre-trained ViT-B/16 backbone (PyTorch).
- XceptionNet:
  - Pretrained on ImageNet; widely used for deepfake detection (e.g., FaceForensics++, DeepFakeDetection).
  - Download from: Keras Applications or the DeepFake Detection Challenge.
- Autoencoder:
  - Any convolutional autoencoder trained on authentic images from your domain.
- CNN-LSTM (video):
  - Use a CNN (e.g., Xception, ResNet50) pretrained on ImageNet for frame feature extraction.
  - LSTM weights can be trained on video deepfake datasets (e.g., FaceForensics++, DFDC).
  - Example sources: DeepFakeDetection or the Kaggle DFDC challenge.
- X-Vector:
  - Pretrained x-vector models for speaker verification (e.g., Kaldi x-vector, SpeechBrain).
  - Download from: SpeechBrain Pretrained Models or Kaldi VoxCeleb recipes.
- CNN-LSTM (audio):
  - A CNN-LSTM trained for audio deepfake detection (e.g., ASVspoof challenge models).
- Late Fusion:
  - The fusion model can be trained using outputs from the above pretrained models on a multimodal dataset.
The tool relies on the following core libraries:
- TensorFlow: Deep learning framework for model implementation (used for Xception and the autoencoder)
- PyTorch: Deep learning framework for model implementation (used for Vision Transformer)
- OpenCV: Computer vision operations for image and video processing
- Librosa: Audio processing and feature extraction
- Scikit-learn: Machine learning utilities and preprocessing
- NumPy/Pandas: Numerical computing and data manipulation
- Click: Command-line interface creation
- Pillow: Image processing utilities
Planned future enhancements:
- Early Fusion: Feature-level fusion for improved accuracy
- Real-time Processing: Streaming analysis capabilities
- Model Training Pipeline: Automated training on new datasets
- Web Interface: Browser-based user interface
- Adversarial Robustness: Defense against adversarial attacks
- Explainable AI: Interpretable detection results
- Cross-dataset Generalization: Improved performance across different datasets (partially addressed by ViT model)
The Forensic Media Detection (FMD) Tool is an AI & ML-driven command-line interface (CLI) application designed to detect deepfake artifacts in various media types, including images, videos, and audio files. This tool leverages advanced machine learning models to analyze media for inconsistencies and anomalies that may indicate manipulation.
This repository has been updated to focus on the lightweight Image Forensics (XceptionNet) model. The full set of dependencies has been simplified to ensure a cleaner, more focused development environment.
To use the Image Forensics module, you need to have Python 3.8+ installed on your system. It is recommended to use a virtual environment to manage dependencies.
```bash
# Clone the repository
git clone https://github.com/RajaMuhammadAwais/Forensic-Media-Detection
cd Forensic-Media-Detection

# Install core dependencies for XceptionNet training
pip install -r requirements_light.txt
```

The `train_xception.py` script is configured to use a dataset with a specific folder structure. We recommend using the Deepfake and Real Images dataset from Kaggle.
- Install the Kaggle API:

```bash
pip install kaggle
```

- Set up Kaggle credentials:
  - Go to your Kaggle account settings and create a new API token.
  - Download the `kaggle.json` file and place it in the `~/.kaggle/` directory.
- Download the dataset:

```bash
kaggle datasets download -d manjilkarki/deepfake-and-real-images
```

- Unzip and structure the dataset:
  - Unzip the downloaded file.
  - The `train_xception.py` script expects the data to be structured as follows inside the `data/Dataset/` directory:

```
data/Dataset/
├── Train/
│   ├── Real/
│   └── Fake/
└── Validation/
    ├── Real/
    └── Fake/
```

  - You will need to manually create the `data/Dataset` directory and move the unzipped image folders into the correct `Train` and `Validation` subdirectories.
Once the dataset is prepared, you can run the training script. The script is configured for a small, proof-of-concept run.
```bash
python train_xception.py
```

Note: The trained model weights (`xception_deepfake_weights.h5`) are excluded from the repository via `.gitignore` due to file size limits. You will need to train the model locally to use the detection features.
To use the FMD Tool, you need to have Python 3.8+ installed on your system. It is recommended to use a virtual environment to manage dependencies.
```bash
# Clone the repository (if applicable)
# git clone <repository_url>
# cd fmd_tool

# Create a virtual environment
python3.11 -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`

# Install dependencies
pip install -r requirements_light.txt
```

The FMD Tool is accessed via the `fmd` command, which provides subcommands for analyzing different media types. Below are the available commands and their options.
- `--version`: Show the version and exit.
This command analyzes image files for deepfake artifacts and manipulation.
Usage:

```bash
fmd image --check <path_to_image_file> [OPTIONS]
```

Options:

- `--check <path_to_image_file>`: (Required) Path to the image file to analyze.
- `--model <model_name>`: Model to use for analysis. Supported models: `xception` (default), `autoencoder`.
- `--detect <detection_type>`: Type of detection to perform. Default: `deepfake`.
- `--output <output_file>`: Path to save the analysis results in JSON format.
Example:

```bash
fmd image --check my_image.jpg --model xception --output results.json
```

This command analyzes video files for deepfake artifacts and manipulation.
Usage:

```bash
fmd video --check <path_to_video_file> [OPTIONS]
```

Options:

- `--check <path_to_video_file>`: (Required) Path to the video file to analyze.
- `--model <model_name>`: Model to use for analysis. Supported models: `cnn_lstm` (default).
- `--detect <detection_type>`: Type of detection to perform. Default: `deepfake`.
- `--output <output_file>`: Path to save the analysis results in JSON format.
Example:

```bash
fmd video --check my_video.mp4 --model cnn_lstm
```

This command analyzes audio files for synthetic voice and manipulation.
Usage:

```bash
fmd audio --check <path_to_audio_file> [OPTIONS]
```

Options:

- `--check <path_to_audio_file>`: (Required) Path to the audio file to analyze.
- `--model <model_name>`: Model to use for analysis. Supported models: `xvector` (default), `cnn_lstm`.
- `--detect <detection_type>`: Type of detection to perform. Default: `deepfake`.
- `--output <output_file>`: Path to save the analysis results in JSON format.
Example:

```bash
fmd audio --check my_audio.wav --model xvector
```

This command performs a comprehensive analysis across multiple modalities (image, video, audio) if available in the media file.
Usage:

```bash
fmd multimodal --check <path_to_media_file> [OPTIONS]
```

Options:

- `--check <path_to_media_file>`: (Required) Path to the media file to analyze.
- `--model <model_name>`: Model to use for fusion. Supported models: `fusion_model` (default).
- `--output <output_file>`: Path to save the analysis results in JSON format.
Example:

```bash
fmd multimodal --check my_media.mp4
```

This command displays information about the FMD tool, including supported analysis types and usage examples.
Usage:

```bash
fmd info
```

Example:

```bash
fmd info
```

The FMD tool provides detailed analysis results, typically in JSON format when the `--output` option is used. The output includes:
- Deepfake Probability: A score indicating the likelihood of the media being a deepfake.
- Specific Anomalies: Details about detected inconsistencies in pixels, lighting, frame-by-frame analysis, lip-sync, voice mismatch, and synthetic speech artifacts.
- Overall Assessment: A summary recommendation (e.g., "DEEPFAKE DETECTED", "SUSPICIOUS", "LIKELY AUTHENTIC").
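A saved results file can be consumed programmatically with the standard library. The exact JSON schema is not documented here, so the field names below (`deepfake_probability`, `overall_assessment`) are illustrative assumptions matching the items above; check your tool version's actual output.

```python
import json

def summarize(results_path: str) -> str:
    """Read a saved FMD results file and return a one-line summary.
    Field names are illustrative -- verify them against your tool's output."""
    with open(results_path) as f:
        results = json.load(f)
    prob = results.get("deepfake_probability", 0.0)
    verdict = results.get("overall_assessment", "UNKNOWN")
    return f"{verdict} (deepfake probability: {prob:.2f})"
```

This pattern is useful for batch pipelines that run `fmd ... --output results.json` and then aggregate verdicts across many files.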
- "File not found" error: Ensure the path to your media file is correct and the file exists.
- Dependency issues: If you encounter errors related to missing libraries, try reinstalling them using `pip install -r requirements.txt` (if a `requirements.txt` is provided) or the individual `pip install` commands listed in the Installation section.
- Model loading errors: Ensure you have sufficient memory and the correct TensorFlow/PyTorch versions installed. Some models might require pre-trained weights, which may need to be downloaded separately.
For further assistance, please refer to the project's documentation or contact support.


