The Integrated Deepfake Detection System is a comprehensive project aimed at detecting deepfake videos by analyzing spatial, temporal, and micro-expression features. The system utilizes state-of-the-art deep learning models, including Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), to extract and fuse features for accurate deepfake detection.
This project is developed as part of a submission to SIH 2024 (Smart India Hackathon) and aims to push the boundaries of automated deepfake detection techniques.
With the rise of deepfake technology, the ability to manipulate video content has become increasingly sophisticated, posing serious threats to privacy, security, and authenticity. This project addresses the need for robust deepfake detection mechanisms that can identify manipulated content by analyzing various aspects of video data.
The project is structured into several key components:
- Data Preprocessing: Prepares video frames for feature extraction.
- Spatial Feature Extraction: Uses pre-trained CNN models to extract spatial features.
- Temporal Feature Extraction: Utilizes BiLSTM networks to capture temporal dependencies across video frames.
- Micro-Expression Analysis: Analyzes subtle facial movements to detect inconsistencies indicative of deepfakes.
- Feature Fusion Layer: Integrates spatial, temporal, and micro-expression features for final decision-making.
- Output: Generates a report indicating whether the video is a deepfake or genuine.
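The preprocessing step above can be sketched in a minimal way. The snippet below shows two common operations, evenly sampling frame indices so clips of different lengths yield fixed-length sequences, and scaling pixel values, using NumPy; the function names and sample count are illustrative assumptions (a real pipeline would also decode frames and crop faces, e.g. with OpenCV):

```python
import numpy as np

def sample_frame_indices(total_frames: int, num_samples: int) -> list:
    """Pick evenly spaced frame indices so long and short clips
    produce sequences of the same length for the temporal model."""
    if total_frames <= num_samples:
        return list(range(total_frames))
    step = total_frames / num_samples
    return [int(i * step) for i in range(num_samples)]

def normalize_frame(frame: np.ndarray) -> np.ndarray:
    """Scale uint8 pixel values to [0, 1] floats, the range CNN
    feature extractors typically expect."""
    return frame.astype(np.float32) / 255.0
```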
The system is trained and tested on the FaceForensics++ dataset, which contains both original and manipulated video sequences. The dataset is organized into `original_sequences` and `manipulated_sequences` folders, providing a rich source of data for training and evaluation.
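Given that folder layout, building a labeled file index is straightforward. The sketch below assumes the standard FaceForensics++ directory names and `.mp4` files; the function name and label convention (0 = genuine, 1 = deepfake) are assumptions for illustration:

```python
from pathlib import Path

def index_dataset(root: str) -> list:
    """Build (video_path, label) pairs from a FaceForensics++-style
    layout: label 0 for original videos, 1 for manipulated ones."""
    base = Path(root)
    samples = []
    for label, folder in ((0, "original_sequences"), (1, "manipulated_sequences")):
        # rglob descends into method subfolders (e.g. Deepfakes, Face2Face)
        samples += [(p, label) for p in sorted((base / folder).rglob("*.mp4"))]
    return samples
```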
Spatial Feature Extraction:
- Model Used: ResNet50
- Purpose: Extract spatial features from each frame of the video, capturing fine facial detail.
- Implementation: Utilizes a pre-trained ResNet50 model, with additional custom layers to refine feature extraction.
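The custom refinement layers can be sketched as below. To keep the example self-contained, only the head on top of ResNet50's 2048-dimensional pooled output is shown (the backbone itself would come from a library such as torchvision); the layer sizes and dropout rate are assumptions:

```python
import torch
import torch.nn as nn

class SpatialHead(nn.Module):
    """Custom layers refining ResNet50's 2048-d pooled features
    into a compact per-frame embedding."""
    def __init__(self, in_dim: int = 2048, out_dim: int = 512):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Linear(in_dim, out_dim),
            nn.ReLU(),
            nn.Dropout(0.3),  # regularize the per-frame embedding
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.refine(x)
```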
Temporal Feature Extraction:
- Model Used: BiLSTM (Bidirectional Long Short-Term Memory)
- Purpose: Capture temporal dependencies across frames, analyzing how features change over time.
- Implementation: A sequence of feature vectors is fed into BiLSTM layers, followed by attention mechanisms to focus on significant temporal patterns.
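A minimal version of this BiLSTM-with-attention design is sketched below, pairing a bidirectional LSTM with simple additive attention pooling over time; the feature and hidden dimensions are illustrative assumptions:

```python
import torch
import torch.nn as nn

class TemporalModel(nn.Module):
    """BiLSTM over per-frame feature vectors, with additive
    attention pooling across time steps."""
    def __init__(self, in_dim: int = 512, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)  # score each time step

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, in_dim)
        h, _ = self.lstm(x)                     # (batch, frames, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)  # attention weights over frames
        return (w * h).sum(dim=1)               # (batch, 2*hidden) clip embedding
```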
Micro-Expression Analysis:
- Model Used: Custom CNN
- Purpose: Detect subtle micro-expressions that are hard to manipulate in deepfake videos.
- Implementation: A dedicated CNN model extracts fine-grained facial movements, which are then analyzed for inconsistencies.
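The shape of such a dedicated CNN might look like the sketch below, operating on cropped face regions; the filter counts, input size, and output dimension are assumptions, not the project's exact architecture:

```python
import torch
import torch.nn as nn

class MicroExpressionCNN(nn.Module):
    """Small dedicated CNN producing an embedding of fine-grained
    facial movement from a cropped face image."""
    def __init__(self, out_dim: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),  # collapse spatial dims to 1x1
        )
        self.fc = nn.Linear(32, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.features(x).flatten(1)  # (batch, 32)
        return self.fc(f)
```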
Feature Fusion Layer:
- Purpose: Integrate spatial, temporal, and micro-expression features to form a comprehensive feature set.
- Implementation: Features from different modules are concatenated and processed through dense layers for final classification.
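The concatenate-then-classify step can be sketched as follows. The three input dimensions match the illustrative sizes used above and are assumptions, as is the single sigmoid output giving the probability that a video is a deepfake:

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Concatenate spatial, temporal, and micro-expression features,
    then classify with dense layers."""
    def __init__(self, dims=(512, 256, 128)):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(sum(dims), 256),
            nn.ReLU(),
            nn.Linear(256, 1),
            nn.Sigmoid(),  # probability that the video is a deepfake
        )

    def forward(self, spatial, temporal, micro):
        fused = torch.cat([spatial, temporal, micro], dim=1)
        return self.head(fused)
```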