This project demonstrates a lightweight, real-time deepfake audio detection system combining Retrieval-Augmented Detection (RADD), Generative Adversarial Networks (GANs), and Variational Autoencoders (VAEs). It processes audio live from a microphone, identifies potential deepfakes, and adapts to new threats through continuous learning. Built for quick testing, it uses 100 real and 100 fake audio samples from the ASVspoof2019 LA dataset and runs a 10-second real-time detection demo.
Read the Complete Document for more details.
- Real-Time Detection: Analyzes audio in 128-ms chunks for instant deepfake flagging.
- Adaptability: Uses GANs to simulate new fakes and a database to update itself.
- Robustness: VAEs enhance data with variations, preparing for real-world conditions.
- Speed: Caches retrieval results for efficiency, inspired by arXiv:2403.11778.
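The 128-ms chunking can be sketched with plain NumPy. This is a minimal illustration, not the project's actual code; the 16 kHz sample rate is an assumption:

```python
import numpy as np

SAMPLE_RATE = 16_000                             # assumed microphone sample rate (Hz)
CHUNK_MS = 128                                   # chunk length from the feature list above
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_MS // 1000   # 2048 samples per chunk at 16 kHz

def iter_chunks(signal: np.ndarray):
    """Yield fixed-size 128-ms chunks, dropping any trailing partial chunk."""
    n_full = len(signal) // CHUNK_SAMPLES
    for i in range(n_full):
        yield signal[i * CHUNK_SAMPLES:(i + 1) * CHUNK_SAMPLES]

# One second of audio -> 7 full 128-ms chunks (7 * 2048 = 14336 samples used)
audio = np.zeros(SAMPLE_RATE, dtype=np.float32)
chunks = list(iter_chunks(audio))
print(len(chunks), chunks[0].shape)  # 7 (2048,)
```

In a live setting each chunk would be scored as soon as it arrives, which is what makes per-chunk flagging "instant".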
- Call center fraud prevention
- Social media audio verification
- Legal audio authenticity checks
- ASVspoof2019 LA: Place the files from `ASVspoof2019_LA_train/flac/` in a `flac/` folder and `ASVspoof2019.LA.cm.train.trn.txt` in the root directory. Download from the ASVspoof Challenge.
- Python: Version 3.9
- Dependencies: Install via pip:

  ```bash
  pip install librosa numpy faiss-cpu transformers torch tensorflow pyaudio mysql-connector-python scikit-learn
  ```
- Hardware: Microphone required for real-time detection.
- Clone the Repository:

  ```bash
  git clone <repository-url>
  cd deepfake-audio-detection
  ```
- Prepare Dataset:
  - Copy the `flac/` folder and `ASVspoof2019.LA.cm.train.trn.txt` into the project directory.
- Install Dependencies:

  ```bash
  pip install -r requirements.txt  # create this file with the dependencies above if needed
  ```
- Database Configuration:
  - Update `DB_CONFIG` in the notebook with your MySQL credentials if using a different setup:

    ```python
    DB_CONFIG = {
        "host": "your-host",
        "port": your-port,
        "user": "your-user",
        "password": "your-password",
        "database": "your-database"
    }
    ```
- Open the Notebook:

  ```bash
  jupyter notebook Deepfake_Audio_Detection_Demo.ipynb
  ```
- Run All Cells: Execute sequentially to:
  - Load and preprocess 200 audio files (100 real, 100 fake).
  - Train the GAN (50 epochs) and the VAE (10 epochs).
  - Set up RADD and train the detector (50 epochs).
  - Run a 10-second real-time detection test with your microphone.
- Check Outputs:
  - Logs show training progress, real-time probabilities (e.g., “Deepfake Probability: 0.50”), and evaluation metrics (e.g., “Accuracy: 0.50”).
- Data Loading: Reads audio, converts to spectrograms.
- RADD: Compares audio to a feature library using Wav2Vec2 and FAISS.
- GAN: Generates synthetic deepfakes for training.
- VAE: Augments real audio for robustness.
- Detector: Classifies audio in real time with a CNN.
- Continuous Learning: Stores new fakes in MySQL and retrains.
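The RADD step boils down to nearest-neighbour search over a library of audio embeddings. Here is a minimal stand-in using cosine similarity over random vectors in place of Wav2Vec2 features, and brute-force search in place of the FAISS index (all names and sizes are illustrative, not the project's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Feature library: one 768-d embedding per known clip (768 is Wav2Vec2-base's
# hidden size), with a real/fake label for each. Random vectors stand in for
# real embeddings in this sketch.
library = rng.standard_normal((200, 768)).astype(np.float32)
labels = np.array([0] * 100 + [1] * 100)  # 0 = real, 1 = fake

def retrieve(query: np.ndarray, k: int = 5) -> np.ndarray:
    """Return labels of the k most similar library entries (cosine similarity).
    FAISS (e.g. IndexFlatIP on normalised vectors) would replace this brute force."""
    lib = library / np.linalg.norm(library, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    sims = lib @ q
    top = np.argsort(-sims)[:k]
    return labels[top]

# Querying with an entry from the fake half retrieves that entry first.
neighbours = retrieve(library[150])
print(neighbours[0])  # 1
```

Caching the results of `retrieve` for repeated queries is what the "Speed" feature above refers to.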
- Training: Detector accuracy ~0.50–0.56; validation accuracy stuck at 0.50 (needs more data/epochs).
- Real-Time: Outputs probabilities near 0.50 (indecisive; requires tuning).
- Metrics: Accuracy 0.50, Recall 1.00, F1 0.67; promising but improvable.
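These numbers are mutually consistent: recall 1.00 with accuracy 0.50 on a balanced set means the detector is effectively labelling everything fake, which gives precision 0.50 and F1 ≈ 0.67. A quick sanity check:

```python
# Balanced 200-sample set where the detector predicts "fake" for every sample
# (the degenerate case implied by recall 1.00 with accuracy 0.50).
tp, fp, fn, tn = 100, 100, 0, 0

accuracy = (tp + tn) / (tp + fp + fn + tn)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, recall, round(f1, 2))  # 0.5 1.0 0.67
```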
- Increase the dataset size (`DEMO_FILES` > 100).
- Use a lighter model (e.g., Wav2Vec2-small) for speed.
- Add platform-specific artifacts (compression, noise) to the VAE.
- Extend real-time detection to continuous monitoring (`while True`).
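The `while True` extension could wrap the existing per-chunk pipeline in a loop over the microphone stream. A hedged sketch, with a finite stub standing in for the PyAudio stream and a dummy scorer standing in for the CNN detector (every name here is illustrative):

```python
import numpy as np

def chunk_source(n_chunks: int = 3):
    """Stand-in for the PyAudio stream: yields 128-ms chunks of silence."""
    for _ in range(n_chunks):
        yield np.zeros(2048, dtype=np.float32)

def score(chunk: np.ndarray) -> float:
    """Stand-in for the CNN detector; returns a deepfake probability."""
    return 0.5

def monitor(stream, threshold: float = 0.8) -> list:
    """Continuous monitoring loop. With a live microphone this would be
    `while True` over stream reads instead of a finite generator."""
    probs = []
    for chunk in stream:
        p = score(chunk)
        probs.append(p)
        if p > threshold:
            print(f"Deepfake Probability: {p:.2f} -- ALERT")
    return probs

probs = monitor(chunk_source())
print(probs)  # [0.5, 0.5, 0.5]
```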
- Tak, H., et al. (2024). "Real-Time Deepfake Detection Using Retrieval-Augmented Methods." arXiv:2403.11778.
- Goodfellow, I., et al. (2014). "Generative Adversarial Nets." Advances in Neural Information Processing Systems.
- Kingma, D. P., & Welling, M. (2013). "Auto-Encoding Variational Bayes." arXiv:1312.6114.
This project is licensed under the MIT License. See LICENSE for details.
Feel free to open issues or submit pull requests with improvements!