A machine learning system for detecting AI-generated audio using Benford's Law analysis and advanced audio feature extraction. The system employs ensemble learning with adaptive model updating capabilities.
- Multi-Model Ensemble: Uses Random Forest, Gradient Boosting, SGD, and Passive Aggressive classifiers
- Benford's Law Analysis: Analyzes frequency distributions for AI detection patterns
- Comprehensive Audio Features: Extracts spectral, temporal, and compression-related features
- Adaptive Learning: Supports incremental model updates with new data
- Batch Processing: Parallel processing for large audio datasets
- Spectrogram Generation: Creates and compares various types of spectrograms
- Interactive CLI: User-friendly command-line interface
- WAV (.wav)
- MP3 (.mp3)
- FLAC (.flac)
- OGG (.ogg)
- M4A (.m4a)
- AAC (.aac)
Install from PyPI:

```bash
pip install aiaa
```

Or install from source:

- Clone the repository:

  ```bash
  git clone https://github.com/ajprice16/AI_Audio_Detection.git
  cd AI_Audio_Detection
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

System audio libraries are also required. On Ubuntu/Debian:

```bash
sudo apt-get install libsndfile1 ffmpeg
```

On macOS:

```bash
brew install libsndfile ffmpeg
```
- Prepare your data: organize your audio files into two directories:
  - `human_audio/` - human-generated audio files
  - `ai_audio/` - AI-generated audio files
- Run the detector:

  If installed from PyPI:

  ```bash
  aiaa --interactive
  # or
  aiaa --predict-file path/to/audio.wav
  ```

  If running from source:

  ```bash
  python -m aiaa --interactive
  # or
  python -m aiaa --predict-file path/to/audio.wav
  ```

- Choose option 1 to train new models and follow the prompts.
Train models:

```bash
aiaa --train --human-dir path/to/human/audio --ai-dir path/to/ai/audio
```

Predict a single file:

```bash
aiaa --predict-file path/to/audio.wav
```

Predict a batch:

```bash
aiaa --predict-batch path/to/audio/directory
```

Interactive mode:

```bash
aiaa --interactive
```

Single-file prediction in interactive mode:

```bash
aiaa --interactive
# Choose option 2 and enter the path to your audio file
```

Batch prediction in interactive mode:

```bash
aiaa --interactive
# Choose option 3 and enter the directory path
```

Python API:

```python
from pathlib import Path

import pandas as pd

from aiaa import AIAudioDetector

# Initialize detector
detector = AIAudioDetector(base_dir=Path.cwd())

# Train models
human_features = detector.extract_features_from_directory("human_audio/", is_ai_directory=False)
ai_features = detector.extract_features_from_directory("ai_audio/", is_ai_directory=True)
all_features = human_features + ai_features
df_results = pd.DataFrame(all_features)
training_results = detector.train_models(df_results)

# Make predictions
result = detector.predict_file("test_audio.wav")
print(f"Prediction: {'AI' if result['is_ai'] else 'Human'}")
print(f"Confidence: {result['confidence']:.3f}")
```

The system supports adaptive learning to improve accuracy with new data:
```python
# Add new AI data
detector.add_ai_data("new_ai_audio/", retrain_batch_models=True)

# Add new human data
detector.add_human_data("new_human_audio/", retrain_batch_models=True)

# Add a mixed data batch
directories = [
    {'path': 'dataset1/', 'is_ai': True},
    {'path': 'dataset2/', 'is_ai': False}
]
detector.add_mixed_data_batch(directories, retrain_batch_models=True)
```

The Benford's Law analysis extracts:

- Chi-square test statistics
- Kolmogorov-Smirnov test statistics
- Mean absolute deviation from expected distribution
- Maximum deviation
- Entropy measures
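As a rough sketch of how a few of these statistics can be computed (illustrative only; the project's actual implementation may window and bin differently), applied here to FFT magnitudes:

```python
import numpy as np

# Expected Benford first-digit probabilities: P(d) = log10(1 + 1/d)
BENFORD = np.log10(1 + 1 / np.arange(1, 10))

def first_digit_distribution(values):
    """Empirical first-digit distribution of the non-zero magnitudes."""
    values = np.abs(values)
    values = values[values > 0]
    # The leading digit of x is floor(x / 10**floor(log10 x))
    exponents = np.floor(np.log10(values))
    digits = (values / 10.0 ** exponents).astype(int)
    counts = np.bincount(digits, minlength=10)[1:10].astype(float)
    return counts / counts.sum()

def benford_features(signal):
    """Chi-square, mean-absolute-deviation, and max-deviation features
    of FFT magnitudes against the Benford distribution (a subset of the
    statistics listed above)."""
    mags = np.abs(np.fft.rfft(signal))
    observed = first_digit_distribution(mags)
    chi_square = np.sum((observed - BENFORD) ** 2 / BENFORD)
    mad = np.mean(np.abs(observed - BENFORD))
    max_dev = np.max(np.abs(observed - BENFORD))
    return {"chi_square": chi_square, "mad": mad, "max_dev": max_dev}

# Example with a synthetic signal
rng = np.random.default_rng(0)
feats = benford_features(rng.normal(size=16384))
```

The intuition: magnitudes of natural signals spanning several orders of magnitude tend to follow Benford's first-digit law, and processing artifacts can perturb that distribution.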
The spectral and temporal features include:

- Spectral centroid, bandwidth, rolloff
- MFCCs (13 coefficients + standard deviations)
- Chroma features
- Spectral contrast
- Zero crossing rate
- RMS energy (mean and standard deviation)
- Tempo estimation
- Spectral flatness
The compression-related features include:

- Dynamic range
- Peak-to-RMS ratio
- Estimated bit depth
- Clipping detection
- DC offset
- High frequency content ratio
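The project extracts these with librosa, but a few of them are simple enough to sketch in plain NumPy (illustrative names and simplified, single-frame definitions):

```python
import numpy as np

def basic_audio_features(signal, sr=22050):
    """Pure-NumPy sketch of a few of the features listed above."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)

    # Spectral centroid: magnitude-weighted mean frequency
    centroid = np.sum(freqs * spectrum) / np.sum(spectrum)

    # Spectral flatness: geometric / arithmetic mean of the power spectrum
    power = spectrum ** 2 + 1e-12
    flatness = np.exp(np.mean(np.log(power))) / np.mean(power)

    # Zero crossing rate: fraction of adjacent samples with a sign change
    zcr = np.mean(np.abs(np.diff(np.sign(signal))) > 0)

    # RMS energy and peak-to-RMS ratio
    rms = np.sqrt(np.mean(signal ** 2))
    peak_to_rms = np.max(np.abs(signal)) / (rms + 1e-12)

    # DC offset: mean sample value
    dc_offset = np.mean(signal)

    return {"centroid": centroid, "flatness": flatness, "zcr": zcr,
            "rms": rms, "peak_to_rms": peak_to_rms, "dc_offset": dc_offset}

# One second of a 440 Hz sine at 22.05 kHz
t = np.arange(22050) / 22050
feats = basic_audio_features(np.sin(2 * np.pi * 440 * t))
```

For a pure tone, the centroid sits at the tone's frequency and the flatness is near zero; noisy or heavily processed audio pushes the flatness toward one.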
The system uses an ensemble of four models:

- Incremental models (for adaptive learning):
  - SGD Classifier with log loss
  - Passive Aggressive Classifier
- Batch models (for maximum accuracy):
  - Random Forest (200 estimators)
  - Gradient Boosting (200 estimators)

All features are standardized using StandardScaler, and final predictions use ensemble averaging.
Modify config.yaml to customize:
- Model parameters
- Feature extraction settings
- Processing options
- Output directories
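The exact schema depends on the repository; a hypothetical `config.yaml` covering the four areas above (all keys illustrative) might look like:

```yaml
models:
  random_forest:
    n_estimators: 200
  gradient_boosting:
    n_estimators: 200
features:
  sample_rate: 22050
  n_mfcc: 13
processing:
  batch_workers: 4
output:
  models_dir: models/
  spectrograms_dir: spectrograms/
```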
- Train new models - Initial training from audio directories
- Predict single file - Analyze one audio file
- Predict batch - Analyze all files in a directory
- Update models - Adaptive learning with new data
- Add AI data - Add new AI samples to existing models
- Add Human data - Add new human samples to existing models
- Batch directories - Add multiple directories at once
- Training history - View model training history
- Data balance - Check AI vs Human data balance
- Create visualizations - Generate analysis plots
- Generate spectrograms - Create spectrograms for audio files
- Spectrogram comparison - Compare AI vs Human spectrograms
- `models/aiaa.joblib` - Trained models and metadata
- `training_results.csv` - Detailed training data and features
- `ai_detection_analysis.png` - Visualization plots
- `spectrograms/` - Generated spectrogram images
- `spectrogram_comparisons/` - Side-by-side comparisons
- Multiprocessing: Automatically used for batches > 3 files
- Memory Management: Spectrograms are generated efficiently with proper cleanup
- Scalability: Incremental learning allows handling large datasets over time
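The batch-size threshold described above can be sketched as follows, with a hypothetical `extract_features` stand-in for the real per-file work:

```python
from multiprocessing import Pool

def extract_features(path):
    """Stand-in for per-file feature extraction (hypothetical helper)."""
    return {"path": path, "n_features": 40}

def process_batch(paths, workers=4):
    # Mirror the documented behavior: run serially for small batches,
    # parallelize only for batches of more than 3 files.
    if len(paths) <= 3:
        return [extract_features(p) for p in paths]
    with Pool(processes=workers) as pool:
        return pool.map(extract_features, paths)

if __name__ == "__main__":
    results = process_batch([f"clip_{i}.wav" for i in range(8)], workers=2)
```

Keeping small batches serial avoids paying process start-up costs when there is little work to distribute.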
- Python 3.7+
- librosa (audio processing)
- scikit-learn (machine learning)
- pandas, numpy (data manipulation)
- matplotlib (visualization)
- scipy (statistical tests)
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Uses Benford's Law for detecting artificial patterns in audio
- Built on librosa for robust audio feature extraction
- Employs scikit-learn for machine learning capabilities
If you use this work in your research, please cite:
```bibtex
@software{aiaa,
  title={AIAA: AI Audio Authenticity - Machine Learning System for Detecting AI-Generated Audio},
  author={Alex Price},
  year={2025},
  url={https://github.com/ajprice16/AI_Audio_Detection}
}
```