The Automaton Engine: Building ML Shields for Modern Threat Detection
Talos MD5 is a machine-learning security intelligence platform that turns raw threat data into trained detection models. Using Scikit-Learn classifiers, it helps security researchers, threat analysts, and defensive teams detect, classify, and neutralize malicious patterns.
Built on Python 3.11, Scikit-Learn Random Forest, and CustomTkinter, Talos MD5 bridges the gap between academic ML research and real-world threat hunting operations.
- ⚡ Real-time Threat Detection - Analyze files in milliseconds
- 🧠 Advanced ML Algorithms - Random Forest, SVM, Neural Networks
- 🎨 Intuitive Interface - Modern CustomTkinter GUI
- 🔄 Automated Pipeline - From data ingestion to deployment
- 📊 Comprehensive Analytics - Detailed metrics and visualizations
- 🚀 Production Ready - Battle-tested in live environments
- Multi-threaded Processing - Parallel training & inference
- Memory Efficient - Handles 1M+ samples with lazy loading
- GPU Acceleration - Optional CUDA support
- Incremental Learning - Update models without full retraining
- Model Compression - Optimized for deployment
Talos MD5 includes an intelligent setup handler that configures your environment automatically:
# Clone the repository
git clone https://github.com/ghosthets/talos-md5.git
cd talos-md5
# Run automated setup
python setup.py

What happens automatically:
- ✅ Verifies Python 3.11+ installation
- ✅ Creates isolated virtual environment (.venv/)
- ✅ Installs all dependencies from requirements.txt
- ✅ Validates installation integrity
- ✅ Launches Talos Engine (talos.py)
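The automated steps above could be approximated as follows. This is a minimal sketch, not the project's actual `setup.py`: the function names (`python_ok`, `venv_python`, `plan_setup`) are hypothetical, and using `pip check` for the validation step is an assumption. Only `.venv/`, `requirements.txt`, and `talos.py` come from this README.

```python
import sys
from pathlib import Path

MIN_VERSION = (3, 11)  # minimum Python version stated in this README


def python_ok(version=sys.version_info):
    """Return True when the interpreter meets the minimum version."""
    return tuple(version[:2]) >= MIN_VERSION


def venv_python(venv_dir, platform=sys.platform):
    """Path to the interpreter inside a virtual environment."""
    if platform.startswith("win"):
        return Path(venv_dir) / "Scripts" / "python.exe"
    return Path(venv_dir) / "bin" / "python"


def plan_setup(venv_dir=".venv", platform=sys.platform):
    """Build the commands a setup script might run, in order (hypothetical)."""
    py = str(venv_python(venv_dir, platform))
    return [
        [sys.executable, "-m", "venv", venv_dir],                # create isolated environment
        [py, "-m", "pip", "install", "-r", "requirements.txt"],  # install dependencies
        [py, "-m", "pip", "check"],                              # validate installed packages
        [py, "talos.py"],                                        # launch the application
    ]


if __name__ == "__main__":
    # Dry run: report the version check and print the plan without executing it.
    print("Python version OK:", python_ok())
    for cmd in plan_setup():
        print(" ".join(cmd))
```

Printing the plan instead of executing it keeps the sketch side-effect free; a real installer would pass each command to `subprocess.run(cmd, check=True)`.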
For advanced users who prefer manual control:
# Create virtual environment
python3.11 -m venv .venv
# Activate environment
# Windows:
.venv\Scripts\activate
# Linux/Mac:
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Launch Talos
python talos.py

| Component | Minimum | Recommended |
|---|---|---|
| Python | 3.11.0 | 3.11.5+ |
| RAM | 4 GB | 8 GB+ |
| Storage | 500 MB | 2 GB+ |
| OS | Windows 10, Linux, macOS | Any 64-bit |
Architecture overview:

| Stage | Role | Components |
|---|---|---|
| Ingestion | Raw Input Processing | JSON Parser |
| Engineering | Transform & Extract | Vectorizer |
| Training | ML Algorithm Processing | Random Forest |

Deployment modes:
- Real-time Prediction - Instant threat detection
- Batch Processing - Multiple file analysis
- API Server - RESTful endpoints

Workflow:
- Data Collection - Load datasets
- Preprocessing - Clean & validate
- Feature Extraction - Generate ML-ready features
- Model Inference - Predict threat

Model roles: Primary Classifier, Secondary Classifier, Gradient Boosting
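To make the Feature Extraction step concrete, here is a sketch of one feature this README names elsewhere (`file_entropy`, reported as `entropy` in the sample output). It computes Shannon entropy over byte frequencies, a common choice for this kind of feature. Whether Talos computes it this way is an assumption, and `byte_entropy` is a hypothetical name.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum p * log2(1/p) over every byte value that occurs at least once.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


print(byte_entropy(b"\x00" * 100))     # 0.0: a single repeated byte carries no information
print(byte_entropy(bytes(range(256))))  # 8.0: every byte value equally likely
```

Values close to 8 are typical of compressed or encrypted content, which is why entropy is often used as a signal for packed executables.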
📂 Data Processing Layer
- Input Formats: JSON, CSV, TXT, Binary
- Preprocessing: Cleaning, normalization, deduplication
- Validation: Schema validation, integrity checks
- Storage: Efficient serialization with Pickle/Joblib
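The cleaning, deduplication, and schema-validation steps listed above might look like the sketch below. The record fields (`hash`, `label`), the label values, and the function names are assumptions for illustration; this README does not document the dataset schema.

```python
import json

REQUIRED_FIELDS = {"hash": str, "label": str}  # hypothetical schema
VALID_LABELS = {"benign", "malicious"}         # hypothetical label set


def is_valid(record) -> bool:
    """Check that a record is a dict with the required, correctly typed fields."""
    if not isinstance(record, dict):
        return False
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(record.get(name), expected_type):
            return False
    return record["label"].strip().lower() in VALID_LABELS


def clean(records):
    """Normalize, validate, and deduplicate records, keeping first occurrences."""
    seen = set()
    out = []
    for record in records:
        if not is_valid(record):
            continue
        key = record["hash"].strip().lower()
        if key in seen:
            continue
        seen.add(key)
        out.append({**record, "hash": key, "label": record["label"].strip().lower()})
    return out


raw = json.loads("""[
  {"hash": "ABC123", "label": "Malicious"},
  {"hash": "abc123", "label": "malicious"},
  {"hash": "def456", "label": "benign"},
  {"hash": "ghi789"},
  {"hash": "jkl012", "label": "unknown"}
]""")
cleaned = clean(raw)
print(len(cleaned))  # 2: one duplicate and two invalid records are dropped
```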
🧠 Machine Learning Core
- Training Pipeline: GridSearchCV, K-Fold validation
- Model Types: Classification, anomaly detection
- Optimization: Hyperparameter tuning, feature selection
- Evaluation: Confusion matrix, ROC curves, PR curves
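K-Fold validation, named above, splits the data into k parts and holds out each part once. The sketch below shows the index bookkeeping in plain Python so it runs without dependencies. In practice scikit-learn's `KFold` and `GridSearchCV` handle this; `k_fold_indices` is a hypothetical helper, not part of Talos.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


for train, val in k_fold_indices(10, k=5):
    print(len(train), len(val))  # 8 2 on every fold
```

A hyperparameter search then trains one model per fold for each candidate setting and compares the averaged validation scores.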
🎨 User Interface
- Framework: CustomTkinter (modern, themeable)
- Features: Real-time dashboards, progress bars, charts
- Themes: Dark mode, light mode, custom themes
- Responsive: Scales to different screen sizes
talos-md5/
│
├── 📁 scripts/ # Core Logic
│ ├── train.py # Model training orchestrator
│ ├── inference.py # Prediction engine
│ ├── preprocessing.py # Data pipeline
│ ├── feature_eng.py # Feature extraction
│ ├── evaluation.py # Model metrics
│ └── utils.py # Helper functions
│
├── 📁 data/ # Intelligence Repository
│ ├── raw/
│ │ ├── malicious.json # Threat samples
│ │ └── benign.json # Clean samples
│ ├── processed/
│ │ └── features.pkl # Engineered features
│ └── splits/
│ ├── train.pkl # Training set (80%)
│ ├── val.pkl # Validation set (10%)
│ └── test.pkl # Test set (10%)
│
├── 📁 models/ # Neural Arsenal
│ ├── production/
│ │ └── talos_v1.pkl # Deployed model
│ ├── experiments/
│ │ ├── rf_exp1.pkl # Random Forest experiments
│ │ └── svm_exp1.pkl # SVM experiments
│ └── checkpoints/
│ └── best_model.pkl # Best performing model
│
├── 📁 logs/ # System Logs
│ ├── training.log # Training history
│ ├── inference.log # Prediction logs
│ └── error.log # Error tracking
│
├── 📁 config/ # Configuration
│ ├── settings.yaml # Global settings
│ └── model_config.json # Model parameters
│
├── 📄 talos.py # Main GUI Application
├── 📄 setup.py # Automated installer
├── 📄 requirements.txt # Dependencies
├── 📄 LICENSE # Apache 2.0
└── 📄 README.md # This file
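The tree above lists an 80/10/10 train/validation/test split. One way to produce such a split reproducibly is sketched below. The function name and the fixed seed are assumptions, and the real pipeline may stratify by label or split differently.

```python
import random


def split_80_10_10(items, seed=42):
    """Shuffle a copy of items and cut it into train/val/test (80/10/10)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 8 // 10  # integer arithmetic avoids float rounding surprises
    n_val = n // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder, so no sample is dropped
    return train, val, test


train, val, test = split_80_10_10(range(1000))
print(len(train), len(val), len(test))  # 800 100 100
```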
# Launch Talos GUI
python talos.py
# Or use CLI
python scripts/train.py --data data/raw/dataset.json \
--model random_forest \
--output models/my_model.pkl

Training Parameters:
- --algorithm: rf, svm, xgboost, mlp
- --cv-folds: Cross-validation folds (default: 5)
- --optimize: Enable hyperparameter tuning
- --gpu: Enable GPU acceleration
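A parser accepting the flags listed above could be declared as follows. This is a sketch of how `scripts/train.py` might define its interface, not its actual source: the help strings and every default other than `--cv-folds` are assumptions. The usage example passes `--model`, while the parameter list names `--algorithm`; the sketch follows the parameter list.

```python
import argparse


def build_parser():
    """Declare the training flags described in this README (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="train.py", description="Train a Talos model")
    parser.add_argument("--data", required=True, help="path to the input dataset")
    parser.add_argument("--output", required=True, help="where to save the trained model")
    parser.add_argument("--algorithm", choices=["rf", "svm", "xgboost", "mlp"],
                        default="rf", help="model family (default is an assumption)")
    parser.add_argument("--cv-folds", type=int, default=5, help="cross-validation folds")
    parser.add_argument("--optimize", action="store_true", help="enable hyperparameter tuning")
    parser.add_argument("--gpu", action="store_true", help="enable GPU acceleration")
    return parser


args = build_parser().parse_args(
    ["--data", "data/raw/dataset.json", "--output", "models/my_model.pkl", "--optimize"]
)
print(args.algorithm, args.cv_folds, args.optimize, args.gpu)  # rf 5 True False
```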
# Predict single file
python scripts/inference.py --model models/talos_v1.pkl \
--input suspicious_file.exe
# Batch prediction
python scripts/inference.py --model models/talos_v1.pkl \
--batch data/samples/ \
--output results.csv

Output Format:
{
"file": "suspicious_file.exe",
"prediction": "MALICIOUS",
"confidence": 0.987,
"threat_score": 94.2,
"features": {
"entropy": 7.89,
"file_size": 2048000,
"signature": "unknown"
}
}

# Evaluate model performance
python scripts/evaluation.py --model models/talos_v1.pkl \
--testdata data/splits/test.pkl
# Generate visualizations
python scripts/evaluation.py --model models/talos_v1.pkl \
--visualize --output reports/

| Model | Accuracy | Precision | Recall | F1-Score | Training Time |
|---|---|---|---|---|---|
| Random Forest | 98.7% | 97.3% | 99.1% | 98.2% | 12.4s |
| SVM (RBF) | 96.4% | 95.1% | 97.8% | 96.4% | 45.2s |
| XGBoost | 98.1% | 96.8% | 98.9% | 97.8% | 18.7s |
| Neural Network | 97.2% | 95.9% | 98.3% | 97.1% | 67.3s |
Test Environment: Intel i7-11700K, 32GB RAM, Dataset: 100K samples
| | Predicted Benign | Predicted Malicious |
|---|---|---|
| Actual Benign | 4892 | 63 |
| Actual Malicious | 45 | 4998 |
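The standard metrics can be recomputed from the matrix above, treating Malicious as the positive class. The README does not say which model or run produced this matrix, so the results need not match any row of the benchmark table exactly.

```python
# Counts copied from the confusion matrix above (positive class: Malicious).
tn, fp = 4892, 63    # actual Benign row
fn, tp = 45, 4998    # actual Malicious row

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
# accuracy=0.9892 precision=0.9876 recall=0.9911 f1=0.9893
```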
from scripts.train import TalosTrainer
# Initialize trainer
trainer = TalosTrainer(
    algorithm='random_forest',
    n_estimators=200,
    max_depth=15,
    min_samples_split=5
)
# Load data
trainer.load_data('data/raw/dataset.json')
# Train with cross-validation
trainer.train(cv=5, optimize=True)
# Save model
trainer.save('models/custom_model.pkl')

from scripts.feature_eng import FeatureExtractor
# Create extractor
extractor = FeatureExtractor()
# Add custom features
extractor.add_feature('file_entropy')
extractor.add_feature('pe_headers')
extractor.add_feature('import_table')
# Extract features
features = extractor.extract('malware.exe')

We welcome contributions! See CONTRIBUTING.md for guidelines.
# Fork and clone
git clone https://github.com/YOUR_USERNAME/talos-md5.git
cd talos-md5
# Create feature branch
git checkout -b feature/amazing-feature
# Install dev dependencies
pip install -r requirements-dev.txt
# Run tests
pytest tests/
# Submit PR
git push origin feature/amazing-feature

Talos MD5 is distributed under the Apache License 2.0.
This software is created for educational and defensive cybersecurity purposes only.
The author (@Ghosthets) is NOT responsible for:
- Any misuse, damage, or illegal activities conducted with this tool
- Unauthorized access to computer systems
- Violation of applicable laws or regulations
Users must:
- Comply with all local, state, and federal laws
- Only use on systems they own or have explicit permission to test
- Use responsibly and ethically
- Scikit-Learn Team - ML framework
- CustomTkinter - Modern GUI library
- Security Community - Threat intelligence datasets
- Contributors - All project contributors
