A machine learning-based Android malware detection system that analyzes APK files to classify them as Malware or Safe.
- Overview
- Features
- Project Structure
- Workflow
- Installation
- Usage
- Dataset
- Model Training
- Technologies Used
ThreatScope is an advanced malware analysis platform that uses machine learning to detect malicious Android applications. The system extracts features from APK files using tools like APKiD and apktool, then classifies them using a trained model.
- APK Upload & Analysis: Simple web interface to upload and analyze APK files
- Feature Extraction: Automatically extracts 215 features including:
  - Android permissions
  - Obfuscation detection
  - Anti-VM techniques
  - Packer detection
  - Anti-debugging mechanisms
  - Dropper detection
- ML-Based Classification: Uses a trained SGDClassifier or RandomForestClassifier model
- Web Interface: Modern, responsive UI for easy interaction
Malware Analysis/
│
├── app.py                    # Flask web application
├── malware_model.pkl         # Trained ML model
│
├── Dataset/
│   └── Base DS/
│       ├── datasetfeaturescategories.csv
│       └── drebin215dataset5560malware9476benign.csv
│
├── Malware Analysis/
│   └── NoteBook's/
│       └── Malware.ipynb     # Model training notebook
│
├── NoteBook's/
│   └── Malware.ipynb         # Alternative notebook
│
└── templates/
    └── index.html            # Web UI template
Dataset Loading → Data Cleaning → Feature Engineering → Train-Test Split
graph LR
A[Load Dataset] --> B[Clean Data]
B --> C[Encode Labels]
C --> D[Split Data 80/20]
D --> E[Train Model]
E --> F[Evaluate]
F --> G[Save Model]
Steps:
- Load the Drebin dataset (5,560 malware + 9,476 benign samples)
- Convert class labels (S = Malware → 1, B = Benign → 0)
- Clean data (handle missing values, type conversions)
- Split into training (80%) and testing (20%) sets
- Train SGDClassifier or RandomForestClassifier
- Evaluate accuracy (target: >95%)
- Save model as malware_model.pkl
APK Upload → APKiD Analysis → APKTool Decode → Permission Extraction → Feature Vector → ML Prediction → Result
Steps:
- User uploads APK file via web interface
- APKiD scans for:
  - Obfuscation techniques
  - Anti-VM protections
  - Packers
  - Anti-debugging
  - Droppers
- apktool decodes AndroidManifest.xml
- Extract all permissions from manifest
- Create 215-feature vector matching dataset columns
- Load trained model and predict
- Return classification result (Malware/Safe)
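
The permission extraction and feature vector steps might look something like the sketch below. It assumes apktool has already decoded the manifest to plain XML, and that dataset columns use short permission names such as `SEND_SMS`. The function names, sample manifest, and three-column list are hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace prefix that ElementTree attaches to android:* attributes.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def extract_permissions(manifest_xml):
    """Return the set of permission names declared via <uses-permission>."""
    root = ET.fromstring(manifest_xml)
    names = set()
    for node in root.iter("uses-permission"):
        full_name = node.get(ANDROID_NS + "name")
        if full_name:
            # Keep only the last segment, e.g. "SEND_SMS".
            names.add(full_name.rsplit(".", 1)[-1])
    return names


def build_feature_vector(columns, present):
    """Return a 0/1 list aligned with `columns`; 1 where the feature is present."""
    return [1 if column in present else 0 for column in columns]


SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.demo">
    <uses-permission android:name="android.permission.SEND_SMS" />
    <uses-permission android:name="android.permission.INTERNET" />
</manifest>"""

# Hypothetical column subset; the real list would come from the dataset header.
DEMO_COLUMNS = ["SEND_SMS", "READ_CONTACTS", "INTERNET"]

permissions = extract_permissions(SAMPLE_MANIFEST)
vector = build_feature_vector(DEMO_COLUMNS, permissions)
print(vector)  # [1, 0, 1]
```

Aligning the vector to the training column order matters because the model was fitted on features in that order. Features not found in the APK simply stay at 0.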
- Python 3.8+
- Java JDK (for apktool)
- apktool installed at C:\apktool\apktool.bat
- APKiD installed via pip
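
A quick way to confirm these prerequisites before starting the app is sketched below. The `check_prerequisites` helper is hypothetical and not part of the project. It assumes `java` and the `apkid` command are expected on `PATH`.

```python
import shutil
import sys
from pathlib import Path


def check_prerequisites(apktool_path=r"C:\apktool\apktool.bat"):
    """Return a dict mapping each prerequisite to True if it appears available."""
    return {
        "python>=3.8": sys.version_info >= (3, 8),
        "java": shutil.which("java") is not None,
        "apkid": shutil.which("apkid") is not None,
        "apktool": Path(apktool_path).is_file(),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```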
# Clone the repository
git clone https://github.com/AnshGajera/Malware-Analysis.git
cd Malware-Analysis
# Install Python dependencies
pip install flask pandas scikit-learn joblib numpy matplotlib
# Install APKiD
pip install apkid
# Install apktool (Windows)
# Download from https://apktool.org/ and place in C:\apktool\
cd "Malware Analysis"
python app.py
Then open your browser and navigate to http://127.0.0.1:5000
- Open the web interface
- Click the upload area
- Select an APK file
- Wait for analysis to complete
- View the prediction result (Malware/Safe)
The project uses the Drebin Dataset with 215 features:
| Metric | Count |
|---|---|
| Total Samples | 15,036 |
| Malware Samples | 5,560 |
| Benign Samples | 9,476 |
| Features | 215 |
Feature Categories:
- Android permissions
- API calls
- Network activities
- Hardware features
- Intent filters
- SGDClassifier (Stochastic Gradient Descent)
  - Loss: log_loss
  - Max iterations: 1000
  - Early stopping enabled
- RandomForestClassifier
  - Estimators: 100
  - Random state: 42
- Target Accuracy: >95%
- Evaluation includes:
  - Precision
  - Recall
  - F1-Score
  - Classification Report
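
For reference, the metrics listed above can be computed by hand from true and predicted labels, treating malware (1) as the positive class. This is a small standard-library sketch with made-up labels. The notebook presumably relies on scikit-learn's own reporting instead.

```python
def classification_metrics(y_true, y_pred):
    """Compute precision, recall, and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malware caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malware missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(classification_metrics(y_true, y_pred))
# {'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```

For a malware detector, recall is often the metric to watch, since a missed malicious APK (false negative) is usually costlier than a false alarm.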
Open Malware Analysis/NoteBook's/Malware.ipynb and run all cells to:
- Load and preprocess data
- Train the classifier
- Evaluate performance
- Save the model
| Category | Technology |
|---|---|
| Backend | Flask (Python) |
| ML/AI | scikit-learn, pandas, numpy |
| APK Analysis | APKiD, apktool |
| Frontend | HTML5, CSS3, JavaScript |
| Model Serialization | joblib |
| Endpoint | Method | Description |
|---|---|---|
| / | GET | Home page with upload interface |
| /upload | POST | Upload APK and get prediction |
| /favicon.ico | GET | Favicon handler |
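
The `/upload` endpoint can also be called from a script rather than the browser. The sketch below uses only the standard library. The form field name `file` and the shape of the response are assumptions, since neither is documented here, so check `app.py` for the actual field name before relying on it.

```python
import os
import urllib.request
import uuid


def build_multipart(field_name, filename, payload):
    """Encode one file as a multipart/form-data body; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/vnd.android.package-archive\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def upload_apk(apk_path, url="http://127.0.0.1:5000/upload", field_name="file"):
    """POST an APK to /upload and return the raw response text."""
    with open(apk_path, "rb") as handle:
        body, content_type = build_multipart(
            field_name, os.path.basename(apk_path), handle.read()
        )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the Flask app running locally, `upload_apk("sample.apk")` would return whatever the server sends back for that file.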
Ansh Gajera
- GitHub: @AnshGajera
This project is for educational purposes.
This tool is intended for educational and research purposes only. Always ensure you have proper authorization before analyzing any applications.