ThreatScope - Android Malware Analysis Platform

A machine learning-based Android malware detection system that analyzes APK files to classify them as Malware or Safe.

📋 Table of Contents

Overview
Features
Project Structure
Workflow
Installation
Usage
Dataset
Model Training
Technologies Used
API Endpoints
Author
License
Disclaimer

🔍 Overview

ThreatScope is a malware analysis platform that uses machine learning to detect malicious Android applications. It extracts features from APK files with APKiD and apktool, then classifies each file with a model trained on the Drebin dataset.

✨ Features

- APK Upload & Analysis: Simple web interface to upload and analyze APK files
- Feature Extraction: Automatically extracts 215 features, including:
  - Android permissions
  - Obfuscation detection
  - Anti-VM techniques
  - Packer detection
  - Anti-debugging mechanisms
  - Dropper detection
- ML-Based Classification: Uses a trained SGD or Random Forest classifier
- Web Interface: Modern, responsive UI for easy interaction
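
To make the detection categories above concrete, here is a minimal sketch of turning an APKiD JSON report (for example from `apkid -j app.apk`) into one flag per category. The category keys and the `files`/`matches` layout are assumptions about APKiD's JSON output, and `summarize_apkid` is a hypothetical helper, not a function from this project.

```python
import json

# Categories from the feature list above. The key names follow APKiD's rule
# categories, but treat the exact JSON layout as an assumption.
CATEGORIES = ("obfuscator", "anti_vm", "packer", "anti_debug", "dropper")


def summarize_apkid(report_json: str) -> dict:
    """Collapse an APKiD JSON report into one 0/1 flag per category."""
    report = json.loads(report_json)
    flags = {category: 0 for category in CATEGORIES}
    for entry in report.get("files", []):
        for category, hits in entry.get("matches", {}).items():
            if category in flags and hits:
                flags[category] = 1
    return flags


# Hand-written sample in the assumed layout (not real APKiD output).
sample = json.dumps({
    "files": [
        {
            "filename": "app.apk!classes.dex",
            "matches": {"compiler": ["dx"], "anti_vm": ["Build.MANUFACTURER check"]},
        }
    ]
})
flags = summarize_apkid(sample)
print(flags)
```

Categories outside the list (such as `compiler`) are ignored, so the result always has the same five keys regardless of what a given APK triggers.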

📁 Project Structure

Malware Analysis/
│
├── app.py                          # Flask web application
├── malware_model.pkl               # Trained ML model
│
├── Dataset/
│   └── Base DS/
│       ├── datasetfeaturescategories.csv
│       └── drebin215dataset5560malware9476benign.csv
│
├── Malware Analysis/
│   └── NoteBook's/
│       └── Malware.ipynb           # Model training notebook
│
├── NoteBook's/
│   └── Malware.ipynb               # Alternative notebook
│
└── templates/
    └── index.html                  # Web UI template

🔄 Workflow

1. Data Preparation

Dataset Loading → Data Cleaning → Feature Engineering → Train-Test Split

2. Model Training Pipeline

graph LR
    A[Load Dataset] --> B[Clean Data]
    B --> C[Encode Labels]
    C --> D[Split Data 80/20]
    D --> E[Train Model]
    E --> F[Evaluate]
    F --> G[Save Model]

Steps:

1. Load the Drebin dataset (5,560 malware + 9,476 benign samples)
2. Convert class labels (S = Malware → 1, B = Benign → 0)
3. Clean the data (handle missing values, type conversions)
4. Split into training (80%) and testing (20%) sets
5. Train an SGDClassifier or RandomForestClassifier
6. Evaluate accuracy (target: >95%)
7. Save the model as malware_model.pkl
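
The label encoding, cleaning, and 80/20 split from the steps above can be sketched without any dependencies. This is an illustration only: the notebook may well use pandas and scikit-learn's `train_test_split` instead, and replacing unparseable cells with 0 is an assumption about how missing values are handled. `clean_row` and `prepare` are hypothetical names.

```python
import random

LABELS = {"S": 1, "B": 0}  # S = malware, B = benign, as in the steps above


def clean_row(raw_values):
    """Convert feature cells to ints; unparseable cells (e.g. '?') become 0."""
    cleaned = []
    for value in raw_values:
        try:
            cleaned.append(int(value))
        except (TypeError, ValueError):
            cleaned.append(0)  # assumption: treat a missing value as "absent"
    return cleaned


def prepare(rows, test_fraction=0.2, seed=42):
    """rows: list of lists with the features first and the class label last."""
    data = [(clean_row(row[:-1]), LABELS[row[-1]]) for row in rows]
    random.Random(seed).shuffle(data)  # fixed seed keeps the split reproducible
    n_test = int(len(data) * test_fraction)
    return data[n_test:], data[:n_test]  # (train, test)


# Tiny made-up sample: two feature columns plus the class column.
rows = [
    ["1", "0", "S"],
    ["0", "?", "B"],
    ["1", "1", "S"],
    ["0", "0", "B"],
    ["0", "1", "B"],
]
train, test = prepare(rows)
print(len(train), len(test))
```

On the full dataset the same split yields roughly 12,000 training and 3,000 test samples.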

3. APK Analysis Pipeline

APK Upload → APKiD Analysis → APKTool Decode → Permission Extraction → Feature Vector → ML Prediction → Result

Steps:

1. User uploads an APK file via the web interface
2. APKiD scans for:
   - Obfuscation techniques
   - Anti-VM protections
   - Packers
   - Anti-debugging
   - Droppers
3. apktool decodes AndroidManifest.xml
4. Extract all permissions from the manifest
5. Create a 215-feature vector matching the dataset columns
6. Load the trained model and predict
7. Return the classification result (Malware/Safe)
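
The permission-extraction and feature-vector steps above might look roughly like the sketch below. It assumes the manifest has already been decoded to plain XML by apktool and that dataset columns use short permission names such as `SEND_SMS`. The helper names and the three-column list are hypothetical; the real column list would come from the dataset header.

```python
import xml.etree.ElementTree as ET

# Namespace that Android uses for attributes such as android:name.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def extract_permissions(manifest_xml: str) -> set:
    """Return short permission names (e.g. 'SEND_SMS') from a decoded manifest."""
    root = ET.fromstring(manifest_xml)
    permissions = set()
    for node in root.iter("uses-permission"):
        name = node.get(ANDROID_NS + "name", "")
        if name:
            permissions.add(name.rsplit(".", 1)[-1])
    return permissions


def build_feature_vector(permissions: set, feature_columns: list) -> list:
    """One 0/1 entry per dataset column, in the dataset's column order."""
    return [1 if column in permissions else 0 for column in feature_columns]


manifest = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.app">
    <uses-permission android:name="android.permission.SEND_SMS"/>
    <uses-permission android:name="android.permission.INTERNET"/>
</manifest>"""

# Hypothetical subset of columns, for illustration only.
columns = ["SEND_SMS", "READ_PHONE_STATE", "INTERNET"]
vector = build_feature_vector(extract_permissions(manifest), columns)
print(vector)  # [1, 0, 1]
```

Keeping the vector in the same column order as the training data matters, because the model only sees positions, not names.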

🛠️ Installation

Prerequisites

- Python 3.8+
- Java JDK (for apktool)
- apktool installed at C:\apktool\apktool.bat
- APKiD installed via pip

Setup

# Clone the repository
git clone https://github.com/AnshGajera/Malware-Analysis.git
cd Malware-Analysis

# Install Python dependencies
pip install flask pandas scikit-learn joblib numpy matplotlib

# Install APKiD
pip install apkid

# Install apktool (Windows)
# Download from https://apktool.org/ and place in C:\apktool\
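
Before starting the app, it can help to confirm that the external tools are reachable. The sketch below is a hypothetical helper, not part of the project; it assumes `java` and the `apkid` command are expected on the PATH and that apktool sits at the location listed in Prerequisites.

```python
import shutil
from pathlib import Path

APKTOOL_PATH = r"C:\apktool\apktool.bat"  # location given in Prerequisites


def check_prerequisites(apktool_path: str = APKTOOL_PATH) -> dict:
    """Report which external tools can be found; True means found."""
    return {
        "java": shutil.which("java") is not None,
        "apkid": shutil.which("apkid") is not None,
        "apktool": Path(apktool_path).is_file(),
    }


for tool, found in check_prerequisites().items():
    print(f"{tool}: {'found' if found else 'MISSING'}")
```

Pass a different `apktool_path` if apktool is installed elsewhere.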

🚀 Usage

Running the Web Application

cd "Malware Analysis"
python app.py

Then open your browser and navigate to http://127.0.0.1:5000

Analyzing an APK

1. Open the web interface
2. Click the upload area
3. Select an APK file
4. Wait for analysis to complete
5. View the prediction result (Malware/Safe)

📊 Dataset

The project uses the Drebin Dataset with 215 features:

| Metric | Count |
| --- | --- |
| Total Samples | 15,036 |
| Malware Samples | 5,560 |
| Benign Samples | 9,476 |
| Features | 215 |

Feature Categories:

Android permissions
API calls
Network activities
Hardware features
Intent filters

🧠 Model Training

Algorithms Used

- SGDClassifier (Stochastic Gradient Descent)
  - Loss: log_loss
  - Max iterations: 1000
  - Early stopping enabled
- RandomForestClassifier
  - Estimators: 100
  - Random state: 42

Performance Metrics

- Target Accuracy: >95%
- Evaluation includes:
  - Precision
  - Recall
  - F1-Score
  - Classification Report
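
For reference, the metrics above can be computed by hand from true and predicted labels. This is a dependency-free sketch with made-up labels; in practice scikit-learn's `classification_report` reports the same quantities per class. `classification_metrics` is a hypothetical helper.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class (1 = malware)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malware caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malware missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 4 malware and 4 benign samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this example 3 of 4 malware samples are caught (recall 0.75) while 2 benign apps are wrongly flagged (precision 0.6). A malware detector can have high accuracy and still miss threats, which is why recall is worth tracking separately.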

Training the Model

Open Malware Analysis/NoteBook's/Malware.ipynb and run all cells to:

Load and preprocess data
Train the classifier
Evaluate performance
Save the model

🔧 Technologies Used

| Category | Technology |
| --- | --- |
| Backend | Flask (Python) |
| ML/AI | scikit-learn, pandas, numpy |
| APK Analysis | APKiD, apktool |
| Frontend | HTML5, CSS3, JavaScript |
| Model Serialization | joblib |

📝 API Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| `/` | GET | Home page with upload interface |
| `/upload` | POST | Upload APK and get prediction |
| `/favicon.ico` | GET | Favicon handler |
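
The `/upload` endpoint can also be called from a script instead of the browser. The sketch below uses only the standard library. The form field name `file` and the shape of the response are assumptions, since neither is documented here, and `build_multipart` and `upload_apk` are hypothetical helpers.

```python
import os
import urllib.request
import uuid

UPLOAD_URL = "http://127.0.0.1:5000/upload"
FIELD_NAME = "file"  # assumption: the form field name is not documented here


def build_multipart(filename: str, payload: bytes, field: str = FIELD_NAME):
    """Return (body, content_type) for a single-file multipart/form-data POST."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/vnd.android.package-archive\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def upload_apk(path: str, url: str = UPLOAD_URL) -> str:
    """POST the APK and return the raw response text (format not documented)."""
    with open(path, "rb") as handle:
        body, content_type = build_multipart(os.path.basename(path), handle.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")
```

With the app running locally, `print(upload_apk("sample.apk"))` would show whatever the server returns for that file.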

👥 Author

Ansh Gajera

GitHub: @AnshGajera

📄 License

This project is for educational purposes.

⚠️ Disclaimer

This tool is intended for educational and research purposes only. Always ensure you have proper authorization before analyzing any applications.
