ThreatScope - Android Malware Analysis Platform

A machine learning-based Android malware detection system that analyzes APK files to classify them as Malware or Safe.


🔍 Overview

ThreatScope is a malware analysis platform that uses machine learning to detect malicious Android applications. It extracts features from APK files with APKiD and apktool, then classifies each file with a trained scikit-learn model.


✨ Features

  • APK Upload & Analysis: Simple web interface to upload and analyze APK files
  • Feature Extraction: Automatically extracts 215 features including:
    • Android permissions
    • Obfuscation detection
    • Anti-VM techniques
    • Packer detection
    • Anti-debugging mechanisms
    • Dropper detection
  • ML-Based Classification: Uses a trained SGDClassifier or RandomForestClassifier
  • Web Interface: Modern, responsive UI for easy interaction

📁 Project Structure

Malware Analysis/
│
├── app.py                          # Flask web application
├── malware_model.pkl               # Trained ML model
│
├── Dataset/
│   └── Base DS/
│       ├── datasetfeaturescategories.csv
│       └── drebin215dataset5560malware9476benign.csv
│
├── Malware Analysis/
│   └── NoteBook's/
│       └── Malware.ipynb           # Model training notebook
│
├── NoteBook's/
│   └── Malware.ipynb               # Alternative notebook
│
└── templates/
    └── index.html                  # Web UI template

🔄 Workflow

1. Data Preparation

Dataset Loading → Data Cleaning → Feature Engineering → Train-Test Split
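
The cleaning and label-encoding stages can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the label column name `class`, the `?` marker for missing values, and the helper name `prepare_frame` are all assumptions about the CSV layout.

```python
import pandas as pd


def prepare_frame(df: pd.DataFrame, label_col: str = "class") -> pd.DataFrame:
    """Return a cleaned copy with numeric features and labels encoded as 1/0."""
    out = df.copy()
    # Encode labels: S = malware -> 1, B = benign -> 0 (unknown labels become NaN).
    out[label_col] = out[label_col].map({"S": 1, "B": 0})
    feature_cols = [c for c in out.columns if c != label_col]
    # Coerce every feature to a number; unparsable entries such as "?" become NaN.
    out[feature_cols] = out[feature_cols].apply(pd.to_numeric, errors="coerce")
    # Drop rows that have a missing feature or an unknown label.
    out = out.dropna().reset_index(drop=True)
    out[label_col] = out[label_col].astype(int)
    out[feature_cols] = out[feature_cols].astype(int)
    return out
```

Dropping incomplete rows is the simplest policy; imputing a default value instead would be an equally reasonable choice if many rows were affected.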

2. Model Training Pipeline

graph LR
    A[Load Dataset] --> B[Clean Data]
    B --> C[Encode Labels]
    C --> D[Split Data 80/20]
    D --> E[Train Model]
    E --> F[Evaluate]
    F --> G[Save Model]

Steps:

  1. Load the Drebin dataset (5,560 malware + 9,476 benign samples)
  2. Convert class labels (S = Malware → 1, B = Benign → 0)
  3. Clean data (handle missing values, type conversions)
  4. Split into training (80%) and testing (20%) sets
  5. Train SGDClassifier or RandomForestClassifier
  6. Evaluate accuracy (target: >95%)
  7. Save model as malware_model.pkl
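
Steps 4 to 7 might look like the sketch below. It assumes the features `X` and labels `y` are already cleaned and encoded; the function name `train_and_save` and the use of a stratified split are illustrative choices, not details confirmed by the notebook. RandomForestClassifier is shown, and an SGDClassifier could be passed through the same steps.

```python
import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_and_save(X, y, model_path="malware_model.pkl"):
    """Split 80/20, fit a classifier, measure test accuracy, and save the model."""
    # stratify=y keeps the malware/benign ratio the same in both splits (assumption).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    # Accuracy is measured only on the held-out 20%.
    accuracy = accuracy_score(y_test, model.predict(X_test))
    joblib.dump(model, model_path)
    return model, accuracy
```

The saved file can later be restored with `joblib.load("malware_model.pkl")` and used for prediction without retraining.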

3. APK Analysis Pipeline

APK Upload → APKiD Analysis → APKTool Decode → Permission Extraction → Feature Vector → ML Prediction → Result

Steps:

  1. User uploads APK file via web interface
  2. APKiD scans for:
    • Obfuscation techniques
    • Anti-VM protections
    • Packers
    • Anti-debugging
    • Droppers
  3. apktool decodes AndroidManifest.xml
  4. Extract all permissions from manifest
  5. Create 215-feature vector matching dataset columns
  6. Load trained model and predict
  7. Return classification result (Malware/Safe)
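
Steps 4 and 5 can be illustrated with the standard library alone. The sketch below assumes apktool has already decoded AndroidManifest.xml to plain XML and that dataset columns use short permission names such as `SEND_SMS`; both function names and the example column list are hypothetical. APKiD findings are not covered here.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Set

# XML namespace that Android manifests use for attributes like android:name.
ANDROID_NS = "http://schemas.android.com/apk/res/android"


def extract_permissions(manifest_xml: str) -> Set[str]:
    """Collect short permission names from <uses-permission> elements."""
    root = ET.fromstring(manifest_xml)
    found = set()
    for node in root.iter("uses-permission"):
        name = node.get("{%s}name" % ANDROID_NS)
        if name:
            # "android.permission.SEND_SMS" -> "SEND_SMS"
            found.add(name.rsplit(".", 1)[-1])
    return found


def build_feature_vector(found: Set[str], feature_columns: Iterable[str]) -> List[int]:
    """Emit 1 where a dataset column was observed in the APK, else 0, in column order."""
    return [1 if column in found else 0 for column in feature_columns]
```

Keeping the vector in the same column order as the training data matters, because the model identifies features by position rather than by name. The resulting list, wrapped in an outer list, can be handed to the loaded model's `predict` method.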

🛠️ Installation

Prerequisites

  • Python 3.8+
  • Java JDK (for apktool)
  • apktool installed at C:\apktool\apktool.bat
  • APKiD installed via pip

Setup

# Clone the repository
git clone https://github.com/AnshGajera/Malware-Analysis.git
cd Malware-Analysis

# Install Python dependencies
pip install flask pandas scikit-learn joblib numpy matplotlib

# Install APKiD
pip install apkid

# Install apktool (Windows)
# Download from https://apktool.org/ and place in C:\apktool\
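
Before starting the app, it can help to confirm that the external tools are reachable. The following is a small optional check, not part of the repository; the function name `check_prerequisites` is hypothetical, and it only looks for the executables without running them.

```python
import os
import shutil

# Location given in the prerequisites above.
APKTOOL_PATH = r"C:\apktool\apktool.bat"


def check_prerequisites(apktool_path: str = APKTOOL_PATH) -> dict:
    """Report which external tools can be found on this machine."""
    return {
        "java": shutil.which("java") is not None,    # needed by apktool
        "apkid": shutil.which("apkid") is not None,  # console script installed by pip
        "apktool": os.path.isfile(apktool_path),
    }


if __name__ == "__main__":
    for tool, found in check_prerequisites().items():
        print(f"{tool}: {'found' if found else 'MISSING'}")
```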

🚀 Usage

Running the Web Application

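# Run from the directory that contains app.py (shown as "Malware Analysis" under
# Project Structure; a default git clone names it Malware-Analysis)
cd "Malware Analysis"
python app.py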

Then open http://127.0.0.1:5000 in your browser.

Analyzing an APK

  1. Open the web interface
  2. Click the upload area
  3. Select an APK file
  4. Wait for analysis to complete
  5. View the prediction result (Malware/Safe)

📊 Dataset

The project uses the Drebin Dataset with 215 features:

| Metric          | Count  |
|-----------------|--------|
| Total Samples   | 15,036 |
| Malware Samples | 5,560  |
| Benign Samples  | 9,476  |
| Features        | 215    |
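
The counts above imply a moderately imbalanced dataset, which is worth keeping in mind when reading accuracy figures. A quick check of the arithmetic:

```python
malware, benign = 5_560, 9_476
total = malware + benign

benign_share = benign / total
malware_share = malware / total

# prints: total=15036, benign=63.0%, malware=37.0%
print(f"total={total}, benign={benign_share:.1%}, malware={malware_share:.1%}")
```

A classifier that labelled every sample benign would already reach about 63% accuracy, so precision and recall on the malware class are more informative than accuracy alone.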

Feature Categories:

  • Android permissions
  • API calls
  • Network activities
  • Hardware features
  • Intent filters

🧠 Model Training

Algorithms Used

  1. SGDClassifier (Stochastic Gradient Descent)
    • Loss: log_loss
    • Max iterations: 1000
    • Early stopping enabled
  2. RandomForestClassifier
    • Estimators: 100
    • Random state: 42

Performance Metrics

  • Target Accuracy: >95%
  • Evaluation includes:
    • Precision
    • Recall
    • F1-Score
    • Classification Report
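
As an illustration of how these metrics are computed, the sketch below uses ten made-up labels (not project results) with 1 = Malware and 0 = Safe. It contains 3 true positives, 1 false negative, and 1 false positive, so precision, recall, and F1 for the malware class all come out to 0.75.

```python
from sklearn.metrics import classification_report, precision_recall_fscore_support

# Hypothetical labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

# Scores for the positive (malware) class.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", pos_label=1
)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")

# Per-class table; target_names follow the sorted label order (0, then 1).
print(classification_report(y_true, y_pred, target_names=["Safe", "Malware"]))
```

For a malware detector, recall on the malware class shows how many malicious APKs slip through, while precision shows how often a "Malware" verdict is a false alarm.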

Training the Model

Open Malware Analysis/NoteBook's/Malware.ipynb and run all cells to:

  1. Load and preprocess data
  2. Train the classifier
  3. Evaluate performance
  4. Save the model

🔧 Technologies Used

| Category            | Technology                  |
|---------------------|-----------------------------|
| Backend             | Flask (Python)              |
| ML/AI               | scikit-learn, pandas, numpy |
| APK Analysis        | APKiD, apktool              |
| Frontend            | HTML5, CSS3, JavaScript     |
| Model Serialization | joblib                      |

📝 API Endpoints

| Endpoint     | Method | Description                     |
|--------------|--------|---------------------------------|
| /            | GET    | Home page with upload interface |
| /upload      | POST   | Upload APK and get prediction   |
| /favicon.ico | GET    | Favicon handler                 |
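
The three routes could be wired up in Flask along the lines of the skeleton below. This is not the project's app.py: the form field name `file`, the JSON response shape, and the placeholder prediction are assumptions made to keep the sketch self-contained, and the analysis pipeline is left out.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/")
def index():
    # The real app would render templates/index.html; a string keeps this standalone.
    return "upload form placeholder"


@app.route("/upload", methods=["POST"])
def upload():
    uploaded = request.files.get("file")  # field name "file" is an assumption
    if uploaded is None or not uploaded.filename.lower().endswith(".apk"):
        return jsonify(error="expected an .apk file"), 400
    # Placeholder: the APKiD/apktool analysis and model prediction would run here.
    return jsonify(filename=uploaded.filename, prediction="not computed in this sketch")


@app.route("/favicon.ico")
def favicon():
    # Answer browser favicon requests with an empty response.
    return "", 204
```

Rejecting uploads that lack an .apk extension is only a first filter; it does not verify that the file is a valid APK.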

👥 Author

Ansh Gajera


📄 License

This project is for educational purposes.


⚠️ Disclaimer

This tool is intended for educational and research purposes only. Always ensure you have proper authorization before analyzing any applications.
