Synthetic Malware Generation

Overview

Embedding Generation – A vision transformer is used to create embeddings from binary images of malware samples.
Synthetic Data Generation – A Conditional WGAN-GP is trained to produce high-quality synthetic embeddings.
Classification Experiment – A classifier is trained on the synthetic embeddings and tested on real malware data to assess generalization.
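
The first stage starts from images built out of malware binaries. The conversion itself is not shown in this README, so the following is only a minimal sketch of one common approach: each byte becomes one grayscale pixel (0-255) and the byte stream is wrapped into rows of a fixed width. The function name `bytes_to_grayscale`, the default width of 256, and the zero-padding of the last row are assumptions made for illustration, not the repository's actual code.

```python
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Wrap raw bytes into rows of `width` pixels (hypothetical helper).

    Each byte is used directly as a grayscale intensity in [0, 255].
    The final row is zero-padded so that every row has the same length.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if not data:
        return []
    # Number of zero bytes needed to fill the last row.
    padding = (-len(data)) % width
    padded = data + bytes(padding)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


# Tiny usage example: five bytes wrapped into rows of two pixels.
image = bytes_to_grayscale(bytes([1, 2, 3, 4, 5]), width=2)
print(image)  # [[1, 2], [3, 4], [5, 0]]
```

An image produced this way would still need to be resized and normalized to whatever input shape the vision transformer expects before embeddings can be extracted.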

The experiment aims to determine whether a classifier trained on synthetic malware embeddings can effectively classify real malware samples.
The classifier achieved 94.9% classification accuracy in this experiment, which suggests that synthetic embeddings can serve as training data for malware classifiers.
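
The evaluation protocol described above (fit on synthetic embeddings, score on real ones) can be sketched as follows. This is a dependency-free illustration under stated assumptions: the embeddings are toy Gaussian clusters rather than ViT or WGAN-GP output, the classifier is a simple nearest-centroid rule rather than the model in `classifier.py`, and every helper name (`fit_centroids`, `predict`, `accuracy`, `make_toy_embeddings`) is hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def fit_centroids(X: Sequence[Vector], y: Sequence[int]) -> Dict[int, Vector]:
    """Compute one mean vector per class label."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for vec, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def predict(centroids: Dict[int, Vector], vec: Vector) -> int:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab], vec))


def accuracy(centroids: Dict[int, Vector], X: Sequence[Vector], y: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(predict(centroids, vec) == label for vec, label in zip(X, y))
    return correct / len(y)


def make_toy_embeddings(rng: random.Random, n_per_class: int, dim: int,
                        shift: float) -> Tuple[List[Vector], List[int]]:
    """Toy stand-in for embeddings: two Gaussian clusters, offset by `shift`."""
    X: List[Vector] = []
    y: List[int] = []
    for label in (0, 1):
        center = label * 10.0 + shift
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0) for _ in range(dim)])
            y.append(label)
    return X, y


rng = random.Random(0)
# "Synthetic" set: slightly shifted, standing in for generator output (assumption).
X_syn, y_syn = make_toy_embeddings(rng, n_per_class=50, dim=8, shift=0.5)
# "Real" set: held out entirely from training.
X_real, y_real = make_toy_embeddings(rng, n_per_class=50, dim=8, shift=0.0)

centroids = fit_centroids(X_syn, y_syn)          # train on synthetic only
tstr_accuracy = accuracy(centroids, X_real, y_real)  # test on real only
print(f"Train-on-synthetic, test-on-real accuracy: {tstr_accuracy:.3f}")
```

The key design point is that no real sample is ever seen during fitting, so the reported accuracy reflects how well the synthetic distribution stands in for the real one. The toy numbers printed here say nothing about the 94.9% figure reported for the actual experiment.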

Repository Contents

malware_img_embeddings
.gitignore
README.md
classifier.py
cwgan_gp.py
cwgan_gp_tests.ipynb
embedding_creation.ipynb
mal_embedding_classification.ipynb