- Embedding Generation – A vision transformer is used to create embeddings from binary images of malware samples.
- Synthetic Data Generation – A Conditional WGAN-GP (Wasserstein GAN with gradient penalty) is trained to produce high-quality synthetic embeddings.
- Classification Experiment – A classifier is trained on the synthetic embeddings and tested on real malware data to assess generalization.
- The experiment aims to determine whether a classifier trained on synthetic malware embeddings can effectively classify real malware samples.
- The classifier achieved 94.9% classification accuracy on real malware data, indicating that a classifier trained on synthetic embeddings can generalize to real samples.
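
The train-on-synthetic, test-on-real protocol described above can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in: the toy Gaussian embeddings replace the vision transformer output, the shifted class centers mimic an imperfect generator rather than an actual Conditional WGAN-GP, and the nearest-centroid classifier is chosen only for brevity, not because the experiment used it.

```python
import random


def make_embeddings(centers, n_per_class, noise, rng):
    """Draw toy embeddings around per-class centers (stand-in for real or generated data)."""
    samples = []
    for label, center in enumerate(centers):
        for _ in range(n_per_class):
            vec = [c + rng.gauss(0.0, noise) for c in center]
            samples.append((vec, label))
    return samples


def fit_centroids(samples):
    """Train a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for vec, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], vec))


def accuracy(centroids, samples):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for vec, label in samples if predict(centroids, vec) == label)
    return correct / len(samples)


rng = random.Random(0)

# Hypothetical class centers for three toy "malware families" in a 4-d embedding space.
centers = [[0.0, 0.0, 0.0, 0.0], [4.0, 4.0, 0.0, 0.0], [0.0, 4.0, 4.0, 0.0]]

# "Real" embeddings, held out for testing only.
real_test = make_embeddings(centers, 100, 1.0, rng)

# "Synthetic" embeddings: centers shifted slightly to mimic generator imperfection (assumption).
shifted = [[c + 0.3 for c in center] for center in centers]
synthetic_train = make_embeddings(shifted, 200, 1.0, rng)

# Train on synthetic data, evaluate on real data.
centroids = fit_centroids(synthetic_train)
tstr_accuracy = accuracy(centroids, real_test)
print(f"Train-on-synthetic, test-on-real accuracy: {tstr_accuracy:.3f}")
```

The key design point is that the real embeddings never touch `fit_centroids`; they appear only in the final `accuracy` call, so the reported number measures how well the synthetic distribution substitutes for the real one.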