PhishGNN

Code for the paper: PhishGNN: A Phishing Website Detection Framework using Graph Neural Networks.

[Figure: phishing graph]

Installation

Clone the repo

git clone https://github.com/TristanBilot/phishGNN.git
cd phishGNN

Install dependencies

python3 -m venv venv
. venv/bin/activate
pip install wheel
pip install -r requirements.txt
pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.12.0+cpu.html # for cpu

Unzip the dataset

./install_dataset.sh

Dataset & crawler

The dataset can be downloaded in PyG (PyTorch Geometric) format, and new features can be extracted from URLs using the crawler. A full guide for both tasks can be found here.

Training

By default, training uses the files located in data/training/processed. The raw dataset consists of URLs, each mapped to around 30 features, including a list of references (href, form, iframe) to other pages. Each referenced page has its own features and its own list of references.

python phishGNN/training.py
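To make the structure described above concrete, here is a minimal sketch of how a root URL and its references could be flattened into a node feature list and an edge list. It is not the repository's actual preprocessing code: the record layout, the field names `features` and `refs`, the example URLs, and the helper `build_graph` are all hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical raw records: the field names ("features", "refs") and the
# values below are illustrative assumptions, not the dataset's real schema.
pages = {
    "http://a.example/login": {
        "features": [1.0, 0.0, 3.0],
        "refs": ["http://a.example/form", "http://b.example/"],
    },
    "http://a.example/form": {
        "features": [0.0, 1.0, 1.0],
        "refs": ["http://b.example/"],
    },
    "http://b.example/": {"features": [0.0, 0.0, 0.0], "refs": []},
}


def build_graph(root_url, pages):
    """Walk the references breadth-first from root_url.

    Returns (urls, x, edge_index), where x holds one feature row per node
    and edge_index is a pair of lists [sources, targets].
    """
    index = {root_url: 0}          # URL -> node id, root is always node 0
    queue = deque([root_url])
    sources, targets = [], []
    while queue:
        url = queue.popleft()
        for ref in pages[url]["refs"]:
            if ref not in pages:
                continue           # skip references that were not crawled
            if ref not in index:
                index[ref] = len(index)
                queue.append(ref)
            sources.append(index[url])
            targets.append(index[ref])
    urls = sorted(index, key=index.get)
    x = [pages[u]["features"] for u in urls]
    return urls, x, [sources, targets]


urls, x, edge_index = build_graph("http://a.example/login", pages)
```

The `[sources, targets]` pair follows the two-row layout that PyTorch Geometric uses for `edge_index`, so under these assumptions it could be converted to a tensor and passed to a `torch_geometric.data.Data` object together with `x`.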

Visualize node embeddings

During training, the embeddings produced by the graph convolutional layers can be generated for visualization. To do so, run the training script with the following option:

python phishGNN/training.py --plot-embeddings
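As a rough intuition for what a node embedding is, the sketch below applies one untrained, weight-free propagation step in which each node averages its own features with those of the nodes that link to it. This is a deliberate simplification and not the model used in the repository, whose layers are learned; the function `mean_aggregate` and the toy inputs are hypothetical.

```python
def mean_aggregate(x, edge_index):
    """One simplified propagation step (no learned weights).

    Each node's new vector is the mean of its own features and the
    features of every node with an edge pointing to it.
    """
    sources, targets = edge_index
    sums = [list(row) for row in x]   # start from each node's own features
    counts = [1] * len(x)
    for s, t in zip(sources, targets):
        for k, value in enumerate(x[s]):
            sums[t][k] += value       # message flows from source to target
        counts[t] += 1
    return [[v / c for v in row] for row, c in zip(sums, counts)]


# Toy graph with three nodes: 0 -> 1, 0 -> 2, 1 -> 2
x = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
edge_index = [[0, 0, 1], [1, 2, 2]]
embeddings = mean_aggregate(x, edge_index)
```

After this step, node 2 carries information from both of its neighbors even though its own features were all zero, which is the kind of neighborhood-aware representation an embedding plot makes visible.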

Visualize the graphs

A tool is provided to visualize the internal structure of web pages from the dataset, along with characteristics such as the number of nodes and edges and whether the page is phishing or benign.

To visualize this data, first follow the installation instructions, then run the visualization script and open the file visualization/visualization.html.

python visualization.py


Citation

If you use this code, please cite the following paper.

@inproceedings{bilot2022phishgnn,
  title={{PhishGNN}: A Phishing Website Detection Framework using Graph Neural Networks},
  author={Bilot, Tristan and Geis, Gr{\'e}goire and Hammi, Badis},
  booktitle={19th International Conference on Security and Cryptography},
  pages={428--435},
  year={2022},
  organization={SCITEPRESS-Science and Technology Publications}
}

License

MIT

About

Phishing detection using GNNs (SECRYPT'22)
