PhishGNN

Code for the paper "PhishGNN: A Phishing Website Detection Framework using Graph Neural Networks".

Installation

Clone the repo

git clone https://github.com/TristanBilot/phishGNN.git
cd phishGNN

Install dependencies

python3 -m venv venv
. venv/bin/activate
pip install wheel
pip install -r requirements.txt
pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.12.0+cpu.html # for cpu

Unzip the dataset

./install_dataset.sh

Dataset & crawler

The dataset can be downloaded in PyG format and new features can be extracted from URLs using the crawler. A full guide for both tasks can be found here.

Training

By default, training uses the files located in data/training/processed. The raw dataset maps each URL to around 30 features, including a list of references (href, form, iframe) to other pages; each referenced page has its own features and its own list of references.
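Because every referenced page carries its own features and references, each URL expands into a graph of pages. The sketch below shows one way such a graph could be assembled from raw records. The record layout (a dict with "features" and "refs" keys), the helper name build_graph and the max_depth cutoff are hypothetical, not the repository's actual schema or code. The output follows the PyTorch Geometric convention of an edge_index made of two parallel lists of source and target node indices.

```python
from typing import Dict, List, Tuple

# Hypothetical raw format (an assumption, not the repository's actual schema):
# each URL maps to a feature vector and the URLs it references
# through href, form or iframe tags.
Records = Dict[str, dict]


def build_graph(root: str, records: Records, num_features: int, max_depth: int = 2):
    """Breadth-first expansion of a root URL into a small page graph.

    Returns the node URLs, a node feature matrix (pages missing from
    `records` get an all-zero vector), and a PyG-style edge_index:
    two parallel lists [sources, targets], where an edge i -> j means
    page i references page j.
    """
    nodes: List[str] = [root]
    index: Dict[str, int] = {root: 0}
    edges: List[Tuple[int, int]] = []
    frontier = [root]
    for _ in range(max_depth):
        next_frontier = []
        for url in frontier:
            for ref in records.get(url, {}).get("refs", []):
                if ref not in index:  # first time we see this page: new node
                    index[ref] = len(nodes)
                    nodes.append(ref)
                    next_frontier.append(ref)
                edges.append((index[url], index[ref]))
        frontier = next_frontier

    x = [records.get(u, {}).get("features", [0.0] * num_features) for u in nodes]
    edge_index = [[s for s, _ in edges], [t for _, t in edges]]
    return nodes, x, edge_index


records = {
    "http://a.example": {"features": [1.0, 0.0], "refs": ["http://b.example", "http://c.example"]},
    "http://b.example": {"features": [0.0, 1.0], "refs": ["http://a.example"]},
}
nodes, x, edge_index = build_graph("http://a.example", records, num_features=2)
print(nodes)       # ['http://a.example', 'http://b.example', 'http://c.example']
print(edge_index)  # [[0, 0, 1], [1, 2, 0]]
```

Here c.example was referenced but never crawled, so it becomes a node with a zero feature vector; a real pipeline might handle such pages differently.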

python phishGNN/training.py

Visualize node embeddings

During training, the node embeddings produced right after the graph convolutional layers can be plotted. To do so, run the training script with the following option:

python phishGNN/training.py --plot-embeddings
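To give a rough idea of what these embeddings are, here is the standard GCN propagation rule, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), written in plain Python without PyTorch Geometric. It is a simplified illustration under our own assumptions (undirected edges, one weight matrix, no bias); the models in phishGNN/ may use other layer types. The helper names matmul and gcn_layer are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x k) and a (k x m) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(edge_index: List[List[int]], h: Matrix, w: Matrix) -> Matrix:
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    Edges are treated as undirected and self-loops are added, so each node
    mixes its own features with those of its direct neighbours.
    """
    n = len(h)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loop (the "+ I")
    for s, t in zip(*edge_index):
        adj[s][t] = adj[t][s] = 1.0  # ignore edge direction
    deg = [sum(row) for row in adj]  # degree including the self-loop
    norm = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, h), w)
    return [[max(0.0, v) for v in row] for row in out]  # ReLU


# Path graph 0 - 1 - 2 with a single non-zero feature on node 0.
edge_index = [[0, 1], [1, 2]]
h0 = [[1.0], [0.0], [0.0]]
w = [[1.0]]  # identity weights, so only the propagation is visible

h1 = gcn_layer(edge_index, h0, w)
h2 = gcn_layer(edge_index, h1, w)
print([round(row[0], 3) for row in h1])  # [0.5, 0.408, 0.0]
print([round(row[0], 3) for row in h2])  # [0.417, 0.34, 0.167]
```

After one layer node 2 still has a zero embedding because it is two hops away from the only non-zero input; the second layer reaches it. Stacking k layers lets each node's embedding depend on its k-hop neighbourhood, which is why embeddings taken after the convolutional layers reflect a page's surrounding link structure and not only its own features.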

Visualize the graphs

A tool is provided to visualize the internal structure of web pages from the dataset, along with characteristics such as the number of nodes and edges and whether the page is phishing or benign.

To use it, first complete the Installation steps, then run the visualization script and open the file visualization/visualization.html.

python visualization.py
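For each graph the tool reports the number of nodes and edges and whether the page is phishing or benign. A minimal sketch of collecting those facts from a PyG-style edge_index, assuming a hypothetical helper graph_summary and a generic nodes/links JSON layout that is not necessarily what visualization/visualization.html consumes:

```python
import json
from typing import List


def graph_summary(edge_index: List[List[int]], num_nodes: int, is_phishing: bool) -> dict:
    """Gather the per-graph facts the tool displays (node count, edge count,
    phishing/benign label) together with a generic nodes/links list.

    The dictionary layout is an illustrative assumption, not the format
    that visualization/visualization.html actually reads.
    """
    sources, targets = edge_index
    return {
        "num_nodes": num_nodes,
        "num_edges": len(sources),
        "label": "phishing" if is_phishing else "benign",
        "nodes": [{"id": i} for i in range(num_nodes)],
        "links": [{"source": s, "target": t} for s, t in zip(sources, targets)],
    }


summary = graph_summary([[0, 0, 1], [1, 2, 0]], num_nodes=3, is_phishing=True)
text = json.dumps(summary, indent=2)  # serialized form a browser page could load
print(summary["num_nodes"], summary["num_edges"], summary["label"])  # 3 3 phishing
```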

Citation

If you use this code, please cite the following paper.

@inproceedings{bilot2022phishgnn,
  title={{PhishGNN}: A Phishing Website Detection Framework Using Graph Neural Networks},
  author={Bilot, Tristan and Geis, Gr{\'e}goire and Hammi, Badis},
  booktitle={19th International Conference on Security and Cryptography},
  pages={428--435},
  year={2022},
  organization={SCITEPRESS-Science and Technology Publications}
}

License

MIT
