This repository contains a complete implementation of TruthTrace for the “full-batch” setting (we experimented with neighbourhood sampling, but the results were poor).
TruthTrace aims to detect misinformation by examining how news propagates rather than relying solely on textual cues. Each news item and its propagation history are treated as a small graph; news nodes and user nodes are connected by retweet, reply or follower edges. A Graph Attention Network (GAT) ingests both content features (text embeddings) and user features and outputs a probability that the news item is disinformation, consistent with the original UPFD benchmark design (Dou et al., 2021).
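To make the attention step concrete, here is a dependency-free sketch of a single-head GAT layer in the spirit of Veličković et al. (2018), followed by a graph-level readout. It is an illustration only, not the model code in this repository: the function names (`gat_layer`, `graph_probability`), the mean pooling, the logistic readout and the toy values are all assumptions, and the nonlinearity usually applied after aggregation is left out for brevity.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU as used for GAT attention scores."""
    return x if x > 0 else slope * x


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(features, neighbours, W, a):
    """One single-head GAT layer (hypothetical helper).

    features:   dict node -> input feature vector
    neighbours: dict node -> list of neighbour nodes (include the node
                itself if a self-loop is wanted)
    W:          weight matrix, out_dim x in_dim
    a:          attention vector of length 2 * out_dim
    """
    z = {node: matvec(W, h) for node, h in features.items()}
    out = {}
    for i, nbrs in neighbours.items():
        # Unnormalised score e_ij = LeakyReLU(a . [z_i || z_j])
        scores = [
            leaky_relu(sum(w * x for w, x in zip(a, z[i] + z[j])))
            for j in nbrs
        ]
        # Softmax over the neighbourhood (shifted for numerical stability)
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # Attention-weighted sum of transformed neighbour features
        out[i] = [
            sum(alpha * z[j][k] for alpha, j in zip(alphas, nbrs))
            for k in range(len(z[i]))
        ]
    return out


def graph_probability(node_outputs, w_out, bias=0.0):
    """Mean-pool node outputs and apply a logistic readout (assumed design)."""
    n = len(node_outputs)
    pooled = [
        sum(vec[k] for vec in node_outputs.values()) / n
        for k in range(len(w_out))
    ]
    logit = sum(w * x for w, x in zip(w_out, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy propagation graph: one news node and two users (made-up values)
features = {"news": [1.0, 0.0], "user_a": [0.0, 1.0], "user_b": [0.5, 0.5]}
neighbours = {
    "news": ["news", "user_a", "user_b"],
    "user_a": ["user_a", "news"],
    "user_b": ["user_b", "news"],
}
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.1, -0.2, 0.3, 0.4]
hidden = gat_layer(features, neighbours, W, a)
prob_fake = graph_probability(hidden, w_out=[0.7, -0.3])
```

In practice a library layer (for example a GAT convolution from PyTorch Geometric) would replace the hand-written loops; the sketch is only meant to show where the content and user features enter and how a single graph-level probability comes out.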
High-level workflow:
- Input — Load a collection of news items and their associated propagation structures from the UPFD dataset. We have used the "gossipcop" dataset within UPFD since it has a much larger corpus than "politifact".
- Graph construction — Build a heterogeneous graph where nodes represent both news posts and users; edges encode retweets.
- Feature extraction — Use pre-computed text embeddings (e.g. BERT-based features) and simple user-level features (e.g. degree statistics, profile features if available).
- GNN model (GAT) — Apply a Graph Attention Network (Veličković et al., 2018) to aggregate neighbourhood information and produce graph-level outputs (fake vs. real).
- Classifier & thresholding — Train a classifier to output a probability of disinformation and compare it against a strong text-only baseline (Devlin et al., 2019; Vosoughi et al., 2018).
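The graph construction and degree-based user features from the workflow above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the helper names (`build_propagation_graph`, `degree_features`), the node identifiers and the edge-list input format are hypothetical and not taken from the repository, which reads the UPFD files directly.

```python
def build_propagation_graph(news_id, retweet_edges):
    """Build an undirected adjacency list from (source, target) retweet edges.

    The news node is always present, even if it has no edges yet.
    """
    adjacency = {news_id: set()}
    for src, dst in retweet_edges:
        adjacency.setdefault(src, set()).add(dst)
        adjacency.setdefault(dst, set()).add(src)
    return {node: sorted(nbrs) for node, nbrs in adjacency.items()}


def degree_features(adjacency):
    """Compute simple per-node degree statistics."""
    degrees = {node: len(nbrs) for node, nbrs in adjacency.items()}
    max_degree = max(degrees.values()) or 1  # avoid division by zero
    return {
        node: {"degree": d, "normalised_degree": d / max_degree}
        for node, d in degrees.items()
    }


# Made-up example: one news item retweeted by two users, one of whom
# is retweeted again by a third user.
edges = [("news_1", "user_a"), ("user_a", "user_b"), ("news_1", "user_c")]
graph = build_propagation_graph("news_1", edges)
node_stats = degree_features(graph)
```

Statistics like these would sit alongside the pre-computed text embeddings as node features; which user features are actually used depends on what the dataset provides.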
An interactive Flask dashboard visualises each propagation graph and displays predictions from both the baseline and GAT models. Users can select a news item, explore its retweet tree, and inspect the model scores.
This project uses the User Preference-Aware Fake News Detection
(UPFD) dataset (Dou et al., 2021), but does not rely on
torch_geometric.datasets.UPFD to download it. Instead, you download
the data manually from OpenDataLab and place it into the expected
folder structure.
- Go to the UPFD page on OpenDataLab: https://opendatalab.com/OpenDataLab/UPFD/tree/main/raw.
- Log in / create an account if needed.
- Download the archive(s) containing the UPFD data.
Inside your project directory, create the following structure:
truthtrace_full/
    data/
        politifact/
            raw/
                ... all Politifact UPFD files here ...
        gossipcop/
            raw/
                ... all GossipCop UPFD files here ...
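
Before training, it can help to confirm that the manually downloaded files ended up where they are expected. The following is a small sketch, assuming the layout shown above; the helper name `missing_raw_dirs` is hypothetical and not part of the repository, and it only checks that each `raw/` folder exists and is non-empty, not that the files themselves are correct.

```python
from pathlib import Path

# Dataset folder names taken from the directory layout above
EXPECTED_DATASETS = ("politifact", "gossipcop")


def missing_raw_dirs(root):
    """Return a list of human-readable problems found under the data root."""
    root = Path(root)
    problems = []
    for name in EXPECTED_DATASETS:
        raw = root / name / "raw"
        if not raw.is_dir():
            problems.append(f"{raw} is missing")
        elif not any(raw.iterdir()):
            problems.append(f"{raw} is empty")
    return problems


if __name__ == "__main__":
    for problem in missing_raw_dirs("./data"):
        print(problem)
```

An empty list means both `raw/` folders are present and contain at least one file.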

    git clone <REPO_URL>
    cd truthtrace_full

This will almost definitely require UNIX/Linux/WSL since some of the packages are not Windows-compatible.

    pip install -r requirements.txt

Run train.py with the data directory passed as an argument via --root.

    python train.py --root ./data

Once the models have trained, run the Flask app for the (local) interface.

    python app.py

Then open the URL printed in the terminal (typically http://127.0.0.1:5000/).