Soham De, Michiel Bakker, Jay Baxter, Martin Saveski
This repository contains code for replicating the analysis and experiments in the paper "Supernotes: Driving Consensus in Crowd-Sourced Fact-Checking" published in the proceedings of The ACM Web Conference 2025.
The open-source Community Notes data required for generating supernotes can be downloaded from the Community Notes website. The anonymized data from the human experiments needed for replication will be available upon request.
The code is split into two directories:
- supernotes: contains our implementation of the supernotes generation framework (Sec 3)
- analyses: contains code for analyzing and visualizing results of the human experiments (Sec 4)
Before running any of the scripts, please follow these steps to set up the directory:
- Install the Python packages in requirements.txt
- Clone the communitynotes repo at the root directory (last verified with commit #e8d6631)
- Download the latest data from community notes into the communitynotes/data directory
- Move the run_mf.py script to communitynotes/
git clone <PATH TO THIS REPO>
cd supernotes-public
pip install -r requirements.txt
git clone https://github.com/twitter/communitynotes.git
mkdir -p communitynotes/data # download ratings data in this directory
mv run_mf.py communitynotes/
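As a quick sanity check after the steps above, a minimal sketch along the following lines can confirm that the expected layout is in place. The function name `missing_setup_paths` is hypothetical and not part of this repository; the paths are the ones named in the setup steps. It only checks that the paths exist, not that the data directory actually contains the downloaded files.

```python
from pathlib import Path

# Paths named in the setup steps, relative to the repository root.
EXPECTED_PATHS = [
    "requirements.txt",
    "communitynotes/data",
    "communitynotes/run_mf.py",
]


def missing_setup_paths(root="."):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]
```

Calling `missing_setup_paths()` from the repository root returns an empty list when all three paths are present.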
The supernotes framework is implemented in Python. The analyses and plots are produced primarily in R, except for Figure 6, which also involves a scoring step in Python.
To generate a supernote, follow these steps.
- Configure secrets: Generate an OpenAI key and update the value of OPENAI_API_KEY in supernotes/secrets.json
- Download CN Data: Download Community Notes data from their website into the directory: communitynotes/data
- Run example: Run supernotes/main.py to generate a supernote as an example
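To illustrate the secrets step, here is a minimal sketch of reading the key back from the secrets file. It assumes supernotes/secrets.json is a flat JSON object containing an OPENAI_API_KEY field; the helper name `load_openai_key` is hypothetical, and the repository's own loading code may differ.

```python
import json
from pathlib import Path


def load_openai_key(path="supernotes/secrets.json"):
    """Read OPENAI_API_KEY from a JSON secrets file.

    Assumes a flat JSON object such as {"OPENAI_API_KEY": "..."};
    the real file may hold additional fields.
    """
    secrets = json.loads(Path(path).read_text(encoding="utf-8"))
    key = secrets.get("OPENAI_API_KEY", "")
    if not key:
        raise ValueError(f"OPENAI_API_KEY is missing or empty in {path}")
    return key
```

Failing early with a clear error when the key is missing or empty avoids a less obvious authentication failure later, when the first API call is made.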
The framework consists of the following modules:
- Summarization: supernotes/summarizer.py calls the OpenAI API to generate candidate supernotes
- Text Embeddings: supernotes/embedder.py calls the OpenAI API to get text embeddings for posts and notes
- Principle Alignment: supernotes/evaluator.py performs link/length checks and calls the OpenAI API to implement the principle-alignment steps described in Section 3
- Aggregation: supernotes/aggregator.py implements the Community Notes aggregation step described in Appendix B.1
- Personalized Helpfulness Model: supernotes/phm.py implements the Personalized Helpfulness Model described in Section 3 (and Appendix A)
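As a rough illustration of what a link/length check might look like, the sketch below accepts a candidate note only if it contains at least one URL and stays within a character limit. This is one plausible reading of "link/length checks", not the logic in supernotes/evaluator.py: the function name, the URL pattern, and the 280-character default are all assumptions made for the example.

```python
import re

# Hypothetical defaults for illustration; the thresholds used in
# supernotes/evaluator.py may differ.
MAX_CHARS = 280
URL_PATTERN = re.compile(r"https?://\S+")


def passes_link_and_length_checks(note_text, max_chars=MAX_CHARS):
    """Return True if the note contains a URL and fits the length limit."""
    has_link = URL_PATTERN.search(note_text) is not None
    within_limit = len(note_text) <= max_chars
    return has_link and within_limit
```

Cheap deterministic checks like these can run before any API call, so candidates that fail them never need to be sent for principle-alignment evaluation.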
To replicate the plots (and associated analyses) run the following scripts:
- analysis/plot_helpfulness.R: requires access to survey data in analysis/data and plots both sub-figures in Figure 4, saves output file (results_helpfulness.eps) in generated_plots
- analysis/plot_winrates.R: requires access to survey data in analysis/data and plots Figure 5, saves output file (results_winrates.eps) in generated_plots
- analysis/plot_cnscores.R: requires access to survey_notes_with_scores.csv in analysis/data (run communitynotes/run_mf.py to generate this file) and plots Figure 6, saves output file (results_cnscores.eps) in generated_plots
- analysis/plot_tags.R: requires access to survey data in analysis/data and plots Figure 7, saves output file (results_tags.eps) in generated_plots
- analysis/plot_ablation.R: requires access to ablation data in analysis/data and plots Figure 8, saves output file (results_winrates.eps) in generated_plots
- analysis/analysis.R: requires access to survey and ablation data in analysis/data and runs all statistical tests reported in Section 4
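The scripts above can be run one by one; the sketch below shows one way to batch them from Python. It assumes that `Rscript` is on the PATH and that the scripts are meant to be run from the repository root, neither of which is stated here. The names `build_commands` and `run_all` are hypothetical helpers, not part of this repository.

```python
import subprocess

# Script names as listed above; invoking them from the repository root
# with `Rscript` is an assumption.
SCRIPTS = [
    "analysis/plot_helpfulness.R",
    "analysis/plot_winrates.R",
    "analysis/plot_cnscores.R",
    "analysis/plot_tags.R",
    "analysis/plot_ablation.R",
    "analysis/analysis.R",
]


def build_commands(scripts=SCRIPTS):
    """Return one `Rscript <path>` command per script, in order."""
    return [["Rscript", script] for script in scripts]


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

With the default `dry_run=True`, `run_all()` only prints the commands, which is a convenient way to review them before the survey data and survey_notes_with_scores.csv are in place.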
@inproceedings{de2025supernotes,
title = {Supernotes: Driving Consensus in Crowd-Sourced Fact-Checking},
author = {De, Soham and Baxter, Jay and Bakker, Michiel and Saveski, Martin},
booktitle = {Proceedings of the {ACM} Web Conference 2025 ({WWW} '25)},
year = {2025},
month = {April 28--May 2},
location = {Sydney, NSW, Australia},
publisher = {ACM},
address = {New York, NY, USA},
pages = {11},
doi = {10.1145/3696410.3714934}
}
This code is licensed under the CC BY 4.0 license found in the LICENSE file.