This repository provides the source code for the paper: Decomposition Dilemmas: Does Claim Decomposition Boost or Burden Fact-Checking Performance?
Fact-checking pipelines increasingly adopt the Decompose-Then-Verify paradigm, where texts are broken down into smaller claims for individual verification and subsequently combined for a veracity decision. While decomposition is widely adopted in such pipelines, its effects on final fact-checking performance remain underexplored. Some studies have reported improvements from decomposition, while others have observed performance declines, indicating its inconsistent impact. To date, no comprehensive analysis has been conducted to understand this variability. To address this gap, we present an in-depth analysis that explicitly examines the impact of decomposition on downstream verification performance. Through error case inspection and experiments, we introduce a categorization of decomposition errors and reveal a trade-off between accuracy gains and the noise introduced through decomposition. Our analysis provides new insights into understanding current systems' instability and offers guidance for future studies toward improving claim decomposition in fact-checking pipelines.
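As a minimal illustration of the Decompose-Then-Verify aggregation described above (a hypothetical sketch, not this repository's implementation): with binary labels, a text is typically judged factual only if every decomposed sub-claim is verified as supported.

```python
# Hypothetical sketch of the final aggregation step in a
# Decompose-Then-Verify pipeline. Assumes each sub-claim has already
# received a binary verdict ("supported" / "unsupported"); the field
# names here are illustrative only.

def aggregate_verdicts(claim_verdicts):
    """Combine per-sub-claim binary verdicts into one veracity decision:
    the original text counts as factual only if all sub-claims hold."""
    return all(v == "supported" for v in claim_verdicts)

verdicts = ["supported", "supported", "unsupported"]
print(aggregate_verdicts(verdicts))  # False: one failing sub-claim sinks the text
```

This strict all-or-nothing rule is also where decomposition noise hurts most: a single spurious or mis-verified sub-claim flips the overall decision.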
For ClaimDecomp, please refer to the original repository: https://jifan-chen.github.io/ClaimDecomp/
For FELM, please refer to the huggingface page: https://huggingface.co/datasets/hkust-nlp/felm
For WICE and BingChat, please refer to the download script provided by the Self-Checker GitHub repository: https://github.com/Miaoranmmm/SelfChecker
For AlignScore, we host the model using NeMo Guardrails. Please refer to the official documentation: https://docs.nvidia.com/nemo/guardrails/user_guides/advanced/align-score-deployment.html
For MiniCheck, please refer to the original repository: https://github.com/Liyan06/MiniCheck
We provide a shell script for running experiments with random combinations of BingChat data. Please refer to run_test.sh for more details.
```shell
python3 src/pipeline_nli.py \
    --data_dir "./data" \
    --input_dir "./input" \
    --input_file "bingchat_random_combination_3000.json" \
    --output_dir "./output/test" \
    --model_name_extraction "gpt-4o" \
    --model_name_verification "gpt-4o-mini" \
    --decompose_method "specified_number" \
    --specified_number_of_claims 8 \
    --label_n 2 \
    --search_res_num 10 \
    --knowledge_base "google"
```

Feel free to cite our paper if you find our insights useful for your research.
```bibtex
@inproceedings{hu-etal-2025-decomposition,
    title = "Decomposition Dilemmas: Does Claim Decomposition Boost or Burden Fact-Checking Performance?",
    author = "Hu, Qisheng and Long, Quanyu and Wang, Wenya",
    booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
    year = "2025",
    address = "Albuquerque, New Mexico",
    url = "https://aclanthology.org/2025.naacl-long.320/",
    pages = "6313--6336",
}
```

Our implementation is built upon the VeriScore repository and also uses the FactScore repository. We thank the authors for their great work, and we recommend checking out their repositories as good starting points for more details.
