A modular, multi-stage Python framework for detecting, correcting, and removing hallucinations in Large Language Model (LLM)–generated texts.
It cross-checks each claim in an article against one or more trusted sources, combining question-answer validation, contradiction detection, and traceability scoring.
The pipeline processes an article generated by an LLM and iteratively ensures that all its claims are supported by the reference sources.
It is divided into five main verification stages, each implemented in a dedicated module.
| Step | Name | Purpose |
|---|---|---|
| (0) | Zero-check | Initial factual screening — removes or corrects unsupported sentences. |
| (1) | First check | Concept-level QA validation: verifies consistency of main ideas with the sources. |
| (2) | Second check | Sentence-level QA evaluation and iterative correction. |
| (3) | Third check | Hallucination identification: flags and removes statements unsupported by any source. |
| (4) | Fourth check | Source traceability: verifies and quotes exact evidence for every final sentence. |
Each phase refines the article, producing intermediate versions that are logged and compared through automated change tracking and metrics computation.
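A minimal sketch of how such a stage chain can be driven (the function names and signatures here are illustrative placeholders, not the actual APIs of the repo's modules):

```python
from typing import Callable

# A stage takes the current article plus the reference sources
# and returns a revised article.
Stage = Callable[[str, list[str]], str]

def run_pipeline(article: str, sources: list[str], stages: list[Stage]) -> list[str]:
    """Apply each verification stage in order, keeping every intermediate
    version so that change tracking and metrics can compare them later."""
    versions = [article]
    for stage in stages:
        article = stage(article, sources)
        versions.append(article)
    return versions

# Usage with identity placeholders standing in for the five real stages:
noop: Stage = lambda text, sources: text
versions = run_pipeline("LLM-generated article ...", ["source1.txt"], [noop] * 5)
print(f"{len(versions) - 1} stages applied")
```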
Project structure:

```
hallucination_checker/
├── main.py                     # Orchestrates the full pipeline
├── config.py                   # Model settings, paths, I/O folders
├── io_utils.py                 # File loading/saving utilities
├── utils.py                    # Helper functions (e.g., delays)
│
├── zero_check.py               # Step 0 — factual correction vs. sources
├── first_check.py              # Step 1 — concept-level QA validation
├── qa_module.py                # Step 2 — sentence-level QA correction
├── hallucination_checker.py    # Step 3a — hallucination detection (classic)
├── hallucination_check_alt.py  # Step 3b — hallucination detection (optimized)
├── quarto_check.py             # Step 4 — source traceability verification
│
├── change_tracker.py           # Tracks edits and builds Excel diff chain
├── metrics.py                  # Computes quantitative evaluation metrics
├── removal_metrics.py          # Measures Removal Success Rate (RSR)
├── csv_exporter.py             # Exports human-readable CSV report
├── excel_exporter.py           # Exports rich Excel report with all steps
└── requirements.txt            # (optional)
```
Requirements
- Python ≥ 3.10
- A running Ollama server on `localhost:11434`
- Models: `gemma2:9b` and `llama3.1:8b` (customizable in `config.py`)
- Libraries: `pip install nltk openpyxl pandas ollama`
- Run once to download the NLTK sentence tokenizer:

```python
import nltk
nltk.download('punkt')
```
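Before running the pipeline, it can help to confirm the server and models respond; a quick smoke test with the `ollama` Python client (the prompt is arbitrary, and the model name matches the default above):

```python
import ollama

# Smoke test: ask the primary model for a trivial reply.
# Assumes the Ollama server is already running on localhost:11434
# and that the model has been pulled (`ollama pull gemma2:9b`).
response = ollama.chat(
    model="gemma2:9b",
    messages=[{"role": "user", "content": "Reply with the single word OK."}],
)
print(response["message"]["content"])
```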
Configuration
Edit `config.py`:

```python
from pathlib import Path  # needed for the path settings below

CARTELLA_DOCUMENTI = Path(r"C:\path\to\your\documents")
FILE_ARTICOLO = CARTELLA_DOCUMENTI / "article.txt"
FILE_SELEZIONATI = ["source1.txt", "source2.json"]
MODEL = "gemma2:9b"
SECONDARY_MODEL = "llama3.1:8b"
```
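`FILE_SELEZIONATI` mixes plain-text and JSON sources; a hedged sketch of how such files might be read (`io_utils.py`'s real helpers may differ):

```python
import json
from pathlib import Path

def load_source(path: Path) -> str:
    """Return a source file as plain text; JSON files are pretty-printed
    so their content can be fed to the model alongside .txt sources."""
    if path.suffix.lower() == ".json":
        data = json.loads(path.read_text(encoding="utf-8"))
        return json.dumps(data, ensure_ascii=False, indent=2)
    return path.read_text(encoding="utf-8")

# Example (hypothetical usage, matching the config above):
# sources = [load_source(CARTELLA_DOCUMENTI / name) for name in FILE_SELEZIONATI]
```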
Run the pipeline:

```bash
python main.py
```

Interactive prompts will guide you through:
- Choosing the hallucination check mode (classic or optimized)
- Optionally reinserting missing information (First / Second check)
Outputs are automatically saved in a timestamped folder (e.g. `resultati_test_2025-11-03_16-30-00`).
| File | Description |
|---|---|
| `articolo_finale.txt` | Final corrected article |
| `*_rep_2.xlsx` | Rich Excel report with all steps, Q&A tables, and citations |
| `T_Art_report_finale.csv` | CSV with traceability information |
| `tracciamento_modifiche.xlsx` | Step-by-step change tracking |
| `metrics.csv` | Computed metrics summary |
| `frasi_rimosse_zero_check.txt` | Log of sentences removed in step 0 |
The following metrics are automatically computed and exported to CSV:
| Category | Metric | Formula | Trend |
|---|---|---|---|
| Traceability | Sentence Support Rate (SSR) | #supported / #total | ↑ |
| Traceability | Attribution Coverage (AC) | #sentences with quotes / #supported | ↑ |
| Traceability | Strict Support Rate (SSR_strict) | #with quotes / #total | ↑ |
| QA Accuracy | QA₁ (Concept) / QA₂ (Sentence) | #correct / #questions | ↑ |
| Hallucination | Unsupported Claim Ratio (UCR) | #unsupported / #questions | ↓ |
| Hallucination | Removal Success Rate (RSR) | #unsupported removed / #unsupported total | ↑ |
| Preservation | Retention Rate (RR) | #final sentences / #initial sentences | ≈ |
| Stability | Normalized Edit Similarity (NES) | Similarity(initial, final) | ↑ |
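Two of these are straightforward to reproduce outside the pipeline; a sketch using difflib and NLTK, under the assumption of standard string similarity (`metrics.py`'s exact formulas may differ):

```python
import difflib
import nltk

def normalized_edit_similarity(initial: str, final: str) -> float:
    """NES: string similarity between initial and final article, in [0, 1]."""
    return difflib.SequenceMatcher(None, initial, final).ratio()

def retention_rate(initial: str, final: str) -> float:
    """RR: #final sentences / #initial sentences (close to 1 means little was cut)."""
    n_initial = len(nltk.sent_tokenize(initial))
    n_final = len(nltk.sent_tokenize(final))
    return n_final / n_initial if n_initial else 0.0
```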
- Multi-source factual validation
- Automatic question generation and evaluation
- Interactive correction for missing info
- Adaptive hallucination removal (classic/optimized)
- Fine-grained traceability (per sentence)
- Excel + CSV reports with color-coded summaries
- Comprehensive metrics computation
Developed by Cristian Longoni
Master’s Thesis — “Hallucination Reduction in Large Language Models: A Multi-Step Framework for Detection and Correction”
University of Milano-Bicocca, 2025
This project is released under the MIT License.
You are free to use, modify, and distribute it with proper attribution.