# 🧠 Hallucination Checker — Multi-Step Framework for LLM Factual Verification

A modular, multi-stage Python framework for detecting, correcting, and removing hallucinations in Large Language Model (LLM)–generated texts.
It cross-checks each claim in an article against one or more trusted sources, combining question-answer validation, contradiction detection, and traceability scoring.

## 🚀 Overview

The pipeline processes an article generated by an LLM and iteratively ensures that all its claims are supported by the reference sources.
It is divided into five main verification stages, each implemented in a dedicated module.

| Step | Name | Purpose |
|------|------|---------|
| (0) | Zero-check | Initial factual screening — removes or corrects unsupported sentences. |
| (1) | First check | Concept-level QA validation: verifies consistency of main ideas with the sources. |
| (2) | Second check | Sentence-level QA evaluation and iterative correction. |
| (3) | Third check | Hallucination identification: flags and removes statements unsupported by any source. |
| (4) | Fourth check | Source traceability: verifies and quotes exact evidence for every final sentence. |

Each phase refines the article, producing intermediate versions that are logged and compared through automated change tracking and metrics computation.
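
Conceptually, the stages form a simple fold over the article text: each stage takes the current article plus the sources and returns a revised article. The sketch below shows only that shape; the real modules (`zero_check.py` through `quarto_check.py`) define their own signatures, so the `Stage` type here is an assumption, not the repository's actual API.

```python
from typing import Callable

# One stage = one verification module: it takes the current article text
# and the source texts, and returns a revised article.
Stage = Callable[[str, list[str]], str]

def run_pipeline(article: str, sources: list[str], stages: list[Stage]) -> str:
    """Apply each verification stage in order (0: zero-check, 1: first check,
    2: second check, 3: hallucination removal, 4: traceability)."""
    for stage in stages:
        article = stage(article, sources)
    return article
```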

## 🧩 Project Structure

```
hallucination_checker/
├── main.py                    # Orchestrates the full pipeline
├── config.py                  # Model settings, paths, I/O folders
├── io_utils.py                # File loading/saving utilities
├── utils.py                   # Helper functions (e.g., delays)
│
├── zero_check.py              # Step 0 — factual correction vs. sources
├── first_check.py             # Step 1 — concept-level QA validation
├── qa_module.py               # Step 2 — sentence-level QA correction
├── hallucination_checker.py   # Step 3a — hallucination detection (classic)
├── hallucination_check_alt.py # Step 3b — hallucination detection (optimized)
├── quarto_check.py            # Step 4 — source traceability verification
│
├── change_tracker.py          # Tracks edits and builds Excel diff chain
├── metrics.py                 # Computes quantitative evaluation metrics
├── removal_metrics.py         # Measures Removal Success Rate (RSR)
├── csv_exporter.py            # Exports human-readable CSV report
├── excel_exporter.py          # Exports rich Excel report with all steps
└── requirements.txt           # (optional)
```

## ⚙️ Installation

1. **Requirements**

   - Python ≥ 3.10
   - A running Ollama server on `localhost:11434`
   - Models: `gemma2:9b` and `llama3.1:8b` (customizable in `config.py`)
   - Libraries:

     ```bash
     pip install nltk openpyxl pandas ollama
     ```

   - Run once to download the NLTK sentence tokenizer:

     ```python
     import nltk
     nltk.download('punkt')
     ```

2. **Configuration.** Edit `config.py`:

   ```python
   from pathlib import Path

   CARTELLA_DOCUMENTI = Path(r"C:\path\to\your\documents")
   FILE_ARTICOLO = CARTELLA_DOCUMENTI / "article.txt"
   FILE_SELEZIONATI = ["source1.txt", "source2.json"]
   MODEL = "gemma2:9b"
   SECONDARY_MODEL = "llama3.1:8b"
   ```
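
For orientation, here is roughly what a single verification query against the local Ollama server looks like with the `ollama` package installed above. The `is_supported` helper and its prompt wording are illustrative assumptions, not the framework's actual code.

```python
import ollama  # talks to the local server on localhost:11434

# Hypothetical helper: asks the model whether one claim is backed by one
# source; the prompt wording is an assumption, not the framework's prompt.
def is_supported(sentence: str, source_text: str, model: str = "gemma2:9b") -> bool:
    response = ollama.chat(
        model=model,
        messages=[{
            "role": "user",
            "content": (
                f"Source:\n{source_text}\n\n"
                f"Claim: {sentence}\n"
                "Answer YES if the source supports the claim, otherwise NO."
            ),
        }],
    )
    return response["message"]["content"].strip().upper().startswith("YES")
```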

## ▶️ Usage

Run the pipeline:

```bash
python main.py
```

Interactive prompts will guide you through:

- Choosing the hallucination check mode (classic or optimized)
- Optionally reinserting missing information (First / Second check)

Outputs are automatically saved in a timestamped folder (e.g. `resultati_test_2025-11-03_16-30-00`).

## 📂 Output Files

| File | Description |
|------|-------------|
| `articolo_finale.txt` | Final corrected article |
| `*_rep_2.xlsx` | Rich Excel report with all steps, Q&A tables, and citations |
| `T_Art_report_finale.csv` | CSV with traceability information |
| `tracciamento_modifiche.xlsx` | Step-by-step change tracking |
| `metrics.csv` | Computed metrics summary |
| `frasi_rimosse_zero_check.txt` | Log of sentences removed in step 0 |

## 📊 Metrics

Automatically computed and exported to CSV:

| Category | Metric | Formula |
|----------|--------|---------|
| Traceability | Sentence Support Rate (SSR) | #supported / #total |
| Traceability | Attribution Coverage (AC) | #sentences with quotes / #supported |
| Traceability | Strict Support Rate (SSR_strict) | #with quotes / #total |
| QA Accuracy | QA₁ (Concept) / QA₂ (Sentence) | #correct / #questions |
| Hallucination | Unsupported Claim Ratio (UCRR) | #unsupported / #questions |
| Hallucination | Removal Success Rate (RSR) | #unsupported removed / #unsupported total |
| Preservation | Retention Rate (RR) | #final sentences / #initial sentences |
| Stability | Normalized Edit Similarity (NES) | Similarity(initial, final) |
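
As a worked example of the three traceability ratios, with hypothetical counts chosen only to illustrate the formulas above:

```python
# Hypothetical counts, not real results from the framework.
total_sentences = 40   # sentences in the final article
supported = 36         # sentences backed by at least one source
with_quotes = 30       # supported sentences with an exact quote attached

ssr = supported / total_sentences           # Sentence Support Rate -> 0.90
ac = with_quotes / supported                # Attribution Coverage  -> ~0.83
ssr_strict = with_quotes / total_sentences  # Strict Support Rate   -> 0.75

print(f"SSR={ssr:.2f}  AC={ac:.2f}  SSR_strict={ssr_strict:.2f}")
```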

## 🧮 Core Features

- Multi-source factual validation
- Automatic question generation and evaluation
- Interactive correction for missing info
- Adaptive hallucination removal (classic/optimized)
- Fine-grained traceability (per sentence)
- Excel + CSV reports with color-coded summaries
- Comprehensive metrics computation

## 🧑‍💻 Authors

Developed by Cristian Longoni
Master’s Thesis — “Hallucination Reduction in Large Language Models: A Multi-Step Framework for Detection and Correction”
University of Milano-Bicocca, 2025

## 🪪 License

This project is released under the MIT License.
You are free to use, modify, and distribute it with proper attribution.
