A lightweight, dependency-free git hook and CLI tool that detects hallucinated academic citations in AI-generated text. Prevents fabricated references from entering your codebase or documentation.
- Zero dependencies - Pure Python 3 with regex, no NLP libraries required
- Fast - Regex-based extraction, processes files in milliseconds
- Git hook integration - Auto-check commits, pre-commit, or post-response
- Citation database - Maintains verified citations, flags suspicious ones
- Multiple input modes - stdin, file, or text argument
- JSON/human output - Machine-readable or terminal-friendly reports
AI language models (GPT-4, Claude, etc.) occasionally hallucinate academic citations - generating plausible-sounding references to papers that don't exist. This can:
- Mislead researchers who trust the citations
- Damage credibility when fabricated references are discovered
- Waste time tracking down non-existent papers
- Propagate misinformation
This tool catches these hallucinations before they enter your repository.
# 1. Download the checker script
curl -o citationChecker.py https://raw.githubusercontent.com/lizTheDeveloper/citation-checker/main/citationChecker.py
chmod +x citationChecker.py
# 2. Create citation database directories
mkdir -p research
touch research/BIBLIOGRAPHY.md
touch research/COMMONLY_HALLUCINATED_CITATIONS.md
# 3. Install as git hook (optional)
curl -o .git/hooks/pre-commit https://raw.githubusercontent.com/lizTheDeveloper/citation-checker/main/pre-commit-hook
chmod +x .git/hooks/pre-commit

Or set it up manually:

- Copy `citationChecker.py` to your project (e.g., `scripts/` or `tools/`)
- Create citation database (see Database Setup below)
- Install git hook (optional, see Git Hook Integration below)
# Check text directly
python citationChecker.py --text "According to Smith et al. (2023)..."
# Check a file
python citationChecker.py --file response.txt
# Pipe from stdin
echo "Jones (2024) found that..." | python citationChecker.py --stdin
# JSON output
python citationChecker.py --file paper.md --json
# Quiet mode (exit code only)
python citationChecker.py --file draft.txt --quiet
echo $?  # 0 = all verified, 1 = unverified/suspicious found

The pre-commit hook checks staged files before commit:
# Install hook
cp pre-commit-hook .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit
# Now all commits will be checked
git add research/paper_notes.md
git commit -m "Add research notes"
# → Hook runs, blocks commit if hallucinated citations found

The post-response hook checks AI responses in real-time:
# For Claude Code, Cursor, etc.
cp citation-check.sh .claude/hooks/citation-check.sh
# Configure as post-response hook in your tool
# Warns when AI generates unverified citations

The checker verifies citations against multiple sources:
Your project's authoritative citation list (e.g., `research/BIBLIOGRAPHY.md`):
# Bibliography
## Climate Science
- Richardson et al. (2023) - "Earth beyond six of nine planetary boundaries"
- Steffen et al. (2015) - "Planetary boundaries: Guiding human development"
## AI Safety
- Hendrycks et al. (2023) - "Natural Selection Favors AIs over Humans"

Known fabricated references to flag (e.g., `research/COMMONLY_HALLUCINATED_CITATIONS.md`):
# Commonly Hallucinated Citations
## Non-Existent Papers
- Johnson & Smith (2025) - This paper doesn't exist, frequently hallucinated
- arXiv:2501.12345 - Fake arXiv ID (404 error)

Correction logs track fixed citations:
# Citation Corrections Applied - 2024-11-15
## Fixed Citations
- ✅ Richardson et al. (2023) - Verified in Nature journal
- ❌ Thompson (2024) - FABRICATED, removed from codebase

PDF review notes record verified academic papers:
# PDF Review: Richardson et al. (2023)
**Status: ✅ VERIFIED**
**Paper:** Earth beyond six of nine planetary boundaries
**Authors:** Richardson, Steffen, et al.
**Journal:** Science Advances
**Year:** 2023
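Verification is essentially a lookup of each extracted author/year pair against these files. A minimal sketch of the idea (illustrative only; `load_known_citations` is a hypothetical helper, and the actual parsing in `citationChecker.py` may differ):

```python
import re
from pathlib import Path

def load_known_citations(db_file: Path) -> set:
    """Collect (author, year) pairs from a markdown citation database.

    Matches list entries like: - Richardson et al. (2023) - "Paper title"
    """
    pairs = set()
    if not db_file.exists():
        return pairs
    entry = re.compile(r"-\s*([A-Z][A-Za-z'-]+)(?:\s+et al\.|\s*&\s*[A-Z][A-Za-z'-]+)?\s*\((\d{4})\)")
    for line in db_file.read_text(encoding="utf-8").splitlines():
        m = entry.search(line)
        if m:
            pairs.add((m.group(1), m.group(2)))
    return pairs

# Example paths from the installation step; a citation counts as verified if it
# appears in the bibliography and suspicious if it appears in the hallucinations list.
verified = load_known_citations(Path("research/BIBLIOGRAPHY.md"))
suspicious = load_known_citations(Path("research/COMMONLY_HALLUCINATED_CITATIONS.md"))
print(("Richardson", "2023") in verified)  # True, given the example file above
```

Anything that matches neither file is reported as unverified and left for manual review.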
The checker recognizes common academic citation formats:

- `Author et al. (2024)`
- `Author & Author (2024)`
- `Author (2024)`
- `(Author et al., 2024)`
- `(Author, 2024)`
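The patterns below are an illustrative sketch of how these formats can be captured with plain regex; the exact expressions in `citationChecker.py` may differ:

```python
import re

# Illustrative patterns only; the real checker may use different expressions.
CITATION_PATTERNS = [
    re.compile(r"\b[A-Z][A-Za-z'-]+ et al\. \(\d{4}\)"),             # Author et al. (2024)
    re.compile(r"\b[A-Z][A-Za-z'-]+ & [A-Z][A-Za-z'-]+ \(\d{4}\)"),  # Author & Author (2024)
    re.compile(r"\b[A-Z][A-Za-z'-]+ \(\d{4}\)"),                     # Author (2024)
    re.compile(r"\([A-Z][A-Za-z'-]+(?: et al\.)?, \d{4}\)"),         # (Author et al., 2024), (Author, 2024)
]

def extract_citations(text: str) -> list:
    """Return every substring that looks like an author-year citation."""
    found = []
    for pattern in CITATION_PATTERNS:
        found.extend(match.group(0) for match in pattern.finditer(text))
    return found

print(extract_citations("According to Smith et al. (2023), and later work (Jones, 2024)..."))
# -> ['Smith et al. (2023)', '(Jones, 2024)']
```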
Example report (human-readable):

⚠️ CITATION VERIFICATION REPORT
============================================================
Citations found: 3
✅ Verified: 2
❓ Unverified: 1
❌ Suspicious: 0
DETAILS:
------------------------------------------------------------
1. Richardson et al. (2023)
Status: ✅ VERIFIED
2. Smith (2024)
Status: ❓ UNVERIFIED
⚠️ Not found in verified database - possible hallucination
3. Jones et al. (2022)
Status: ✅ VERIFIED
============================================================
⚠️ WARNING: Unverified or suspicious citations detected!
These may be hallucinated. Please verify manually.
With --json, the same report is machine-readable:

{
"citations_found": 3,
"verified": 2,
"suspicious": 0,
"unverified": 1,
"all_verified": false,
"results": [
{
"citation": "Richardson et al. (2023)",
"original_text": "Richardson et al. (2023)",
"verified": true,
"suspicious": false,
"status": "✅ VERIFIED",
"author": "Richardson",
"year": "2023"
},
{
"citation": "Smith (2024)",
"original_text": "Smith (2024)",
"verified": false,
"suspicious": false,
"status": "❓ UNVERIFIED",
"author": "Smith",
"year": "2024"
}
]
}

Exit codes:

- `0` - All citations verified (safe to commit)
- `1` - Unverified or suspicious citations found (review needed)
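Because of the `--json` flag and the exit codes, the checker is easy to call from other scripts, for example in CI. A minimal sketch (assumes `citationChecker.py` sits in the working directory and `draft.txt` is the file to check; field names follow the JSON example above):

```python
import json
import subprocess
import sys

# Run the checker on a file and capture its JSON report.
proc = subprocess.run(
    [sys.executable, "citationChecker.py", "--file", "draft.txt", "--json"],
    capture_output=True,
    text=True,
)
report = json.loads(proc.stdout)

# Exit code 0 = all citations verified; 1 = unverified/suspicious citations found.
if proc.returncode != 0:
    flagged = [r["citation"] for r in report["results"] if not r["verified"]]
    print("Review needed for: " + ", ".join(flagged))
```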
This tool was born from a real incident:
Problem: An AI agent generated 50+ citations for a research simulation project. Upon manual verification, 40% were completely fabricated - plausible-looking references to papers that didn't exist.
Impact:
- 2 weeks wasted tracking down non-existent papers
- Loss of confidence in AI research assistance
- Manual audit of entire codebase required
Solution: This tool now runs on every commit, catching hallucinations immediately.
Case study: See docs/case-studies/research-citation-crisis.md in this repo for full details.
Edit citationChecker.py to customize paths:
from pathlib import Path

class CitationChecker:
def __init__(self, repo_root: Path = None):
self.repo_root = repo_root or Path(__file__).parent
# Customize these paths
self.bibliography = self.repo_root / "docs/citations.md"
self.hallucinated = self.repo_root / "docs/known_hallucinations.md"

By default, the git hook warns but doesn't block commits. To enforce strict verification:
Edit .git/hooks/pre-commit:
# Change from:
exit 0 # Warn but allow
# To:
exit 1  # Block commit if unverified

# Run test suite
python test_citationChecker.py
# Test with sample text
echo "Richardson et al. (2023) found that Earth has crossed 6 of 9 planetary boundaries" \
| python citationChecker.py --stdin

How it works:

- Extract citations using regex patterns (no NLP required)
- Normalize formats to canonical form (`Author et al. (YYYY)`) - see the sketch after this list
- Check databases for verified/suspicious citations
- Flag unverified citations for manual review
- Exit with code (0 = safe, 1 = review needed)
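As an illustration of the normalization step above, the inline and parenthetical forms of the same citation can be collapsed to one canonical key before the database lookup. A rough sketch (the canonical form used internally by `citationChecker.py` may differ):

```python
import re

def normalize_citation(raw: str) -> str:
    """Collapse a matched citation string to a canonical 'Author [et al.] (YYYY)' key."""
    author = re.search(r"[A-Z][A-Za-z'-]+", raw).group(0)  # first surname
    year = re.search(r"\d{4}", raw).group(0)
    suffix = " et al." if "et al" in raw else ""
    return f"{author}{suffix} ({year})"

print(normalize_citation("Smith et al. (2023)"))   # -> Smith et al. (2023)
print(normalize_citation("(Smith et al., 2023)"))  # -> Smith et al. (2023)
```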
Known limitations:

- Regex-based - May miss unusual citation formats
- Database maintenance - Requires manual curation of verified citations
- Author-year only - Doesn't verify titles or DOIs (future enhancement)
- English citations - Optimized for English-language papers
Planned enhancements:

- Support for DOI/arXiv ID verification
- Web API integration (CrossRef, arXiv)
- Multi-language citation detection
- Citation correction suggestions
- Browser extension for real-time checking
Pull requests welcome! Areas of interest:
- Additional citation pattern detection
- Integration with citation managers (Zotero, Mendeley)
- Performance optimization for large codebases
- Web service deployment
MIT License - See LICENSE file for details.
If you use this tool in academic work:
@software{citation_hallucination_checker_2025,
title = {Citation Hallucination Checker},
author = {Howard, Liz (Future Infinitive)},
year = {2025},
url = {https://github.com/lizTheDeveloper/citation-checker}
}
- Issues: https://github.com/lizTheDeveloper/citation-checker/issues
- Discussions: https://github.com/lizTheDeveloper/citation-checker/discussions
Created by Liz The Developer (aka Future Infinitive ☸️)
Born from the October 2025 citation crisis in the "Super-Alignment to Utopia" research simulation project.
Remember: AI is powerful, but trust, then verify. Especially citations. 📚✅