- Project Overview
- Problem Statement
- Why Hallucinations Are Dangerous in Billing Systems
- Project Goal
- Model & Training Strategy
- Dataset Design
- Policy-Driven Training Approach
- Training Process (Step-by-Step)
- Evaluation Methodology
- Results Summary
- Challenges Faced
- Key Learnings
- Limitations & Future Improvements
- Training Configuration Notes
- Conclusion
This project focuses on reducing hallucinations in a Large Language Model (LLM) used for SaaS billing customer support.
The assistant is designed for a fictional company called PayFlow and must answer only using official billing policies.
The model is fine-tuned using LoRA (Low-Rank Adaptation) on top of a base instruction-tuned LLM.
Large Language Models often produce confident but incorrect answers, known as hallucinations.
In billing and payments systems, hallucinations can cause:
- Financial loss
- Legal issues
- Loss of customer trust
This project addresses the question:
How can we constrain an LLM to answer strictly from approved billing policies and safely refuse when information is unavailable?
Examples of real-world risks:
- Inventing refund policies
- Claiming unsupported payment methods
- Fabricating discounts or currencies
In enterprise systems, safe refusal is better than a wrong answer.
The goal of this project is to:
- Reduce hallucinations in billing-related questions
- Ensure strict policy adherence
- Teach the model when to refuse instead of guessing
- Quantify hallucination reduction using before/after evaluation
- Instruction-tuned large language model (Mistral-style architecture)
- LoRA (Low-Rank Adaptation)
- Only a small percentage of parameters are trained
- Base model weights remain unchanged
- Memory efficient
- Faster training
- Prevents catastrophic forgetting
- Industry-standard for alignment tasks
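Below is a minimal sketch of how LoRA adapters can be attached using the Hugging Face peft library; the base model name and the hyperparameters (rank, alpha, target modules) are illustrative assumptions, not the exact settings used in this project.

```python
# Minimal sketch: attaching LoRA adapters to a Mistral-style base model.
# The model name and LoRA hyperparameters below are assumptions for illustration.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

lora_config = LoraConfig(
    r=16,                                  # low-rank dimension
    lora_alpha=32,                         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # adapt attention projections only
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trainable
```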
The model is not trained directly on policy documents.
Instead:
- billing.md acts as the human source of truth
- Policies are manually converted into instruction–response pairs
- Only information present in the policy is allowed
- One concept → one behavior
- Explicit refusals for unknown information
- No assumptions or industry defaults
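For illustration, a hypothetical instruction–response pair derived from a policy clause might look like the entry below; the field names and the policy wording are assumptions rather than the actual train.json contents.

```python
# Hypothetical instruction-response pair; the field names and policy wording
# are illustrative assumptions, not the real train.json contents.
import json

examples = [
    {
        "instruction": "Which payment methods does PayFlow accept?",
        "response": "PayFlow accepts only the payment methods listed in the billing policy: credit card and bank transfer.",
    },
]

with open("train.json", "w") as f:
    json.dump(examples, f, indent=2)
```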
The model is trained to follow this rule:
Answer only what is explicitly stated in PayFlow’s billing policy.
If information is missing, respond with a standard refusal.
Standard refusal phrase:
This information is not available in PayFlow’s billing policy.
This consistency is critical for hallucination control.
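Questions the policy does not cover are paired with the standard refusal verbatim, so the model learns one consistent refusal behavior. A hypothetical trap-question pair is sketched below.

```python
# Hypothetical trap-question pair: the expected output is the standard refusal,
# repeated verbatim across all out-of-policy examples.
REFUSAL = "This information is not available in PayFlow's billing policy."

trap_example = {
    "instruction": "Does PayFlow offer student discounts?",
    "response": REFUSAL,
}
```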
- Environment setup in Google Colab
- Load base model in 4-bit precision
- Attach LoRA adapters
- Load curated dataset (train.json)
- Fine-tune LoRA adapters for 2–3 epochs
- Save LoRA adapter weights only
No full model retraining is performed.
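A condensed sketch of this flow, assuming standard transformers/peft/bitsandbytes usage, is shown below; the base model name, quantization settings, and adapter hyperparameters are assumptions, and only the overall steps mirror the list above.

```python
# Sketch of the Colab training flow: 4-bit base model, LoRA adapters, adapter-only save.
# Exact arguments are assumptions; the steps mirror the list above.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",  # assumed base model
    quantization_config=bnb,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

# ... tokenize train.json and fine-tune for 2-3 epochs with a Trainer / SFTTrainer ...

model.save_pretrained("payflow-lora-adapter")  # writes LoRA adapter weights only
```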
Evaluation is performed using:
- Known policy questions
- Edge cases
- Trap questions (questions not covered in policy)
Each response is classified as:
- ✅ Correct
- ⚠️ Safe but imperfect (over-refusal / verbosity)
- ❌ Hallucination (fabricated information)
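Once each response is labelled, the hallucination rate is simply the share of ❌ labels; a small tally sketch with toy labels is shown below.

```python
# Toy tally of manually assigned evaluation labels into a hallucination rate.
from collections import Counter

labels = ["correct", "correct", "safe_imperfect", "hallucination", "correct"]  # example only

counts = Counter(labels)
hallucination_rate = counts["hallucination"] / len(labels)
print(f"Hallucination rate: {hallucination_rate:.0%}")  # 20% for this toy sample
```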
- Hallucination rate before training: ~60–70%
- Hallucination rate after training: ~8–10%
- Approximate reduction: 75–90%
Detailed before/after comparisons are documented separately in RESULTS.md.
- The model initially hallucinated more confidently after partial fine-tuning.
- Excessive refusal occurred when refusal samples outweighed valid answers.
- Similar questions with different expected behaviors caused instability.
Each issue was resolved through dataset normalization and retraining.
- Hallucination reduction is a data problem, not a model problem
- Models hallucinate where policies are silent
- Over-refusal is safer than hallucination but must be balanced
- Alignment is iterative, not one-shot
- Some verbosity and response blending remains
- Further reduction (<5%) would require larger datasets
- Automated policy-to-dataset generation could improve scalability
LoRA adapters were trained for 3 epochs, with training loss decreasing steadily to a final value of approximately 1.14, indicating effective policy alignment.
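For reference, a hedged sketch of TrainingArguments consistent with these notes is shown below; only the epoch count comes from the run above, and the remaining values (batch size, learning rate, logging cadence) are assumptions.

```python
# Illustrative TrainingArguments for the LoRA run; num_train_epochs matches the
# notes above, all other values are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="payflow-lora",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    logging_steps=10,
    save_strategy="epoch",
)
```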
This project demonstrates a practical, enterprise-grade approach to hallucination control using LoRA.
Rather than chasing perfect answers, the model is trained to:
- Respect policy boundaries
- Avoid guessing
- Fail safely
This approach mirrors how real-world AI systems are deployed in billing and finance domains.