Key Results | Paper (arXiv) | Citation
ND-LoRA (Neural Diversity Low-Rank Adaptation) is a training method that combines stream-specific LoRA adapters with Barlow Twins regularization to reduce hallucinations in small language models. Our approach achieves significant improvements in factuality and faithfulness across multiple benchmarks while maintaining overall model quality.
- 15-25% reduction in hallucination rates on TruthfulQA, HaluEval, and MemoTrap benchmarks
- Parameter-efficient: Only 0.5-2% additional parameters compared to base model
- Causally validated: corruption experiments confirm that neural diversity causally reduces hallucinations (p < 0.001)
- Python 3.9+
- PyTorch 2.0+ with CUDA or MPS support
- 16GB+ RAM (32GB recommended)
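A quick way to confirm the accelerator requirement is met (a generic PyTorch check, not part of this repo):

```python
import torch

# Prefer CUDA, fall back to Apple Silicon MPS, then CPU
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

print(f"PyTorch {torch.__version__}, using device: {device}")
```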
```bash
# Clone repository
git clone https://github.com/kushalc/nd-lora.git
cd nd-lora

# Install dependencies
pip install -r requirements.txt

# Initialize ParScale submodule
git submodule update --init --recursive
```

```bash
# Train ND-LoRA model with P=4 streams
python train_ndlora.py \
--P=4 \
--use-stream-lora \
--orthogonal-lora \
--bt-normalization-warmup \
--target-tokens=20_000_000

# Or use Modal for distributed training
modal run train_ndlora::modal__nslP4__OptC9
```

```bash
# Run hallucination benchmarks
cd leaderboard
python backend_cli.py --model YOUR_MODEL_PATH

# Or use evaluation scripts
python eval_experiments.py --checkpoint PATH_TO_CHECKPOINT
python eval_neurodiversity.py --checkpoint PATH_TO_CHECKPOINT
```

Pre-trained model checkpoints are available for all configurations reported in the paper:
- Baselines: Qwen2.5-0.5B with P=1 (R=32/64/128)
- ParScale: P=2/4/8 with shared LoRA and Barlow Twins
- ND-LoRA: P=2/4/8 with stream-specific LoRA and optimized regularization
- Ablations: Module ablations, architectural variants
See utils/model_checkpoints_paper.py for checkpoint paths and configurations.
The model_checkpoints_paper.py module provides organized access to all paper-essential model checkpoints:
```python
from utils.model_checkpoints_paper import (
    CORE_CHECKPOINTS,             # Main results (Tables 1, 7, 8, 9)
    ABLATION_CHECKPOINTS,         # Ablation studies (Table 4)
    MODULE_ABLATION_CHECKPOINTS,  # Module ablations (Table 6)
    ALL_CHECKPOINTS,              # Combined dictionary
    MODEL_NAMES,                  # Human-readable names
    BASE_CHECKPOINTS              # Base model paths
)

# Access checkpoint paths
checkpoint_path = CORE_CHECKPOINTS["ND-LoRA_P4"]  # S3 path for ND-LoRA P=4 model
model_name = MODEL_NAMES["ND-LoRA_P4"]            # "ND-LoRA (P=4, OptC9)"
```
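If a downloaded checkpoint is a standard PEFT adapter directory, it could in principle be attached to the base model as sketched below; this is an assumption for illustration, since ND-LoRA's stream-specific adapters are loaded through the repo's own utils/model_utils.py:

```python
# Hypothetical sketch: loading a plain LoRA adapter onto the base model.
# ND-LoRA checkpoints use stream-specific adapters, so the actual loader
# in utils/model_utils.py may differ.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
model = PeftModel.from_pretrained(base, "path/to/local/adapter")  # local copy of an S3 checkpoint
```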
```bash
# Use with evaluation scripts
python analyze_experiments.py --model-whitelist nd-lora/
python eval_experiments.py --checkpoint CHECKPOINT_PATH
```

The analyze_experiments.py script can read evaluation results from evals-* directories and generate publication-ready plots:
```bash
# Generate analysis plots from evaluation results
python analyze_experiments.py \
--results-base-path leaderboard \
--output-dir plots \
--plot-mode all pub \
--analysis-mode full \
--baseline-mode single-stream

# View generated plots
open plots/pub-full-single-stream-relative.png
```

The script automatically:
- Reads from leaderboard/evals-{analysis_mode}/ directories
- Maps raw S3 checkpoint paths to human-readable model names using MODEL_NAMES
- Generates absolute and relative performance heatmaps
- Creates model-level and evaluation-level summary statistics
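As a sketch of the relative-heatmap step, assuming a hypothetical model-by-benchmark score table (placeholder numbers, not paper results; the script's actual internals may differ):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical placeholder scores (percent), NOT results from the paper
scores = pd.DataFrame(
    {"TruthfulQA": [41.2, 46.8], "HaluEval": [55.0, 61.3], "MemoTrap": [38.9, 44.1]},
    index=["Baseline (P=1)", "ND-LoRA (P=4)"],
)

# Relative performance: percent change vs. the single-stream baseline row
relative = (scores / scores.loc["Baseline (P=1)"] - 1.0) * 100

fig, ax = plt.subplots()
im = ax.imshow(relative.values, cmap="RdYlGn")
ax.set_xticks(range(len(relative.columns)), relative.columns)
ax.set_yticks(range(len(relative.index)), relative.index)
fig.colorbar(im, ax=ax, label="% change vs. baseline")
fig.savefig("relative-heatmap.png", bbox_inches="tight")
```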
Note: Checkpoints will be migrated to public hosting soon. Check back for updated URLs.
All experiments in the paper can be reproduced using Modal for distributed execution:
```bash
# P=1 baselines (parameter-matched)
modal run train_ndlora::modal__P1__r32
modal run train_ndlora::modal__P1__r64
modal run train_ndlora::modal__P1__r128
# ParScale baselines
modal run train_ndlora::modal__P2__r32
modal run train_ndlora::modal__P4__r64
modal run train_ndlora::modal__P8__r128
# ND-LoRA main results (Optuna-optimized)
modal run train_ndlora::modal__nslP2__OptC9
modal run train_ndlora::modal__nslP4__OptC9
modal run train_ndlora::modal__nslP8__OptC9
```

```bash
# Component ablations
modal run train_ndlora::modal__lP4__r64 # ParScale-BT
modal run train_ndlora::modal__sP4 # Stream-LoRA
modal run train_ndlora::modal__slP4 # Stream-LoRA-BT
modal run train_ndlora::modal__nslP4 # ND-LoRA (original HP)
# Module ablations
modal run train_ndlora::modal__p4_nOSL_ablation__modules
```

```bash
# Deep evaluation (N=1024 samples per task)
cd leaderboard
python eval-cli.py --checkpoint CHECKPOINT_PATH --mode deep

# Corruption experiments for causality analysis
python eval_neurodiversity.py \
--checkpoint CHECKPOINT_PATH \
--corruption-methods substitute_tokens substitute_streams \
--n-samples 128
```
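For intuition, substitute_tokens-style corruption might look roughly like the following sketch; the function name and sampling scheme are illustrative assumptions, and the real implementation lives in eval_neurodiversity.py:

```python
import torch

def substitute_tokens(input_ids: torch.Tensor, frac: float, vocab_size: int) -> torch.Tensor:
    """Illustrative corruption: replace a random fraction of token ids with
    uniformly sampled ids, to test whether disrupting stream inputs changes
    hallucination behavior."""
    corrupted = input_ids.clone()
    mask = torch.rand_like(input_ids, dtype=torch.float) < frac  # which positions to corrupt
    random_ids = torch.randint(0, vocab_size, input_ids.shape)   # uniform replacement tokens
    corrupted[mask] = random_ids[mask]
    return corrupted
```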
ND-LoRA combines four key components:

- Parallel Streams (P): Multiple computation paths through the model
- Stream-Specific LoRA: Independent low-rank adapters for each stream
- Barlow Twins Regularization: Decorrelation loss to maintain neural diversity (see the sketch after this list)
- Optimized Hyperparameters: λ_BT tuned via Optuna for each P value
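To make the first three components concrete, here is a simplified PyTorch sketch; the module and function names, shapes, and the standard Barlow Twins normalization are assumptions for exposition, not the repo's actual code:

```python
import torch
import torch.nn as nn

class StreamLoRALinear(nn.Module):
    """A frozen shared weight plus an independent low-rank adapter per stream."""
    def __init__(self, d_in: int, d_out: int, P: int, rank: int = 16):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)           # shared weights stay frozen
        self.A = nn.Parameter(torch.randn(P, rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(P, d_out, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (P, batch, d_in) -- one slice per parallel stream
        low = torch.einsum("pri,pbi->pbr", self.A, x)    # down-project per stream
        up = torch.einsum("pdr,pbr->pbd", self.B, low)   # up-project per stream
        return self.base(x) + up

def barlow_twins_loss(h: torch.Tensor, lambda_offdiag: float) -> torch.Tensor:
    """Standard Barlow Twins objective between two streams' hidden states.
    h: (2, batch, dim) -- a pair of stream representations."""
    z1 = (h[0] - h[0].mean(0)) / (h[0].std(0) + 1e-6)    # per-feature normalization
    z2 = (h[1] - h[1].mean(0)) / (h[1].std(0) + 1e-6)
    c = (z1.T @ z2) / h.shape[1]                         # cross-correlation matrix (dim, dim)
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()       # invariance term
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # redundancy reduction
    return on_diag + lambda_offdiag * off_diag
```

In this sketch every stream shares the frozen base weight but trains its own (A_p, B_p) pair, and the λ_BT values in the table below would scale the off-diagonal (decorrelation) term.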
| Parameter | P=2 | P=4 | P=8 |
|---|---|---|---|
| LoRA Rank | 16 | 16 | 16 |
| λ_BT | 0.29 | 0.58 | 0.13 |
| Design Layer | 20 | 20 | 20 |
| LoRA Modules | q,k,v | q,k,v | q,k,v |
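For scripting, the tuned values above could be captured in a small mapping (a hypothetical convenience dict, not something the repo exports):

```python
# Hypothetical mapping of the tuned hyperparameters from the table above
OPTIMIZED_HPARAMS = {
    2: {"lora_rank": 16, "lambda_bt": 0.29, "design_layer": 20, "lora_modules": ("q", "k", "v")},
    4: {"lora_rank": 16, "lambda_bt": 0.58, "design_layer": 20, "lora_modules": ("q", "k", "v")},
    8: {"lora_rank": 16, "lambda_bt": 0.13, "design_layer": 20, "lora_modules": ("q", "k", "v")},
}
```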
```
nd-lora/
├── train_ndlora.py              # Main training script with Modal entrypoints
├── eval_experiments.py          # Hallucination benchmark evaluation
├── eval_neurodiversity.py       # Causality experiments (corruption analysis)
├── ParScale/                    # Core ParScale implementation (submodule)
├── utils/
│   ├── model_checkpoints_paper.py  # Paper-essential model checkpoints
│   ├── model_utils.py              # Model loading and PEFT setup
│   ├── stream_diagnostics.py       # Stream analysis and monitoring
│   └── ...                         # Other utilities
├── leaderboard/                 # Hallucination evaluation framework
│   ├── backend_cli.py           # Evaluation worker
│   ├── app.py                   # Gradio web interface
│   └── src/backend/tasks/       # Custom evaluation tasks
├── paper/                       # LaTeX source for paper
└── docs/                        # Implementation documentation
```
This project uses Modal for running experiments and evaluations. Modal entrypoints in train_ndlora.py allow distributed training across cloud GPUs.
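For orientation, an entrypoint of this kind might be declared roughly as follows; the function body, arguments, and GPU type here are assumptions, not the repo's actual definitions:

```python
# Hypothetical shape of a Modal entrypoint; the real definitions live in
# train_ndlora.py and their arguments and GPU types may differ.
import modal

app = modal.App("nd-lora")

@app.function(gpu="A100", timeout=24 * 60 * 60)
def train(P: int = 4, lambda_bt: float = 0.58):
    ...  # build model, attach stream-specific LoRA, run training loop

@app.local_entrypoint()
def modal__nslP4__OptC9():
    # `modal run train_ndlora::modal__nslP4__OptC9` invokes this locally,
    # which dispatches the GPU function to the cloud
    train.remote(P=4, lambda_bt=0.58)
```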
```bash
# Install Modal CLI
pip install modal

# Authenticate
modal token new

# Run experiment
modal run train_ndlora::modal__nslP4__OptC9
```

If you use this code or find our work helpful, please cite:
```bibtex
@article{chakrabarti2025neurodiversity,
  title={Neural Diversity Regularizes Hallucinations in Small Language Models},
  author={Chakrabarti, Kushal and Balachundhar, Nirmal},
  journal={arXiv preprint arXiv:2510.20690},
  year={2025}
}
```

This project is licensed under the MIT License - see the LICENSE file for details.
- Base model: Qwen2.5-0.5B
- Training data: The Pile
- ParScale architecture adapted from: cli99/ParScale
- Evaluation framework: lm-evaluation-harness
For questions or issues, please open a GitHub issue or contact the authors.