End-to-end text-to-SQL system that translates natural language questions into executable SQL. Trained on Spider, evaluated on a custom NBA statistics database. Systematic comparison of full fine-tuning vs. LoRA vs. QLoRA, with a RAG pipeline for schema retrieval.
Course: COSI 115b Fundamentals of NLP II | Deadline: May 6, 2026
- Build a working text-to-SQL pipeline (data → train → eval → demo)
- Systematically compare training regimes: prompt engineering, full fine-tuning, LoRA, QLoRA
- Study cross-domain generalization from Spider (general) → NBA (domain)
- Quantify RAG's contribution to schema linking accuracy
- Ship a Gradio demo answering natural language NBA questions
PyTorch · HuggingFace Transformers · PEFT (LoRA/QLoRA) · Datasets · Accelerate · sentence-transformers · FAISS · Gradio · SQLite · sqlparse
- Spider (training):
xlangai/spideron HuggingFace. 10,181 questions, 200 databases, 138 domains. - NBA (domain evaluation): SQLite DB built from nba-sql. ~10 tables (players, teams, games, player_game_log, etc.). Self-authored ~80–100 question/SQL pairs stratified by difficulty.
t5-base(primary)Salesforce/codet5-base(comparison — code-pretrained)
nba-text2sql/
├── data/
│ ├── raw/ # Spider download, NBA sqlite
│ ├── processed/ # Unified train/dev/test JSON
│ └── nba/ # Self-authored NBA question/SQL pairs
├── src/
│ ├── data_utils.py # Loading, schema serialization, input formatting
│ ├── rag.py # sentence-transformer + FAISS schema retrieval
│ ├── train.py # Full fine-tune + LoRA/QLoRA training loop
│ ├── evaluate.py # Execution accuracy, exact match, error taxonomy
│ ├── prompt_baseline.py # Zero-shot & few-shot baselines
│ └── demo.py # Gradio interface
├── notebooks/ # Exploratory analysis, result plots
├── models/ # Saved checkpoints (gitignored)
├── eval/ # Result JSONs, confusion matrices, plots
├── requirements.txt
└── README.md
| Condition | Spider dev | NBA eval |
|---|---|---|
| Zero-shot T5 | ✓ | ✓ |
| Few-shot T5 | ✓ | ✓ |
| Full fine-tune T5 | ✓ | ✓ |
| LoRA T5 (rank 4/8/16/32) | ✓ | ✓ |
| QLoRA T5 | ✓ | ✓ |
| LoRA CodeT5 | ✓ | ✓ |
| Best model + RAG | ✓ | ✓ |
| Best model + RAG + few-shot NBA | — | ✓ |
Cross-domain curve: Best Spider-trained model, evaluated on NBA with 0 / 10 / 50 / all NBA examples for few-shot adaptation.
| Week | Dates | Milestones |
|---|---|---|
| 1 | Apr 1–7 | Repo setup · Spider + NBA data loaded · NBA Q/SQL pairs authored · zero/few-shot baselines working |
| 2 | Apr 8–14 | Full fine-tune T5-base on Spider · RAG pipeline built · RAG on/off comparison |
| 3 | Apr 15–21 | LoRA/QLoRA ablations (rank, LR, modules) · T5 vs CodeT5 · few-shot NBA adaptation curve |
| 4 | Apr 22–28 | NBA evaluation + error taxonomy · Gradio demo · start write-up with all tables/figures |
| 5 | Apr 29–May 6 | Polish write-up · clean repo · slides · present May 6 |
git clone <repo>
cd nba-text2sql
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
# Download Spider via HuggingFace (automatic on first use)
python src/data_utils.py --download spider
# Build NBA SQLite database (takes ~30 min for current season, hours for full history)
git clone https://github.com/mpope9/nba-sql.git
cd nba-sql && python stats/nba_sql.py --database sqlite --sqlite-path ../data/raw/nba.sqlite# Baseline: zero-shot and few-shot
python src/prompt_baseline.py --model t5-base --mode zero_shot --eval spider
python src/prompt_baseline.py --model t5-base --mode few_shot --eval nba
# Full fine-tune
python src/train.py --model t5-base --method full --dataset spider --epochs 3
# LoRA with rank sweep
for rank in 4 8 16 32; do
python src/train.py --model t5-base --method lora --rank $rank --dataset spider
done
# Evaluate best checkpoint with RAG
python src/evaluate.py --checkpoint models/lora_r16_best --use_rag --eval_set nba
# Launch demo
python src/demo.py --checkpoint models/lora_r16_best- Execution accuracy — generated SQL returns the same result as gold SQL when executed
- Exact match — SQL string matches gold after canonicalization
- Stratified accuracy — split by Spider difficulty (easy/medium/hard/extra)
- Error taxonomy — schema linking errors, structural errors, value prediction errors, aggregation errors
- Proposal (Apr 1)
- GitHub repository with reproducible scripts
- Write-up (≥4 pages PDF)
- Gradio demo
- Presentation slides (8–10 min)
- Yu et al. 2018 — Spider: A Large-Scale Human-Labeled Dataset for Text-to-SQL
- Hu et al. 2021 — LoRA: Low-Rank Adaptation of Large Language Models
- Dettmers et al. 2023 — QLoRA: Efficient Finetuning of Quantized LLMs
- Lewis et al. 2020 — Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
- Raffel et al. 2020 — T5: Exploring the Limits of Transfer Learning
- Wang et al. 2021 — CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder