DPC: Training-Free Text-to-SQL Candidate Selection via Dual-Paradigm Consistency
Official research code for our ACL 2026 Main Track paper
Boyan Li · Ou Ocean Kun Hei · Yue Yu · Yuyu Luo
Paper • Overview • Installation • Quick Start • Experiments • Citation
| Task | Training-free Text-to-SQL candidate selection |
|---|---|
| Core Idea | Verify SQL with a Python/Pandas solution on a minimal distinguishing database |
| Outcome | Reduce selection errors caused by shared LLM blind spots and self-consistency bias |
- 2026-04-17: Our paper is available on arXiv: 2604.15163.
- 2026: DPC is accepted to ACL 2026 Main Track.
DPC is a training-free method for inference-time Text-to-SQL candidate selection. Instead of asking an LLM judge to directly decide which SQL is correct, DPC introduces a second reasoning channel in Python/Pandas, constructs a Minimal Distinguishing Database (MDD), and chooses the SQL candidate whose execution is more consistent with the Python solution.
DPC reframes SQL selection from "guess which candidate is right on hidden data" into "verify which candidate survives on visible distinguishing data."
This repository accompanies the paper DPC: Training-Free Text-to-SQL Candidate Selection via Dual-Paradigm Consistency, which reports consistent gains on BIRD and Spider, including up to +2.2 absolute accuracy over strong training-free baselines.
| Problem | LLM-as-a-Judge for SQL often shares the same reasoning failures as SQL generation itself. |
|---|---|
| Key Idea | Cross-check candidate SQLs with a second program form that has different failure modes: Python/Pandas. |
| Verification Target | Construct a Minimal Distinguishing Database where competing SQLs diverge, then compare them against a Python answer. |
| Scope | Inference-time selection only. No task-specific finetuning or verifier training is required. |
1. **Candidate Selection.** Group candidate SQLs and identify a champion and challenger. Supported grouping modes:
   - `execution`: cluster by execution result on the original database
   - `llm_prompt`: group by prompt-based logical equivalence
2. **Schema Slicing.** Keep only the tables and columns needed for the duel, then validate the slice with dry-run checks.
3. **Minimal Distinguishing Database Construction.** Generate a compact synthetic database slice where the two SQLs produce different answers.
4. **Cross-Paradigm Verification.** Ask a Python solver to answer the same NL question over the synthetic data and compare both SQL outputs against the Python result.
5. **Final Decision.** Select the SQL candidate that is more consistent with the Python proxy answer.
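The final decision step can be sketched in a few lines. The helper below is illustrative only (it is not the repository's actual API): it compares each candidate's rows on the distinguishing database against the Python proxy answer, order-insensitively across rows.

```python
# Illustrative sketch of DPC's final decision step (hypothetical helper,
# not the repository's actual API): pick the SQL candidate whose
# execution result on the distinguishing database matches the
# Python/Pandas proxy answer.

def normalize(rows):
    """Order-insensitive, type-tolerant view of an execution result."""
    return sorted(tuple(str(v) for v in row) for row in rows)

def select_candidate(sql_results, python_result):
    """Return the index of the candidate whose rows match the proxy
    answer; fall back to the first candidate if none matches."""
    target = normalize(python_result)
    for i, rows in enumerate(sql_results):
        if normalize(rows) == target:
            return i
    return 0

# Toy duel: the two candidates disagree on the distinguishing data.
champion = [(3,)]    # e.g. SELECT count(*) ... WHERE age > 56
challenger = [(4,)]  # e.g. SELECT count(*) ... WHERE age >= 56
proxy = [(4,)]       # answer computed independently in Pandas
print(select_candidate([champion, challenger], proxy))  # -> 1
```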
Why this is different from standard self-consistency
Standard self-consistency still votes within the same program form: SQL. DPC explicitly introduces a second program modality with different inductive biases and failure modes, which helps avoid consensus over the same logical mistake.
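For contrast, a plain self-consistency selector votes within the single SQL modality, typically by clustering candidates on their execution result. A minimal sketch (hypothetical, not the repository's code):

```python
# Minimal self-consistency baseline sketch: majority vote over
# execution-result clusters, entirely within the SQL modality.
from collections import Counter

def self_consistency(sql_results):
    """Pick the first candidate from the largest execution-result cluster."""
    keys = [tuple(sorted(str(r) for r in rows)) for rows in sql_results]
    winner, _ = Counter(keys).most_common(1)[0]
    return keys.index(winner)

print(self_consistency([[(3,)], [(4,)], [(4,)]]))  # -> 1
```

If all candidates share the same logical mistake, this vote simply ratifies it, which is the failure mode DPC's second paradigm is meant to break.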
- Main DPC pipeline
  - `SlicerAgent`, `TesterAgent`, `PythonSolverAgent`, `EquivalenceGrouperAgent`
- Training-free baselines
  - SC, MCS, USC, EX-guided, Random
- Evaluation utilities
  - execution accuracy
  - Pass@K / candidate upper bound
  - majority-analysis and solver-reliability analysis
- Research-oriented runners
  - resumable batch processing
  - local artifact management under `artifacts/`
  - checked-in result snapshots under `results/`
This project uses uv for dependency management.
```bash
uv sync
```

This creates a local `.venv/`.

All shell wrappers in `scripts/` automatically prefer `.venv/bin/python` if it exists, so after `uv sync` you can usually run them directly with `bash`.
If you want to run Python entry points directly, use:

```bash
uv run python baseline/run_dpc_selection.py --help
```

The main runtime dependencies are: `openai`, `pandas`, `numpy`, `scipy`, `tabulate`, `tqdm`, `chardet`.
The repository uses an OpenAI-compatible chat API wrapper. You can provide credentials in either of these ways:
- shell wrappers: `API_KEY`, `BASE_URL`
- direct Python entry points: `--api_key`, `--base_url`
- fallback env vars used by the LLM wrapper: `OPENAI_API_KEY`, `OPENAI_BASE_URL`
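The fallback behavior described above can be sketched as follows (illustrative only; the function name and exact precedence are assumptions, with explicit flags taking priority over environment variables):

```python
# Hypothetical sketch of credential resolution: explicit values win,
# otherwise fall back to the OPENAI_* environment variables.
import os

def resolve_credentials(api_key=None, base_url=None):
    key = api_key or os.environ.get("OPENAI_API_KEY")
    url = base_url or os.environ.get("OPENAI_BASE_URL")
    if not key:
        raise ValueError("no API key: pass --api_key or set OPENAI_API_KEY")
    return key, url

os.environ["OPENAI_API_KEY"] = "env-key"
print(resolve_credentials())           # falls back to the env var
print(resolve_credentials("cli-key"))  # explicit value wins
```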
Most experiments are driven by shell wrappers under scripts/.
These wrappers expose configuration through environment variables and write local outputs into artifacts/ by default.
| Variable | Meaning |
|---|---|
| `MODEL_NAME` | LLM model name used by generation / selection / verification |
| `API_KEY` | API key for the OpenAI-compatible backend |
| `BASE_URL` | Base URL for the backend |
| `ARTIFACT_ROOT` | Root directory for generated local outputs, default `artifacts` |
| `PYTHON_BIN` | Explicit Python interpreter override for shell wrappers |
| `PHASE1_SELECTION_MODE` | `execution` or `llm_prompt` in DPC |
| `NUM_GROUPING_ATTEMPTS` | Self-consistency samples for prompt-based SQL grouping |
| `EVAL_METRIC` | `bs_f1` or `ex` for DPC verification |
- `artifacts/`: local experiment outputs generated by your runs
- `results/`: curated or checked-in result snapshots already included in the repository
The default workflow is:
- Generate candidate SQLs
- Run a selector
- Evaluate predictions
```bash
MODEL_NAME=gpt-5-chat-latest \
API_KEY=your_api_key \
BASE_URL=your_base_url \
bash scripts/run_gen_baseline.sh
```

This writes candidates to `artifacts/candidates/...`.
Execution-based Phase 1:

```bash
MODEL_NAME=gpt-5-chat-latest \
API_KEY=your_api_key \
BASE_URL=your_base_url \
PHASE1_SELECTION_MODE=execution \
EVAL_METRIC=bs_f1 \
bash scripts/run_dpc_selection.sh
```

Prompt-based Phase 1:

```bash
MODEL_NAME=gpt-5-chat-latest \
API_KEY=your_api_key \
BASE_URL=your_base_url \
PHASE1_SELECTION_MODE=llm_prompt \
NUM_GROUPING_ATTEMPTS=3 \
bash scripts/run_dpc_selection.sh
```

Execution accuracy:

```bash
bash scripts/run_eval_ex.sh
```

Pass@K over candidate groups:

```bash
PASS_K=2 bash scripts/run_pass_n_eval.sh
```

| Goal | Command |
|---|---|
| Generate generic candidate SQLs | bash scripts/run_gen_baseline.sh |
| Generate OmniSQL candidates | bash scripts/run_gen_omnisql.sh |
| Generate XiYan candidates | bash scripts/run_gen_xiyan.sh |
| Run DPC | bash scripts/run_dpc_selection.sh |
| Run SC | bash scripts/run_sc_selection.sh |
| Run MCS | bash scripts/run_mcs_selection.sh |
| Run EX-guided baseline | bash scripts/run_ex_guided_selection.sh |
| Run Random baseline | bash scripts/run_random_selection.sh |
| Run EX evaluation | bash scripts/run_eval_ex.sh |
| Run Pass@K evaluation | bash scripts/run_pass_n_eval.sh |
| Compare DPC vs SC over multiple candidate counts | bash scripts/run_candidates_analysis.sh |
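As used here, Pass@K treats a question as solved if at least one of its first K candidates produces the gold execution result; at K equal to the full candidate count it becomes the candidate upper bound. A minimal sketch (hypothetical, not the repository's evaluator):

```python
# Illustrative Pass@K over candidate groups: a question counts as
# solved if any of its first k candidates is execution-correct.

def pass_at_k(correctness, k):
    """correctness: {question_id: [bool per candidate, in rank order]}"""
    solved = sum(1 for flags in correctness.values() if any(flags[:k]))
    return solved / len(correctness)

flags = {
    "q0": [False, True, False],   # second candidate is correct
    "q1": [False, False, False],  # no candidate is correct
}
print(pass_at_k(flags, 1))  # -> 0.0
print(pass_at_k(flags, 2))  # -> 0.5
```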
These are useful for deeper analysis and paper ablations:
| Utility | Entry Point |
|---|---|
| USC selection | uv run python baseline/run_usc_selection.py ... |
| MCS without execution clustering | uv run python baseline/run_mcs_selection_wo_execution.py ... |
| MDD generation only | uv run python baseline/run_mdd_generation.py ... |
| Post-hoc MDD stats | uv run python baseline/run_mdd_posthoc_stats.py ... |
| Solver reliability analysis | uv run python baseline/run_solver_reliability_experiment.py ... |
| Majority-analysis between SC and DPC | bash scripts/run_majority_analysis.sh |
If you want full control beyond the shell wrappers:
```bash
uv run python baseline/run_dpc_selection.py \
    --dataset_type bird \
    --data_path data/bird/dev/dev.json \
    --db_root_path data/bird/dev/dev_databases \
    --pred_sqls_path artifacts/candidates/GPT-4o.json \
    --output_path artifacts/selected/DPC_Results.json \
    --model_name gpt-4o \
    --phase1_selection_mode execution \
    --eval_metric bs_f1 \
    --num_workers 8
```

The repository expects candidate SQLs in JSON format:
```json
{
  "question_id_0": [
    "SELECT count(*) FROM head WHERE age > 56",
    "SELECT count(*) FROM head WHERE age >= 56"
  ],
  "question_id_1": [
    "SELECT ..."
  ]
}
```

Selectors typically write:

```json
{
  "question_id_0": "SELECT ..."
}
```

- DPC and some analysis scripts save incrementally and support resume.
- `artifacts/` is intended for your local runs.
- `results/` contains repository snapshots and should not be treated as the default write target.
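The incremental-save and resume behavior can be sketched like this (illustrative only; the function and file layout are hypothetical, not the repository's exact code):

```python
# Hypothetical resumable batch runner: finished question ids are
# reloaded from the output file and skipped on rerun, and the file
# is rewritten after every processed item.
import json
import os
import tempfile

def run_batch(items, output_path, process):
    done = {}
    if os.path.exists(output_path):          # resume from a partial run
        with open(output_path) as f:
            done = json.load(f)
    for qid, payload in items.items():
        if qid in done:
            continue                         # already processed
        done[qid] = process(payload)
        with open(output_path, "w") as f:    # save incrementally
            json.dump(done, f)
    return done

path = os.path.join(tempfile.mkdtemp(), "partial.json")
run_batch({"q0": "select 1"}, path, str.upper)
run_batch({"q0": "select 1"}, path, str.upper)  # rerun skips finished ids
```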
The current codebase includes dataset loaders for:
- BIRD
- Spider
Relevant modules:
```text
DPC-SQL/
├── artifacts/        # Local experiment outputs (gitignored)
├── baseline/         # Main runners for generation, baselines, and analysis
├── data/             # Dataset files and database directories
├── dpc/
│   ├── agents/       # Slicer / Tester / Solver / Selector agents
│   ├── core/         # DPC pipeline orchestration
│   ├── datasets/     # Spider / BIRD dataset loaders
│   ├── eval/         # BS-F1 / EX evaluation logic
│   ├── llm/          # OpenAI-compatible LLM wrapper
│   ├── prompts/      # Prompt templates
│   └── utils/        # DB execution, schema extraction, parsing, clustering
├── evaluation/       # Evaluation scripts
├── results/          # Checked-in snapshots / curated outputs
├── scripts/          # Experiment shell wrappers
├── pyproject.toml    # uv dependency definition
└── uv.lock           # Reproducible dependency lockfile
```
- This repository is organized as a research codebase, not as a packaged library.
- The preferred user-facing workflow is `uv sync` followed by `bash scripts/...`.
- Prompt-based paths depend on an available OpenAI-compatible LLM backend and may incur API cost.
If you find this repository useful, please cite our paper. Until the ACL Anthology entry is available, we recommend using the arXiv-form BibTeX below.
```bibtex
@misc{li2026dpc,
  title         = {DPC: Training-Free Text-to-SQL Candidate Selection via Dual-Paradigm Consistency},
  author        = {Boyan Li and Ou Ocean Kun Hei and Yue Yu and Yuyu Luo},
  year          = {2026},
  eprint        = {2604.15163},
  archivePrefix = {arXiv},
  primaryClass  = {cs.DB},
  doi           = {10.48550/arXiv.2604.15163},
  url           = {https://arxiv.org/abs/2604.15163},
  note          = {Accepted to ACL 2026 Main Track}
}
```

Paper link: https://arxiv.org/abs/2604.15163
This project is released under the MIT License. See LICENSE for details.