DPC: Training-Free Text-to-SQL Candidate Selection via Dual-Paradigm Consistency
Official research code for our ACL 2026 Main Track paper
Boyan Li · Ou Ocean Kun Hei · Yue Yu · Yuyu Luo
Paper • Overview • Installation • Quick Start • Experiments • Citation
| Task | Training-free Text-to-SQL candidate selection |
|---|---|
| Core Idea | Verify SQL with a Python/Pandas solution on a minimal distinguishing database |
| Outcome | Reduce selection errors caused by shared LLM blind spots and self-consistency bias |
- 2026-04-17: Our paper is available on arXiv: 2604.15163.
- 2026: DPC is accepted to ACL 2026 Main Track.
DPC is a training-free method for inference-time Text-to-SQL candidate selection. Instead of asking an LLM judge to directly decide which SQL is correct, DPC introduces a second reasoning channel in Python/Pandas, constructs a Minimal Distinguishing Database (MDD), and chooses the SQL candidate whose execution is more consistent with the Python solution.
DPC reframes SQL selection from "guess which candidate is right on hidden data" into "verify which candidate survives on visible distinguishing data."
This repository accompanies the paper DPC: Training-Free Text-to-SQL Candidate Selection via Dual-Paradigm Consistency, which reports consistent gains on BIRD and Spider, including up to +2.2 absolute accuracy over strong training-free baselines.
| Problem | LLM-as-a-Judge for SQL often shares the same reasoning failures as SQL generation itself. |
|---|---|
| Key Idea | Cross-check candidate SQLs with a second program form that has different failure modes: Python/Pandas. |
| Verification Target | Construct a Minimal Distinguishing Database where competing SQLs diverge, then compare them against a Python answer. |
| Scope | Inference-time selection only. No task-specific finetuning or verifier training is required. |
1. **Candidate Selection.** Group candidate SQLs and identify a champion and challenger. Supported grouping modes:
   - `execution`: cluster by execution result on the original database
   - `llm_prompt`: group by prompt-based logical equivalence
2. **Schema Slicing.** Keep only the tables and columns needed for the duel, then validate the slice with dry-run checks.
3. **Minimal Distinguishing Database Construction.** Generate a compact synthetic database slice where the two SQLs produce different answers.
4. **Cross-Paradigm Verification.** Ask a Python solver to answer the same NL question over the synthetic data and compare both SQL outputs against the Python result.
5. **Final Decision.** Select the SQL candidate that is more consistent with the Python proxy answer.
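The final decision step can be sketched in a few lines. The helper below is illustrative only (it is not the repository's actual API): it compares each candidate's rows on the distinguishing database against the Python proxy answer, order-insensitively across rows.

```python
# Illustrative sketch of DPC's final decision step (hypothetical helper,
# not the repository's actual API): pick the SQL candidate whose
# execution result on the distinguishing database matches the
# Python/Pandas proxy answer.

def normalize(rows):
    """Order-insensitive, type-tolerant view of an execution result."""
    return sorted(tuple(str(v) for v in row) for row in rows)

def select_candidate(sql_results, python_result):
    """Return the index of the candidate whose rows match the proxy
    answer; fall back to the first candidate if none matches."""
    target = normalize(python_result)
    for i, rows in enumerate(sql_results):
        if normalize(rows) == target:
            return i
    return 0

# Toy duel: the two candidates disagree on the distinguishing data.
champion = [(3,)]    # e.g. SELECT count(*) ... WHERE age > 56
challenger = [(4,)]  # e.g. SELECT count(*) ... WHERE age >= 56
proxy = [(4,)]       # answer computed independently in Pandas
print(select_candidate([champion, challenger], proxy))  # -> 1
```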
Why this is different from standard self-consistency
Standard self-consistency still votes within the same program form: SQL. DPC explicitly introduces a second program modality with different inductive biases and failure modes, which helps avoid consensus over the same logical mistake.
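For contrast, a plain self-consistency selector votes within the single SQL modality, typically by clustering candidates on their execution result. A minimal sketch (hypothetical, not the repository's code):

```python
# Minimal self-consistency baseline sketch: majority vote over
# execution-result clusters, entirely within the SQL modality.
from collections import Counter

def self_consistency(sql_results):
    """Pick the first candidate from the largest execution-result cluster."""
    keys = [tuple(sorted(str(r) for r in rows)) for rows in sql_results]
    winner, _ = Counter(keys).most_common(1)[0]
    return keys.index(winner)

print(self_consistency([[(3,)], [(4,)], [(4,)]]))  # -> 1
```

If all candidates share the same logical mistake, this vote simply ratifies it, which is the failure mode DPC's second paradigm is meant to break.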
- Main DPC pipeline
  - `SlicerAgent`, `TesterAgent`, `PythonSolverAgent`, `EquivalenceGrouperAgent`
- Training-free baselines
  - SC, MCS, USC, EX-guided, Random
- Evaluation utilities
  - execution accuracy
  - Pass@K / candidate upper bound
  - majority-analysis and solver-reliability analysis
- Research-oriented runners
  - resumable batch processing
  - local artifact management under `artifacts/`
  - checked-in result snapshots under `results/`
This project uses uv for dependency management.
```bash
uv sync
```

This creates a local `.venv/`.

All shell wrappers in `scripts/` automatically prefer `.venv/bin/python` if it exists, so after `uv sync` you can usually run them directly with `bash`.
If you want to run Python entry points directly, use:

```bash
uv run python baseline/run_dpc_selection.py --help
```

The main runtime dependencies are: `openai`, `pandas`, `numpy`, `scipy`, `tabulate`, `tqdm`, `chardet`.
The repository uses an OpenAI-compatible chat API wrapper. You can provide credentials in either of these ways:
- shell wrappers: `API_KEY`, `BASE_URL`
- direct Python entry points: `--api_key`, `--base_url`
- fallback env vars used by the LLM wrapper: `OPENAI_API_KEY`, `OPENAI_BASE_URL`
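The fallback behavior described above can be sketched as follows (illustrative only; the function name and exact precedence are assumptions, with explicit flags taking priority over environment variables):

```python
# Hypothetical sketch of credential resolution: explicit values win,
# otherwise fall back to the OPENAI_* environment variables.
import os

def resolve_credentials(api_key=None, base_url=None):
    key = api_key or os.environ.get("OPENAI_API_KEY")
    url = base_url or os.environ.get("OPENAI_BASE_URL")
    if not key:
        raise ValueError("no API key: pass --api_key or set OPENAI_API_KEY")
    return key, url

os.environ["OPENAI_API_KEY"] = "env-key"
print(resolve_credentials())           # falls back to the env var
print(resolve_credentials("cli-key"))  # explicit value wins
```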
Most experiments are driven by shell wrappers under scripts/.
These wrappers expose configuration through environment variables and write local outputs into artifacts/ by default.
| Variable | Meaning |
|---|---|
| `MODEL_NAME` | LLM model name used by generation / selection / verification |
| `API_KEY` | API key for the OpenAI-compatible backend |
| `BASE_URL` | Base URL for the backend |
| `ARTIFACT_ROOT` | Root directory for generated local outputs, default `artifacts` |
| `PYTHON_BIN` | Explicit Python interpreter override for shell wrappers |
| `PHASE1_SELECTION_MODE` | `execution` or `llm_prompt` in DPC |
| `NUM_GROUPING_ATTEMPTS` | Self-consistency samples for prompt-based SQL grouping |
| `EVAL_METRIC` | `bs_f1` or `ex` for DPC verification |
- `artifacts/`: local experiment outputs generated by your runs
- `results/`: curated or checked-in result snapshots already included in the repository
The default workflow is:
- Generate candidate SQLs
- Run a selector
- Evaluate predictions
```bash
MODEL_NAME=gpt-5-chat-latest \
API_KEY=your_api_key \
BASE_URL=your_base_url \
bash scripts/run_gen_baseline.sh
```

This writes candidates to `artifacts/candidates/...`.
Execution-based Phase 1:

```bash
MODEL_NAME=gpt-5-chat-latest \
API_KEY=your_api_key \
BASE_URL=your_base_url \
PHASE1_SELECTION_MODE=execution \
EVAL_METRIC=bs_f1 \
bash scripts/run_dpc_selection.sh
```

Prompt-based Phase 1:

```bash
MODEL_NAME=gpt-5-chat-latest \
API_KEY=your_api_key \
BASE_URL=your_base_url \
PHASE1_SELECTION_MODE=llm_prompt \
NUM_GROUPING_ATTEMPTS=3 \
bash scripts/run_dpc_selection.sh
```

Execution accuracy:

```bash
bash scripts/run_eval_ex.sh
```

Pass@K over candidate groups:

```bash
PASS_K=2 bash scripts/run_pass_n_eval.sh
```

| Goal | Command |
|---|---|
| Generate generic candidate SQLs | bash scripts/run_gen_baseline.sh |
| Generate OmniSQL candidates | bash scripts/run_gen_omnisql.sh |
| Generate XiYan candidates | bash scripts/run_gen_xiyan.sh |
| Run DPC | bash scripts/run_dpc_selection.sh |
| Run SC | bash scripts/run_sc_selection.sh |
| Run MCS | bash scripts/run_mcs_selection.sh |
| Run EX-guided baseline | bash scripts/run_ex_guided_selection.sh |
| Run Random baseline | bash scripts/run_random_selection.sh |
| Run EX evaluation | bash scripts/run_eval_ex.sh |
| Run Pass@K evaluation | bash scripts/run_pass_n_eval.sh |
| Compare DPC vs SC over multiple candidate counts | bash scripts/run_candidates_analysis.sh |
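As used here, Pass@K treats a question as solved if at least one of its first K candidates produces the gold execution result; at K equal to the full candidate count it becomes the candidate upper bound. A minimal sketch (hypothetical, not the repository's evaluator):

```python
# Illustrative Pass@K over candidate groups: a question counts as
# solved if any of its first k candidates is execution-correct.

def pass_at_k(correctness, k):
    """correctness: {question_id: [bool per candidate, in rank order]}"""
    solved = sum(1 for flags in correctness.values() if any(flags[:k]))
    return solved / len(correctness)

flags = {
    "q0": [False, True, False],   # second candidate is correct
    "q1": [False, False, False],  # no candidate is correct
}
print(pass_at_k(flags, 1))  # -> 0.0
print(pass_at_k(flags, 2))  # -> 0.5
```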
These are useful for deeper analysis and paper ablations:
| Utility | Entry Point |
|---|---|
| USC selection | uv run python baseline/run_usc_selection.py ... |
| MCS without execution clustering | uv run python baseline/run_mcs_selection_wo_execution.py ... |
| MDD generation only | uv run python baseline/run_mdd_generation.py ... |
| Post-hoc MDD stats | uv run python baseline/run_mdd_posthoc_stats.py ... |
| Solver reliability analysis | uv run python baseline/run_solver_reliability_experiment.py ... |
| Majority-analysis between SC and DPC | bash scripts/run_majority_analysis.sh |
If you want full control beyond the shell wrappers:
```bash
uv run python baseline/run_dpc_selection.py \
    --dataset_type bird \
    --data_path data/bird/dev/dev.json \
    --db_root_path data/bird/dev/dev_databases \
    --pred_sqls_path artifacts/candidates/GPT-4o.json \
    --output_path artifacts/selected/DPC_Results.json \
    --model_name gpt-4o \
    --phase1_selection_mode execution \
    --eval_metric bs_f1 \
    --num_workers 8
```

The repository expects candidate SQLs in JSON format:
```json
{
  "question_id_0": [
    "SELECT count(*) FROM head WHERE age > 56",
    "SELECT count(*) FROM head WHERE age >= 56"
  ],
  "question_id_1": [
    "SELECT ..."
  ]
}
```

Selectors typically write:

```json
{
  "question_id_0": "SELECT ..."
}
```

- DPC and some analysis scripts save incrementally and support resume.
- `artifacts/` is intended for your local runs.
- `results/` contains repository snapshots and should not be treated as the default write target.
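The incremental-save and resume behavior can be sketched like this (illustrative only; the function and file layout are hypothetical, not the repository's exact code):

```python
# Hypothetical resumable batch runner: finished question ids are
# reloaded from the output file and skipped on rerun, and the file
# is rewritten after every processed item.
import json
import os
import tempfile

def run_batch(items, output_path, process):
    done = {}
    if os.path.exists(output_path):          # resume from a partial run
        with open(output_path) as f:
            done = json.load(f)
    for qid, payload in items.items():
        if qid in done:
            continue                         # already processed
        done[qid] = process(payload)
        with open(output_path, "w") as f:    # save incrementally
            json.dump(done, f)
    return done

path = os.path.join(tempfile.mkdtemp(), "partial.json")
run_batch({"q0": "select 1"}, path, str.upper)
run_batch({"q0": "select 1"}, path, str.upper)  # rerun skips finished ids
```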
The current codebase includes dataset loaders for:
- BIRD
- Spider
Relevant modules:
```text
DPC-SQL/
├── artifacts/        # Local experiment outputs (gitignored)
├── baseline/         # Main runners for generation, baselines, and analysis
├── data/             # Dataset files and database directories
├── dpc/
│   ├── agents/       # Slicer / Tester / Solver / Selector agents
│   ├── core/         # DPC pipeline orchestration
│   ├── datasets/     # Spider / BIRD dataset loaders
│   ├── eval/         # BS-F1 / EX evaluation logic
│   ├── llm/          # OpenAI-compatible LLM wrapper
│   ├── prompts/      # Prompt templates
│   └── utils/        # DB execution, schema extraction, parsing, clustering
├── evaluation/       # Evaluation scripts
├── results/          # Checked-in snapshots / curated outputs
├── scripts/          # Experiment shell wrappers
├── pyproject.toml    # uv dependency definition
└── uv.lock           # Reproducible dependency lockfile
```
- This repository is organized as a research codebase, not as a packaged library.
- The preferred user-facing workflow is `uv sync` followed by `bash scripts/...`.
- Prompt-based paths depend on an available OpenAI-compatible LLM backend and may incur API cost.
If you find this repository useful, please cite our paper. Until the ACL Anthology entry is available, we recommend using the arXiv-form BibTeX below.
```bibtex
@misc{li2026dpc,
  title         = {DPC: Training-Free Text-to-SQL Candidate Selection via Dual-Paradigm Consistency},
  author        = {Boyan Li and Ou Ocean Kun Hei and Yue Yu and Yuyu Luo},
  year          = {2026},
  eprint        = {2604.15163},
  archivePrefix = {arXiv},
  primaryClass  = {cs.DB},
  doi           = {10.48550/arXiv.2604.15163},
  url           = {https://arxiv.org/abs/2604.15163},
  note          = {Accepted to ACL 2026 Main Track}
}
```

Paper link: https://arxiv.org/abs/2604.15163
This project is released under the MIT License. See LICENSE for details.