HKUSTDial/DPC
DPC

DPC: Training-Free Text-to-SQL Candidate Selection via Dual-Paradigm Consistency

Official research code for our ACL 2026 Main Track paper

Boyan Li · Ou Ocean Kun Hei · Yue Yu · Yuyu Luo

Paper: arXiv 2604.15163 · Venue: ACL 2026 Main Track · Python 3.10+ · Package manager: uv

Paper • Overview • Installation • Quick Start • Experiments • Citation

Task: Training-free Text-to-SQL candidate selection
Core Idea: Verify SQL with a Python/Pandas solution on a minimal distinguishing database
Outcome: Reduce selection errors caused by shared LLM blind spots and self-consistency bias

πŸ”” News

  • 2026-04-17: Our paper is available on arXiv: 2604.15163.
  • 2026: DPC is accepted to ACL 2026 Main Track.

✨ Overview

DPC is a training-free method for inference-time Text-to-SQL candidate selection. Instead of asking an LLM judge to directly decide which SQL is correct, DPC introduces a second reasoning channel in Python/Pandas, constructs a Minimal Distinguishing Database (MDD), and chooses the SQL candidate whose execution is more consistent with the Python solution.

DPC reframes SQL selection from "guess which candidate is right on hidden data" into "verify which candidate survives on visible distinguishing data."

This repository accompanies the paper DPC: Training-Free Text-to-SQL Candidate Selection via Dual-Paradigm Consistency, which reports consistent gains on BIRD and Spider, including up to +2.2 absolute accuracy over strong training-free baselines.

Problem: LLM-as-a-Judge for SQL often shares the same reasoning failures as SQL generation itself.
Key Idea: Cross-check candidate SQLs with a second program form that has different failure modes: Python/Pandas.
Verification Target: Construct a Minimal Distinguishing Database where competing SQLs diverge, then compare them against a Python answer.
Scope: Inference-time selection only. No task-specific finetuning or verifier training is required.

🧠 Method At A Glance

  1. Candidate Selection: Group candidate SQLs and identify a champion and a challenger. Supported grouping modes:

    • execution: cluster by execution result on the original database
    • llm_prompt: group by prompt-based logical equivalence
  2. Schema Slicing: Keep only the tables and columns needed for the duel, then validate the slice with dry-run checks.

  3. Minimal Distinguishing Database Construction: Generate a compact synthetic database slice where the two SQLs produce different answers.

  4. Cross-Paradigm Verification: Ask a Python solver to answer the same NL question over the synthetic data and compare both SQL outputs against the Python result.

  5. Final Decision: Select the SQL candidate that is more consistent with the Python proxy answer.
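The decision step can be pictured with a toy sketch (this is illustrative only, not the repo's API): run the champion and challenger on a minimal distinguishing database and keep the one that agrees with the independently computed Python answer. The table, rows, and `dpc_select` helper below are hypothetical.

```python
import sqlite3

def dpc_select(sql_a: str, sql_b: str, python_answer, synthetic_rows) -> str:
    """Toy sketch of DPC's final decision: execute both candidate SQLs on
    a minimal distinguishing database and keep the one whose result
    matches the Python-solver proxy answer."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE head (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO head VALUES (?, ?)", synthetic_rows)
    result_a = conn.execute(sql_a).fetchall()
    result_b = conn.execute(sql_b).fetchall()
    conn.close()
    # The Python proxy answer acts as the cross-paradigm referee.
    if result_a == python_answer:
        return sql_a
    if result_b == python_answer:
        return sql_b
    return sql_a  # fall back to the champion when neither agrees

# A distinguishing row set: age == 56 separates '>' from '>='.
rows = [("A", 56), ("B", 60)]
sql_gt = "SELECT count(*) FROM head WHERE age > 56"
sql_ge = "SELECT count(*) FROM head WHERE age >= 56"
# Independently computed Python/Pandas-style answer: one person is older than 56.
chosen = dpc_select(sql_gt, sql_ge, [(1,)], rows)  # -> sql_gt
```

On the original database both candidates may return identical results; the synthetic row with age == 56 is what forces them apart.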

Why this is different from standard self-consistency

Standard self-consistency still votes within the same program form: SQL. DPC explicitly introduces a second program modality with different inductive biases and failure modes, which helps avoid consensus over the same logical mistake.
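For contrast, execution-based self-consistency can be sketched as a majority vote over execution-equivalence classes; the `sc_vote` helper below is a hypothetical illustration, and the repo's SC baseline may differ in detail.

```python
import sqlite3

def sc_vote(candidates, conn):
    """Minimal execution-based self-consistency: bucket candidates by
    their execution result and return a member of the largest bucket."""
    buckets = {}
    for sql in candidates:
        key = tuple(conn.execute(sql).fetchall())
        buckets.setdefault(key, []).append(sql)
    winner_key = max(buckets, key=lambda k: len(buckets[k]))
    return buckets[winner_key][0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE head (age INTEGER)")
conn.executemany("INSERT INTO head VALUES (?)", [(50,), (60,)])
# The first two candidates execute to the same result and outvote the third.
picked = sc_vote(
    ["SELECT count(*) FROM head WHERE age > 55",
     "SELECT count(*) FROM head WHERE age >= 56",
     "SELECT max(age) FROM head"],
    conn,
)
```

If the majority shares a logical mistake, this vote confidently picks the mistake, which is exactly the failure mode DPC's second paradigm is meant to catch.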


🚀 What's Included

  • Main DPC pipeline
    • SlicerAgent, TesterAgent, PythonSolverAgent, EquivalenceGrouperAgent
  • Training-free baselines
    • SC, MCS, USC, EX-guided, Random
  • Evaluation utilities
    • execution accuracy
    • Pass@K / candidate upper bound
    • majority-analysis and solver-reliability analysis
  • Research-oriented runners
    • resumable batch processing
    • local artifact management under artifacts/
    • checked-in result snapshots under results/

πŸ›  Installation

This project uses uv for dependency management.

uv sync

This creates a local .venv/. All shell wrappers in scripts/ automatically prefer .venv/bin/python if it exists, so after uv sync you can usually run them directly with bash.

If you want to run Python entry points directly, use:

uv run python baseline/run_dpc_selection.py --help

Dependencies

The main runtime dependencies are:

  • openai
  • pandas
  • numpy
  • scipy
  • tabulate
  • tqdm
  • chardet

LLM Backend

The repository uses an OpenAI-compatible chat API wrapper. You can provide credentials in either of these ways:

  • shell wrappers: API_KEY, BASE_URL
  • direct Python entry points: --api_key, --base_url
  • fallback env vars used by the LLM wrapper: OPENAI_API_KEY, OPENAI_BASE_URL
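The precedence above can be mirrored in a small helper; `resolve_credentials` and the default base URL are assumptions for illustration, not the repo's actual wrapper.

```python
def resolve_credentials(env: dict) -> tuple[str, str]:
    """Resolve API credentials following the documented fallback order:
    wrapper-level variables first, then the OPENAI_* environment names.
    Hypothetical sketch; the default base URL below is an assumption."""
    api_key = env.get("API_KEY") or env.get("OPENAI_API_KEY", "")
    base_url = env.get("BASE_URL") or env.get(
        "OPENAI_BASE_URL", "https://api.openai.com/v1"
    )
    return api_key, base_url

# Wrapper-level API_KEY wins over OPENAI_API_KEY when both are set.
key, url = resolve_credentials(
    {"API_KEY": "k1", "OPENAI_API_KEY": "k2", "BASE_URL": "http://localhost:8000/v1"}
)
```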

βš™οΈ Configuration

Most experiments are driven by shell wrappers under scripts/. These wrappers expose configuration through environment variables and write local outputs into artifacts/ by default.

Common Environment Variables

  • MODEL_NAME: LLM model name used by generation / selection / verification
  • API_KEY: API key for the OpenAI-compatible backend
  • BASE_URL: Base URL for the backend
  • ARTIFACT_ROOT: Root directory for generated local outputs (default: artifacts)
  • PYTHON_BIN: Explicit Python interpreter override for the shell wrappers
  • PHASE1_SELECTION_MODE: DPC Phase 1 grouping mode, execution or llm_prompt
  • NUM_GROUPING_ATTEMPTS: Number of self-consistency samples for prompt-based SQL grouping
  • EVAL_METRIC: Verification metric for DPC, bs_f1 or ex

Output Layout

  • artifacts/: local experiment outputs generated by your runs
  • results/: curated or checked-in result snapshots already included in the repository

⚑ Quick Start

The default workflow is:

  1. Generate candidate SQLs
  2. Run a selector
  3. Evaluate predictions

1. Generate Candidate SQLs

MODEL_NAME=gpt-5-chat-latest \
API_KEY=your_api_key \
BASE_URL=your_base_url \
bash scripts/run_gen_baseline.sh

This writes candidates to:

artifacts/candidates/...

2. Run DPC Selection

Execution-based Phase 1:

MODEL_NAME=gpt-5-chat-latest \
API_KEY=your_api_key \
BASE_URL=your_base_url \
PHASE1_SELECTION_MODE=execution \
EVAL_METRIC=bs_f1 \
bash scripts/run_dpc_selection.sh

Prompt-based Phase 1:

MODEL_NAME=gpt-5-chat-latest \
API_KEY=your_api_key \
BASE_URL=your_base_url \
PHASE1_SELECTION_MODE=llm_prompt \
NUM_GROUPING_ATTEMPTS=3 \
bash scripts/run_dpc_selection.sh

3. Evaluate Predictions

Execution accuracy:

bash scripts/run_eval_ex.sh

Pass@K over candidate groups:

PASS_K=2 bash scripts/run_pass_n_eval.sh
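Pass@K here is the candidate upper bound: a question counts as solved if any of its first K candidates is correct. A minimal sketch of that metric (not the repo's run_pass_n_eval implementation; `pass_at_k` and its inputs are hypothetical):

```python
def pass_at_k(candidates_by_q: dict, correct_by_q: dict, k: int) -> float:
    """Fraction of questions where at least one of the first k
    candidates is in the accepted set for that question."""
    hits = 0
    for qid, cands in candidates_by_q.items():
        accepted = correct_by_q.get(qid, set())
        if any(sql in accepted for sql in cands[:k]):
            hits += 1
    return hits / max(len(candidates_by_q), 1)

cands = {"q0": ["SELECT a", "SELECT b"], "q1": ["SELECT c"]}
gold = {"q0": {"SELECT b"}, "q1": {"SELECT c"}}
upper_bound_at_2 = pass_at_k(cands, gold, 2)  # -> 1.0
```

This bounds what any selector can achieve: if no candidate for a question is correct, no selection strategy can recover it.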

πŸ”¬ Reproducing Experiments

Main Entry Points

  • Generate generic candidate SQLs: bash scripts/run_gen_baseline.sh
  • Generate OmniSQL candidates: bash scripts/run_gen_omnisql.sh
  • Generate XiYan candidates: bash scripts/run_gen_xiyan.sh
  • Run DPC: bash scripts/run_dpc_selection.sh
  • Run SC: bash scripts/run_sc_selection.sh
  • Run MCS: bash scripts/run_mcs_selection.sh
  • Run EX-guided baseline: bash scripts/run_ex_guided_selection.sh
  • Run Random baseline: bash scripts/run_random_selection.sh
  • Run EX evaluation: bash scripts/run_eval_ex.sh
  • Run Pass@K evaluation: bash scripts/run_pass_n_eval.sh
  • Compare DPC vs SC over multiple candidate counts: bash scripts/run_candidates_analysis.sh

Additional Research Utilities

These are useful for deeper analysis and paper ablations:

  • USC selection: uv run python baseline/run_usc_selection.py ...
  • MCS without execution clustering: uv run python baseline/run_mcs_selection_wo_execution.py ...
  • MDD generation only: uv run python baseline/run_mdd_generation.py ...
  • Post-hoc MDD stats: uv run python baseline/run_mdd_posthoc_stats.py ...
  • Solver reliability analysis: uv run python baseline/run_solver_reliability_experiment.py ...
  • Majority-analysis between SC and DPC: bash scripts/run_majority_analysis.sh

Direct Python Usage

If you want full control beyond the shell wrappers:

uv run python baseline/run_dpc_selection.py \
  --dataset_type bird \
  --data_path data/bird/dev/dev.json \
  --db_root_path data/bird/dev/dev_databases \
  --pred_sqls_path artifacts/candidates/GPT-4o.json \
  --output_path artifacts/selected/DPC_Results.json \
  --model_name gpt-4o \
  --phase1_selection_mode execution \
  --eval_metric bs_f1 \
  --num_workers 8

πŸ“Š Input / Output Format

Candidate SQL File

The repository expects candidate SQLs in JSON format:

{
  "question_id_0": [
    "SELECT count(*) FROM head WHERE age > 56",
    "SELECT count(*) FROM head WHERE age >= 56"
  ],
  "question_id_1": [
    "SELECT ..."
  ]
}
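A loader for this format can validate the shape up front; `load_candidates` is a hypothetical sketch, not the repo's loader.

```python
import json

def load_candidates(text: str) -> dict:
    """Parse a candidate file and check the expected shape:
    question id -> non-empty list of SQL strings."""
    data = json.loads(text)
    for qid, sqls in data.items():
        if not (isinstance(sqls, list) and sqls
                and all(isinstance(s, str) for s in sqls)):
            raise ValueError(f"{qid}: expected a non-empty list of SQL strings")
    return data

example = '{"question_id_0": ["SELECT count(*) FROM head WHERE age > 56"]}'
cands = load_candidates(example)
```

Catching an empty or malformed candidate list here is cheaper than failing midway through a resumable selection run.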

Selection Output

Selectors typically write:

{
  "question_id_0": "SELECT ..."
}

Notes

  • DPC and some analysis scripts save incrementally and support resume.
  • artifacts/ is intended for your local runs.
  • results/ contains repository snapshots and should not be treated as the default write target.

πŸ§ͺ Supported Datasets

The current codebase includes dataset loaders for:

  • BIRD
  • Spider

Relevant modules:

  • dpc/datasets/
πŸ—‚ Project Structure

DPC-SQL/
β”œβ”€β”€ artifacts/          # Local experiment outputs (gitignored)
β”œβ”€β”€ baseline/           # Main runners for generation, baselines, and analysis
β”œβ”€β”€ data/               # Dataset files and database directories
β”œβ”€β”€ dpc/
β”‚   β”œβ”€β”€ agents/         # Slicer / Tester / Solver / Selector agents
β”‚   β”œβ”€β”€ core/           # DPC pipeline orchestration
β”‚   β”œβ”€β”€ datasets/       # Spider / BIRD dataset loaders
β”‚   β”œβ”€β”€ eval/           # BS-F1 / EX evaluation logic
β”‚   β”œβ”€β”€ llm/            # OpenAI-compatible LLM wrapper
β”‚   β”œβ”€β”€ prompts/        # Prompt templates
β”‚   └── utils/          # DB execution, schema extraction, parsing, clustering
β”œβ”€β”€ evaluation/         # Evaluation scripts
β”œβ”€β”€ results/            # Checked-in snapshots / curated outputs
β”œβ”€β”€ scripts/            # Experiment shell wrappers
β”œβ”€β”€ pyproject.toml      # uv dependency definition
└── uv.lock             # Reproducible dependency lockfile

πŸ“ Notes

  • This repository is organized as a research codebase, not as a packaged library.
  • The preferred user-facing workflow is:
    • uv sync
    • bash scripts/...
  • Prompt-based paths depend on an available OpenAI-compatible LLM backend and may incur API cost.

πŸ“š Citation

If you find this repository useful, please cite our paper. Until the ACL Anthology entry is available, we recommend using the arXiv-form BibTeX below.

@misc{li2026dpc,
  title         = {DPC: Training-Free Text-to-SQL Candidate Selection via Dual-Paradigm Consistency},
  author        = {Boyan Li and Ou Ocean Kun Hei and Yue Yu and Yuyu Luo},
  year          = {2026},
  eprint        = {2604.15163},
  archivePrefix = {arXiv},
  primaryClass  = {cs.DB},
  doi           = {10.48550/arXiv.2604.15163},
  url           = {https://arxiv.org/abs/2604.15163},
  note          = {Accepted to ACL 2026 Main Track}
}

Paper link: https://arxiv.org/abs/2604.15163


πŸ“„ License

This project is released under the MIT License. See LICENSE for details.
