Experimental NL → SQL (ClickHouse) pipeline using GPT-5 Context Free Grammar (CFG) constraints.
- FastAPI backend with two paths:
  - `POST /query`: simple echo (baseline)
  - `POST /nl-query`: natural language → constrained SQL → (mock) ClickHouse rows
- Constrained grammar (`app/grammars/clickhouse_sql.bnf`) limiting output to a safe SELECT subset
- Mock mode (default): no real OpenAI or ClickHouse credentials required
- Deterministic heuristic fallback translation when API key absent
- Minimal React UI: toggle between Echo + CFG NL→SQL modes, view generated SQL + JSON rows
- Autocomplete (client): inline suggestion list while typing (Arrow / Tab / Enter) limited to top 5 matches from curated examples + session history
- Lightweight eval harness (`backend/evals/run_evals.py`) over 3+ test cases
```
backend/
  app/
    main.py                   # FastAPI app + /nl-query
    config.py                 # Settings + mock mode
    grammars/
      clickhouse_sql.bnf
    services/
      nl_to_sql.py            # NL → SQL (mock or OpenAI)
      clickhouse_client.py    # Mock data executor
  tests/
    test_nl_query.py
  evals/
    dataset.jsonl
    run_evals.py
client/
  src/pages/Query.tsx         # UI with mode toggle
```
| Variable | Purpose | Default |
|---|---|---|
| `OPENAI_API_KEY` | Real GPT-5 access (optional in mock mode) | None |
| `OPENAI_MODEL` | Model name | `gpt-5` |
| `MOCK_MODE` | Force mock path (`true`/`false`) | `true` |
| `CLICKHOUSE_HOST` | ClickHouse hostname (Docker: `localhost`) | None |
| `CLICKHOUSE_PORT` | HTTP port (native container default 8123) | None |
| `CLICKHOUSE_DATABASE` | Database/schema name (e.g. `default`) | None |
| `CLICKHOUSE_USER` | Username (optional; often `default` locally) | None |
| `CLICKHOUSE_PASSWORD` | Password (optional; set for secured/cloud instances) | None |
| `CLICKHOUSE_SECURE` | `true` to enable TLS (ClickHouse Cloud / HTTPS) | `false` |
| `CLICKHOUSE_CA_CERT` | Path to a CA certificate file (only for custom CA / enterprise) | None |
Minimum for a local non-secure Docker setup: `CLICKHOUSE_HOST`, `CLICKHOUSE_PORT`, `CLICKHOUSE_DATABASE`, and `MOCK_MODE=false`.
For ClickHouse Cloud also set: `CLICKHOUSE_USER`, `CLICKHOUSE_PASSWORD`, `CLICKHOUSE_SECURE=true` (and optionally `CLICKHOUSE_CA_CERT`).
Set `MOCK_MODE=false` and provide `OPENAI_API_KEY` to exercise real grammar calls (parameter name `grammar` assumed; adjust once the official SDK confirms the interface).
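A minimal sketch of what that real call path might look like, under the same assumption: the `grammar` keyword argument is hypothetical, not confirmed SDK API, and should be adjusted once the interface is documented:

```python
# Hypothetical sketch of the real call path in app/services/nl_to_sql.py.
# The `grammar` keyword argument is an assumption, not confirmed SDK API.
import os
from pathlib import Path

from openai import OpenAI

GRAMMAR = Path("app/grammars/clickhouse_sql.bnf").read_text()

def nl_to_sql(question: str) -> str:
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = client.responses.create(
        model=os.environ.get("OPENAI_MODEL", "gpt-5"),
        input=f"Translate to ClickHouse SQL: {question}",
        grammar=GRAMMAR,  # assumed parameter name (see note above)
    )
    return response.output_text.strip()
```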
Located at `backend/app/grammars/clickhouse_sql.bnf`. It restricts generated SQL to:
- SELECT only
- Aggregates: count, sum, avg, min, max
- Simple WHERE equality or BETWEEN time filters
- Optional GROUP BY, LIMIT
- Basic relative time pattern (prototype)
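To give a feel for the grammar's shape, here is a toy excerpt (shown as a Python string constant for illustration). These productions are invented for this README and are not the actual contents of `clickhouse_sql.bnf`:

```python
# Toy illustration of the grammar's shape; these productions are invented
# for this README and are NOT the contents of clickhouse_sql.bnf.
GRAMMAR_SKETCH = r"""
<query>     ::= "SELECT " <columns> " FROM " <table> <where>? <group>? <limit>?
<columns>   ::= "*" | <aggregate>
<aggregate> ::= ("count" | "sum" | "avg" | "min" | "max") "(" <column> ")"
<where>     ::= " WHERE " <column> (" = " <literal> | " BETWEEN " <literal> " AND " <literal>)
<group>     ::= " GROUP BY " <column>
<limit>     ::= " LIMIT " <number>
"""
```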
Quickstart (mock mode):

```bash
cd backend
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
export MOCK_MODE=true
uvicorn app.main:app --reload --port 8000
```
- Start local ClickHouse via Docker:

```bash
docker run -d --name ch -p 8123:8123 -p 9000:9000 clickhouse/clickhouse-server:latest
```
- Wait a few seconds for startup.
- Create the table (matches grammar expectations). Note: the CSV header uses `subscription_plan`, while earlier examples showed `subscription_plane` (typo). Use the schema below:

```bash
docker exec -i ch clickhouse-client --query "CREATE TABLE IF NOT EXISTS default.MOCK_DATA (
  id UInt32,
  name String,
  email String,
  age Int64,
  signup_date DateTime,
  country String,
  is_active Bool,
  subscription_plan String,
  last_login DateTime,
  balance Int64
) ENGINE=MergeTree ORDER BY id;"
```
- Load the sample CSV (`sample_files/MOCK_DATA.csv`). It contains realistic rows for development & testing:

```bash
# Optional: stage the CSV inside the container (the INSERT below pipes it
# directly from the host, so this step can be skipped).
docker exec -i ch bash -c "cat > /tmp/MOCK_DATA.csv" < sample_files/MOCK_DATA.csv
docker exec -i ch clickhouse-client --query "INSERT INTO default.MOCK_DATA FORMAT CSVWithNames" < sample_files/MOCK_DATA.csv
```
- (Optional) Verify the row count:

```bash
docker exec -it ch clickhouse-client --query "SELECT count() FROM default.MOCK_DATA"
```
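The same check can be done from Python. The `clickhouse-connect` driver here is an assumption for illustration; the backend's `clickhouse_client.py` may use a different library:

```python
# Quick host-side sanity check. Driver choice (clickhouse-connect) is an
# assumption for this example, not necessarily what the backend uses.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost", port=8123, database="default")
print(client.query("SELECT count() FROM default.MOCK_DATA").result_rows)
```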
- Run the backend pointing at ClickHouse:

```bash
cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
export CLICKHOUSE_HOST=localhost
export CLICKHOUSE_PORT=8123
export CLICKHOUSE_DATABASE=default
export MOCK_MODE=false
# Optional (only if you have model access): export OPENAI_API_KEY=sk-...; export OPENAI_MODEL=gpt-5
uvicorn app.main:app --reload --port 8000
```
The backend now serves `POST /query` and `POST /nl-query` at http://localhost:8000.
Start the client:

```bash
cd client
npm install
npm start
```
Open http://localhost:3000 and use the Query page.
Set an environment variable `REACT_APP_API_BASE` to the fully qualified backend base URL (no trailing slash), for example:

```
REACT_APP_API_BASE=https://cfg-evals-backend.onrender.com
```

The UI will call `${REACT_APP_API_BASE}/query` and `${REACT_APP_API_BASE}/nl-query`. If the variable is unset, it falls back to relative paths (useful when reverse-proxying both under the same domain). After updating env vars on Vercel, trigger a redeploy for them to take effect.
The UI includes:
- Mode toggle (Echo vs NL→SQL)
- Generated SQL + results (when NL mode)
- Raw API response block
- Collapsible session Query History (button on the right). Each history item can expand to show details and provides:
  - Run Again (auto-executes immediately)
  - Load Only (populates the textarea but leaves results cleared)
  - Collapse
Example natural language queries mapped to `default.MOCK_DATA`:

| Natural Language | Example Generated SQL |
|---|---|
| "Count all users" | `SELECT count(*) FROM default.MOCK_DATA` |
| "Sum total balance in the last 30 hours" | `SELECT sum(balance) FROM default.MOCK_DATA WHERE signup_date >= subtractHours(now(), 30)` |
| "Find users whose name contains ali" | `SELECT * FROM default.MOCK_DATA WHERE name ILIKE '%ali%'` |
Example request:

```bash
curl -s -X POST localhost:8000/nl-query \
  -H 'Content-Type: application/json' \
  -d '{"question":"Sum total balance in the last 30 hours"}' | jq
```
Example response (mock):

```json
{
  "sql": "SELECT sum(balance) FROM default.MOCK_DATA WHERE signup_date >= subtractHours(now(), 30)",
  "rows": [{ "sum": 5200 }],
  "mocked": true,
  "warning": "Mock mode enabled: using heuristic translation + sample data"
}
```
Dataset: `backend/evals/dataset.jsonl` (JSONL lines with regex expectations). Run:
```bash
cd backend
python -m evals.run_evals
```
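A single dataset line and its pass check might look like the sketch below; the field names (`question`, `expected_sql_regex`) are assumptions, not the actual schema of `dataset.jsonl`:

```python
import json
import re

# Hypothetical dataset line; field names are assumed, the real schema of
# evals/dataset.jsonl may differ.
line = '{"question": "Count all users", "expected_sql_regex": "^SELECT count\\\\(\\\\*\\\\) FROM default\\\\.MOCK_DATA$"}'

case = json.loads(line)
generated_sql = "SELECT count(*) FROM default.MOCK_DATA"  # would come from /nl-query
passed = re.search(case["expected_sql_regex"], generated_sql) is not None
print("PASS" if passed else "FAIL")
```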
Output JSON summary:

```json
{
  "summary": {"total": 3, "passed": 3, "failed": 0},
  "results": [ ... ]
}
```

The exit code is non-zero if any case fails (CI friendly).
Run unit tests:

```bash
cd backend
pytest -q
```
- Add more grammar production rules for JOINs after auditing safety.
- Introduce caching layer for repeated NL queries.
- Add latency + token metrics collection.
- Persist evaluation history.
Create the image:

```bash
docker build -t cfg-evals-backend -f backend/Dockerfile .
```

Run (mock mode):

```bash
docker run -p 8000:8000 cfg-evals-backend
```

Run (real LLM + ClickHouse Cloud):

```bash
docker run -p 8000:8000 \
  -e OPENAI_API_KEY=$OPENAI_API_KEY \
  -e OPENAI_MODEL=gpt-5 \
  -e MOCK_MODE=false \
  -e CLICKHOUSE_HOST=$CLICKHOUSE_HOST \
  -e CLICKHOUSE_PORT=$CLICKHOUSE_PORT \
  -e CLICKHOUSE_USER=$CLICKHOUSE_USER \
  -e CLICKHOUSE_PASSWORD=$CLICKHOUSE_PASSWORD \
  -e CLICKHOUSE_DATABASE=default \
  -e CLICKHOUSE_SECURE=true \
  cfg-evals-backend
```
See `docker-compose.yml` for a combined service configuration (once it is added).
- Run behind a reverse proxy with TLS termination.
- Set `temperature` to the lowest value the model supports for more deterministic SQL (some reasoning models only accept the default of 1).
- Log (NL query, generated SQL, latency), excluding PII.
- If the model call fails, return 503; do not silently degrade to mock (see the sketch after this list).
- Pin dependency versions (already done in `requirements.txt`).
- Periodically re-run evals and diff SQL outputs.
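A minimal sketch of that fail-closed behavior, assuming a handler shape like the one below (hypothetical; the real route lives in `app/main.py` and calls into `services/nl_to_sql.py`):

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

class NLQuery(BaseModel):
    question: str

def translate(question: str) -> str:
    """Stand-in for services.nl_to_sql; assume it raises on model failure."""
    raise RuntimeError("model unavailable")

@app.post("/nl-query")
def nl_query(body: NLQuery):
    try:
        sql = translate(body.question)
    except Exception as exc:
        # Fail closed: surface a 503 instead of silently degrading to mock.
        raise HTTPException(status_code=503, detail="model call failed") from exc
    return {"sql": sql}
```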
`sample_files/MOCK_DATA.csv` is synthetic demo data generated for this project (no real user information). Feel free to replace it with your own dataset; if you add columns, update the grammar and table schema accordingly.