Ritesh17-stack/Agentic-Text-2-SQL-v2
Text-to-SQL v2

A production-ready Natural Language → SQL API and web UI that converts plain-English questions into safe, audited SQL queries. This repository contains a FastAPI backend, a React frontend, and Redis for schema caching.

✨ Key Features

  • 🧠 Intelligent Query Planning: LLM-powered query generation with comprehensive few-shot examples
  • 🔒 Safety First: Structured query plans (no raw SQL from LLM) + deterministic SQL generation
  • 🛠️ Smart Error Recovery: LLM-powered repair agent that analyzes errors and regenerates corrected queries
  • 📊 Complex Query Support: CTEs, subqueries, nested AND/OR filters, window functions
  • 🔗 Schema-Aware: Enhanced schema context with emphasized foreign key relationships
  • 📝 Full Auditing: All queries logged with detailed metadata and execution metrics

🎯 Key Principles

  • The LLM produces a structured QueryPlan (no raw SQL).
  • Deterministic code (sql_builder) generates escaped, read-only SQL.
  • Queries are validated against the actual schema and audited.
  • Intelligent error recovery with LLM-powered repair agent.

πŸ—οΈ Architecture (High-Level)

  1. User Input: User types a natural-language question in the UI.
  2. API Request: Frontend posts to /query/{database_id} on the API (backend/api/query.py).
  3. Schema Retrieval: Backend fetches the cached schema (services/database_service.py) with enhanced formatting emphasizing foreign keys.
  4. Query Planning: LLM service (core/llm.py) uses few-shot examples to generate a structured QueryPlan (models/query_plan.py).
    • Supports: CTEs, subqueries, nested AND/OR filters, aggregations, JOINs
  5. Validation: The plan is validated (core/validator.py) against the actual schema.
  6. SQL Generation: core/sql_builder.py deterministically builds safe SQL from the validated plan.
  7. Execution: core/executor.py executes the SQL with timeouts and limits.
  8. Error Recovery: If execution fails, the repair agent (core/agent.py) uses LLM to analyze the error and regenerate a corrected query plan.
  9. Logging: All queries are logged with metadata, execution time, and results.
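As a rough illustration, the planning-to-SQL part of the flow above could be wired together like this. The function names echo the modules listed (planner, validator, sql_builder), but every signature here is hypothetical, not the repository's actual API:

```python
# Illustrative pipeline sketch -- names mirror the modules described above,
# but all signatures and the plan shape are hypothetical.

def plan_query(question: str, schema: dict) -> dict:
    # In the real app this calls the LLM (core/llm.py) with few-shot examples;
    # here we return a canned structured plan for demonstration.
    return {"table": "orders", "columns": ["id", "total"], "limit": 10}

def validate_plan(plan: dict, schema: dict) -> None:
    # Step 5: reject plans that reference tables/columns not in the schema.
    table = plan["table"]
    if table not in schema:
        raise ValueError(f"unknown table: {table}")
    unknown = [c for c in plan["columns"] if c not in schema[table]]
    if unknown:
        raise ValueError(f"unknown columns: {unknown}")

def build_sql(plan: dict) -> str:
    # Step 6: deterministic SQL generation -- the LLM never writes SQL.
    cols = ", ".join(plan["columns"])
    return f'SELECT {cols} FROM {plan["table"]} LIMIT {int(plan["limit"])}'

def answer(question: str, schema: dict) -> str:
    plan = plan_query(question, schema)
    validate_plan(plan, schema)
    return build_sql(plan)

schema = {"orders": ["id", "total", "created_at"]}
print(answer("show ten orders", schema))
# SELECT id, total FROM orders LIMIT 10
```

The key property this sketch preserves: only validated, structured intent reaches the SQL string builder.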

πŸ“ Project Structure

  • backend/ — FastAPI application
    • core/ — Core business logic
      • llm.py — LLM service with enhanced few-shot examples
      • planner.py — Query planning orchestration
      • validator.py — Query plan validation (supports CTEs, subqueries, nested filters)
      • sql_builder.py — Deterministic SQL generation (supports all QueryPlan features)
      • executor.py — Safe query execution with timeouts
      • agent.py — LLM-powered repair agent for error recovery
    • api/ — API routes (auth, databases, schema, query)
    • models/ — Database models and QueryPlan schema
    • services/ — Business services (database, cache)
    • security/ — JWT and encryption
  • frontend/ — React UI (Vite) with centralized API client
  • docker-compose.yml — Full stack orchestration
  • Dockerfile.backend, Dockerfile.frontend, nginx.conf — Container configs

🚀 Recent Improvements

Enhanced Query Planning

  • Few-Shot Examples: Added 8 comprehensive examples covering simple queries, JOINs, aggregations, subqueries, and complex WHERE conditions
  • Better Schema Context: Foreign key relationships are now emphasized with visual indicators and JOIN hints
  • Extended QueryPlan Model: Support for CTEs, subqueries, nested AND/OR filter groups

Intelligent Error Recovery

  • LLM-Powered Repair Agent: Analyzes execution errors and regenerates corrected query plans
  • Context-Aware: Uses original question, query plan, and schema to provide intelligent fixes
  • Fallback Support: Falls back to heuristic fixes if LLM is unavailable

Complex Query Support

  • CTEs (WITH clauses): Support for Common Table Expressions
  • Subqueries: Full subquery support in WHERE/HAVING clauses
  • Nested Filters: AND/OR filter groups with arbitrary nesting
  • Subquery Shortcuts: Special handling for common patterns (e.g., "above average")

📊 Query Plan Features

The QueryPlan model supports:

  • ✅ Basic SELECT with aggregations (COUNT, SUM, AVG, MIN, MAX, COUNT_DISTINCT)
  • ✅ JOINs (INNER, LEFT, RIGHT)
  • ✅ WHERE conditions with all operators
  • ✅ GROUP BY and HAVING
  • ✅ ORDER BY with aliases
  • ✅ CTEs (WITH clauses)
  • ✅ Subqueries in filters
  • ✅ Nested AND/OR filter groups
  • ✅ Date macros (TODAY, LAST_MONTH, etc.)
  • ✅ LIMIT and OFFSET

Prerequisites

  • Docker & Docker Compose (recommended) OR
  • Python 3.11+, Node 20+, npm/yarn for local development

Environment Variables

Create a .env file or export variables for production. Important variables used in docker-compose.yml:

  • GROQ_API_KEY — LLM / external API key (required)
  • JWT_SECRET_KEY — JWT signing secret
  • ENCRYPTION_KEY — Key used to encrypt database credentials
  • DATABASE_URL — URL for the app database (default used in compose: SQLite)
  • REDIS_URL — Redis connection (e.g. redis://redis:6379/0)
  • ENVIRONMENT, DEBUG, HOST, PORT — runtime settings
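A sample .env might look like the following. All values are placeholders (generate your own secrets); adjust paths and URLs to your setup:

```bash
# Sample .env -- placeholder values only
GROQ_API_KEY=your-groq-api-key
JWT_SECRET_KEY=change-me-long-random-string
ENCRYPTION_KEY=change-me-random-key
DATABASE_URL=sqlite:///./data/text_to_sql.db
REDIS_URL=redis://redis:6379/0
ENVIRONMENT=production
DEBUG=false
HOST=0.0.0.0
PORT=8000
```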

Local Development (Backend)

  1. Create and activate a venv:

     ```bash
     python -m venv .venv
     # Windows
     .\.venv\Scripts\activate
     # macOS / Linux
     source .venv/bin/activate
     ```

  2. Install backend dependencies:

     ```bash
     pip install -r backend/requirements.txt
     ```

  3. Run the API with auto-reload (for development):

     ```bash
     cd backend
     uvicorn main:app --reload --port 8000
     ```

  4. API docs are available at http://localhost:8000/docs.

Local Development (Frontend)

  1. Install node deps and run the dev server:

     ```bash
     cd frontend
     npm ci
     npm run dev
     ```

  2. Open the dev URL shown by Vite (usually http://localhost:5173). Configure VITE_API_URL in a .env file as needed.

Run the Full Stack with Docker (Recommended)

From the repo root, build and start all services:

```bash
# Build and run backend, frontend, and redis
docker compose up --build
```

  • Backend: http://localhost:8000
  • Frontend: http://localhost/ (port 80)
  • Redis: localhost:6379

To run just backend + redis:

```bash
docker compose up --build backend redis
```

To stop and remove containers:

```bash
docker compose down
```

Notes about Docker Setup

  • docker-compose.yml exposes services and injects env vars for production. A docker-compose.dev.yml placeholder exists for developer overrides (live code mounts, hot reload); populate it as needed.
  • Backend image uses Gunicorn + Uvicorn workers in production mode (see Dockerfile.backend).
  • Frontend is built in a multi-stage image and served by Nginx (see Dockerfile.frontend and nginx.conf).

πŸ› Debugging & Troubleshooting

Logs

Use Docker logs or run services locally to get stack traces:

```bash
# Tail backend logs
docker compose logs -f backend

# Or run backend locally and watch stdout
uvicorn main:app --reload
```

Common Issues

  • Redis not connected: The app falls back to in-memory cache; check REDIS_URL and that the redis service is running.
  • Missing env vars: Backend will warn if critical keys are not set (JWT, ENCRYPTION_KEY, LLM key). Provide them in .env or the host environment.
  • Database connection issues: Use the databases endpoints in the UI to test a connection; server-side, check backend/services/database_service.py for host validation and connection errors.
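The "falls back to in-memory cache" behaviour described above follows a common pattern: try Redis, and degrade to a process-local dict if the server is unreachable. This is a sketch of the pattern, not the repository's actual services/ implementation:

```python
# Sketch of Redis-with-fallback caching. The real code lives in the
# services/ layer; names here are illustrative.
class InMemoryCache:
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def set(self, key, value):
        self._data[key] = value

def make_cache(redis_url: str):
    try:
        import redis  # optional dependency
        client = redis.Redis.from_url(redis_url)
        client.ping()  # raises if the server is unreachable
        return client
    except Exception:
        # Redis missing or unreachable: degrade to a process-local cache.
        # Note: this loses cross-process sharing and persistence.
        return InMemoryCache()

cache = make_cache("redis://localhost:1/0")  # unreachable on purpose
cache.set("schema:1", "{...}")
print(cache.get("schema:1"))
```

If queries work but schema caching seems stale or per-worker, this fallback is the first thing to check.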

Backend Debugging

  • Enable verbose SQL/engine logs by setting DEBUG=true in environment or by changing echo in db/session.py.
  • Recreate and inspect DB file if using SQLite (data/text_to_sql.db by default inside backend_data volume).
  • For step debugging in planning/execution, add temporary logging around:
    • core/planner.py — Query planning pipeline
    • core/llm.py — LLM prompt and response
    • core/sql_builder.py — SQL generation
    • core/executor.py — Query execution
    • core/agent.py — Error recovery flow

Understanding Query Plans

  • Check the query_plan field in API responses to see the structured plan
  • Review agent_flow in error responses to see repair attempts
  • Enable debug logging to see LLM prompts and responses

🧪 Testing

  • There are no unit tests included in this repository by default. For safe changes, add tests around:
    • core/sql_builder.py — Ensure query plans map to safe SQL
    • core/validator.py — Validate query plans against schemas
    • core/agent.py — Test error recovery scenarios
    • core/llm.py — Verify few-shot examples produce correct plans
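A starting point for such tests might look like this pytest-style sketch. In the real repo you would import from backend/core/sql_builder.py; here a tiny stub stands in so the example is self-contained:

```python
# Hypothetical test sketch -- build_sql is a local stub standing in for
# the real core/sql_builder, since this repo ships no test harness yet.

def build_sql(plan: dict) -> str:
    cols = ", ".join(plan["columns"]) or "*"
    return f'SELECT {cols} FROM {plan["table"]}'

def test_builder_emits_select_only():
    sql = build_sql({"table": "users", "columns": ["id", "email"]})
    assert sql.upper().startswith("SELECT")
    assert ";" not in sql  # single statement only

def test_builder_includes_requested_columns():
    sql = build_sql({"table": "users", "columns": ["id", "email"]})
    assert "id, email" in sql

test_builder_emits_select_only()
test_builder_includes_requested_columns()
print("ok")
```

Properties like "always a single SELECT" and "only columns from the plan appear" are good invariants to pin down before refactoring the builder.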

πŸ” Security Features

  1. Structured Query Plans — LLM outputs structured plans, not raw SQL
  2. Deterministic SQL Generation — Code builds SQL from validated plans
  3. Read-Only Queries — Only SELECT statements allowed
  4. Query Validation — Plans validated against actual schema
  5. Encrypted Credentials — Database passwords encrypted at rest
  6. Query Auditing — All queries logged with metadata
  7. Timeout Enforcement — Queries timeout after configured limit
  8. Row Limits — Results capped at MAX_QUERY_ROWS (1000)
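Two of the guards above (read-only enforcement and row capping) can be sketched in a few lines. The MAX_QUERY_ROWS value comes from this README; the function names are illustrative, not the repository's API:

```python
# Illustrative guards: single read-only statement, capped result rows.
MAX_QUERY_ROWS = 1000

def assert_read_only(sql: str) -> None:
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:
        raise ValueError("multiple statements are not allowed")
    first = stripped.split(None, 1)[0].upper()
    if first not in ("SELECT", "WITH"):  # WITH covers CTE-prefixed SELECTs
        raise ValueError(f"only read-only SELECT queries are allowed, got {first}")

def cap_rows(rows: list) -> list:
    return rows[:MAX_QUERY_ROWS]

assert_read_only("SELECT * FROM users")  # passes silently
try:
    assert_read_only("DROP TABLE users")
except ValueError as e:
    print(e)
# only read-only SELECT queries are allowed, got DROP
```

In this architecture such checks are a second line of defense: the primary guarantee is that SQL is generated deterministically from a validated plan in the first place.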

📖 How It Works

  1. You ask a question in plain English
  2. AI analyzes your question + database schema
  3. AI generates a structured query plan (NOT raw SQL)
  4. Plan is validated against your actual schema
  5. Safe SQL is generated from the validated plan
  6. Query is executed with timeouts and limits
  7. If execution fails, repair agent attempts intelligent fixes
  8. Results returned to you

Key insight: The LLM never writes SQL directly. It outputs structured intent, and deterministic code generates the SQL.

🚀 Quick Start

  1. POST /auth/register - Create account
  2. POST /auth/login - Get JWT token
  3. POST /databases - Connect your MySQL database
  4. GET /schema/{database_id} - View your schema
  5. POST /query/{database_id} - Ask questions!

Contributing

  • Fork the repo, create a feature branch, and open a PR.
  • Keep changes minimal and focused; add tests for any core logic changes.

License

Contact

  • For high-level questions about the architecture or help running the stack, open an issue or reach out to the maintainer.

About

An AI-powered application that converts natural language queries into SQL using an intelligent agentic workflow.
