A production-ready Natural Language → SQL API and web UI that converts plain-English questions into safe, audited SQL queries. This repository contains a FastAPI backend, a React frontend, and Redis for schema caching.
- Intelligent Query Planning: LLM-powered query generation with comprehensive few-shot examples
- Safety First: Structured query plans (no raw SQL from the LLM) + deterministic SQL generation
- Smart Error Recovery: LLM-powered repair agent that analyzes errors and regenerates corrected queries
- Complex Query Support: CTEs, subqueries, nested AND/OR filters, window functions
- Schema-Aware: Enhanced schema context with emphasized foreign key relationships
- Full Auditing: All queries logged with detailed metadata and execution metrics
- The LLM produces a structured `QueryPlan` (no raw SQL).
- Deterministic code (`sql_builder`) generates escaped, read-only SQL.
- Queries are validated against the actual schema and audited.
- Intelligent error recovery with an LLM-powered repair agent.
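To make the "structured plan, not raw SQL" idea concrete, here is a minimal, self-contained sketch. The field names are illustrative only; the repo's real model lives in `models/query_plan.py` and is richer (CTEs, subqueries, nested filter groups):

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative stand-in for the structured plan the LLM emits.
@dataclass
class Filter:
    column: str
    op: str        # e.g. "=", ">", "IN"
    value: object

@dataclass
class QueryPlan:
    table: str
    columns: List[str]
    filters: List[Filter] = field(default_factory=list)
    limit: Optional[int] = None

def build_sql(plan: QueryPlan) -> str:
    """Deterministically render read-only, parameterized SQL from a plan."""
    sql = f"SELECT {', '.join(plan.columns)} FROM {plan.table}"
    if plan.filters:
        clauses = [f"{f.column} {f.op} %s" for f in plan.filters]  # placeholders, not inlined values
        sql += " WHERE " + " AND ".join(clauses)
    if plan.limit is not None:
        sql += f" LIMIT {int(plan.limit)}"
    return sql

plan = QueryPlan(table="orders", columns=["id", "total"],
                 filters=[Filter("status", "=", "shipped")], limit=10)
print(build_sql(plan))  # SELECT id, total FROM orders WHERE status = %s LIMIT 10
```

Because the SQL string is assembled by code rather than the model, the worst a bad LLM output can do is produce a plan that fails validation, never an injected statement.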
- User Input: User types a natural-language question in the UI.
- API Request: Frontend posts to `/query/{database_id}` on the API (`backend/api/query.py`).
- Schema Retrieval: Backend fetches the cached schema (`services/database_service.py`) with enhanced formatting emphasizing foreign keys.
- Query Planning: LLM service (`core/llm.py`) uses few-shot examples to generate a structured `QueryPlan` (`models/query_plan.py`). Supports CTEs, subqueries, nested AND/OR filters, aggregations, and JOINs.
- Validation: The plan is validated (`core/validator.py`) against the actual schema.
- SQL Generation: `core/sql_builder.py` deterministically builds safe SQL from the validated plan.
- Execution: `core/executor.py` executes the SQL with timeouts and limits.
- Error Recovery: If execution fails, the repair agent (`core/agent.py`) uses the LLM to analyze the error and regenerate a corrected query plan.
- Logging: All queries are logged with metadata, execution time, and results.
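The flow above can be sketched as one orchestration function. Every function here is a toy stand-in for the corresponding `core/` module; the real signatures will differ:

```python
# Self-contained sketch of the request flow; each helper is a toy
# stand-in for the named module in core/.
class QueryError(Exception):
    pass

def plan_query(question, schema):          # stands in for core/planner.py + core/llm.py
    return {"table": "users", "columns": ["id", "name"]}

def validate(plan, schema):                # stands in for core/validator.py
    if plan["table"] not in schema:
        raise QueryError(f"unknown table {plan['table']}")

def build_sql(plan):                       # stands in for core/sql_builder.py
    return f"SELECT {', '.join(plan['columns'])} FROM {plan['table']}"

def execute(sql):                          # stands in for core/executor.py
    return [{"id": 1, "name": "Ada"}]

def repair(plan, err, question, schema):   # stands in for core/agent.py
    return plan

def handle_query(question, schema, max_repairs=2):
    plan = plan_query(question, schema)
    validate(plan, schema)
    sql = build_sql(plan)
    for _ in range(max_repairs + 1):
        try:
            return execute(sql)
        except QueryError as err:
            # On failure, ask the repair agent for a corrected plan and retry.
            plan = repair(plan, err, question, schema)
            validate(plan, schema)
            sql = build_sql(plan)
    raise RuntimeError("query could not be repaired")

rows = handle_query("list users", {"users": ["id", "name"]})
print(rows)  # [{'id': 1, 'name': 'Ada'}]
```

Note that repaired plans are re-validated before re-execution, so the repair loop cannot bypass the schema check.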
- `backend/` - FastAPI application
  - `core/` - Core business logic
    - `llm.py` - LLM service with enhanced few-shot examples
    - `planner.py` - Query planning orchestration
    - `validator.py` - Query plan validation (supports CTEs, subqueries, nested filters)
    - `sql_builder.py` - Deterministic SQL generation (supports all QueryPlan features)
    - `executor.py` - Safe query execution with timeouts
    - `agent.py` - LLM-powered repair agent for error recovery
  - `api/` - API routes (auth, databases, schema, query)
  - `models/` - Database models and QueryPlan schema
  - `services/` - Business services (database, cache)
  - `security/` - JWT and encryption
- `frontend/` - React UI (Vite) with centralized API client
- `docker-compose.yml` - Full stack orchestration
- `Dockerfile.backend`, `Dockerfile.frontend`, `nginx.conf` - Container configs
- Few-Shot Examples: Added 8 comprehensive examples covering simple queries, JOINs, aggregations, subqueries, and complex WHERE conditions
- Better Schema Context: Foreign key relationships are now emphasized with visual indicators and JOIN hints
- Extended QueryPlan Model: Support for CTEs, subqueries, nested AND/OR filter groups
- LLM-Powered Repair Agent: Analyzes execution errors and regenerates corrected query plans
- Context-Aware: Uses original question, query plan, and schema to provide intelligent fixes
- Fallback Support: Falls back to heuristic fixes if LLM is unavailable
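A hedged sketch of that fallback behavior: try an LLM-generated fix first, and drop to simple heuristics if the LLM call fails. The heuristic shown (dropping a column named in an "Unknown column" error) is an invented example, not necessarily what `core/agent.py` does:

```python
# Sketch of repair-with-fallback; llm_fix is an assumed callable, and the
# heuristic below is illustrative, not the repo's actual rule set.
def heuristic_fix(plan, error_message):
    if "Unknown column" in error_message:
        bad = error_message.split("'")[1]  # column name quoted in the DB error
        plan["columns"] = [c for c in plan["columns"] if c != bad]
    return plan

def repair_plan(plan, error_message, llm_fix=None):
    if llm_fix is not None:
        try:
            return llm_fix(plan, error_message)
        except Exception:
            pass  # LLM unreachable or returned garbage: fall through
    return heuristic_fix(plan, error_message)

plan = {"table": "orders", "columns": ["id", "totl"]}
fixed = repair_plan(plan, "Unknown column 'totl' in field list")
print(fixed["columns"])  # ['id']
```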
- CTEs (WITH clauses): Support for Common Table Expressions
- Subqueries: Full subquery support in WHERE/HAVING clauses
- Nested Filters: AND/OR filter groups with arbitrary nesting
- Subquery Shortcuts: Special handling for common patterns (e.g., "above average")
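Nested AND/OR groups are naturally rendered by a small recursive function. The node shape below (`{"group": ..., "filters": [...]}`) is an assumption about how such a tree could be encoded, not the repo's exact schema:

```python
# Recursive rendering of a nested AND/OR filter tree into a
# parameterized WHERE fragment; node shape is illustrative.
def render_filter(node):
    if "group" in node:  # interior node: {"group": "AND"|"OR", "filters": [...]}
        parts = [render_filter(child) for child in node["filters"]]
        return "(" + f" {node['group']} ".join(parts) + ")"
    return f"{node['column']} {node['op']} %s"  # leaf condition, placeholder value

tree = {"group": "AND", "filters": [
    {"column": "country", "op": "=", "value": "US"},
    {"group": "OR", "filters": [
        {"column": "status", "op": "=", "value": "new"},
        {"column": "status", "op": "=", "value": "open"},
    ]},
]}
print(render_filter(tree))
# (country = %s AND (status = %s OR status = %s))
```

Arbitrary nesting falls out of the recursion for free, and the parentheses make operator precedence explicit in the generated SQL.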
The QueryPlan model supports:
- ✅ Basic SELECT with aggregations (COUNT, SUM, AVG, MIN, MAX, COUNT_DISTINCT)
- ✅ JOINs (INNER, LEFT, RIGHT)
- ✅ WHERE conditions with all operators
- ✅ GROUP BY and HAVING
- ✅ ORDER BY with aliases
- ✅ CTEs (WITH clauses)
- ✅ Subqueries in filters
- ✅ Nested AND/OR filter groups
- ✅ Date macros (TODAY, LAST_MONTH, etc.)
- ✅ LIMIT and OFFSET
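For instance, a plan answering "top 5 customers by total spend" might serialize to something like the following. The field names are illustrative only; consult `models/query_plan.py` for the real schema:

```python
import json

# Hypothetical serialized QueryPlan; field names are assumptions.
plan = {
    "table": "orders",
    "columns": [
        {"expr": "customer_id"},
        {"agg": "SUM", "expr": "total", "alias": "spend"},
    ],
    "group_by": ["customer_id"],
    "order_by": [{"expr": "spend", "dir": "DESC"}],  # ORDER BY the alias
    "limit": 5,
}
print(json.dumps(plan, indent=2))
```

The API returns the plan it used in the `query_plan` field of each response, so you can inspect the real structure directly.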
- Docker & Docker Compose (recommended) OR
- Python 3.11+, Node 20+, npm/yarn for local development
Create a .env file or export variables for production. Important variables used in docker-compose.yml:
- `GROQ_API_KEY` - LLM / external API key (required)
- `JWT_SECRET_KEY` - JWT signing secret
- `ENCRYPTION_KEY` - Key used to encrypt database credentials
- `DATABASE_URL` - URL for the app database (default used in compose: SQLite)
- `REDIS_URL` - Redis connection (e.g. `redis://redis:6379/0`)
- `ENVIRONMENT`, `DEBUG`, `HOST`, `PORT` - runtime settings
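A minimal sketch of how the backend might read these settings. The defaults shown are assumptions taken from the descriptions above; `docker-compose.yml` is the authoritative source:

```python
import os

# Illustrative settings loader; defaults are assumptions, not the
# repo's actual configuration code.
def load_settings():
    required = ["GROQ_API_KEY", "JWT_SECRET_KEY", "ENCRYPTION_KEY"]
    missing = [key for key in required if not os.environ.get(key)]
    if missing:
        # Mirrors the backend's behavior of warning when critical keys are unset.
        print(f"WARNING: missing critical env vars: {', '.join(missing)}")
    return {
        "database_url": os.environ.get("DATABASE_URL", "sqlite:///data/text_to_sql.db"),
        "redis_url": os.environ.get("REDIS_URL", "redis://redis:6379/0"),
        "debug": os.environ.get("DEBUG", "false").lower() == "true",
    }

settings = load_settings()
```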
- Create and activate a venv:

```bash
python -m venv .venv
# Windows
.\.venv\Scripts\activate
# macOS / Linux
source .venv/bin/activate
```

- Install backend dependencies:

```bash
pip install -r backend/requirements.txt
```

- Run the API with auto-reload (for development):

```bash
cd backend
uvicorn main:app --reload --port 8000
```

- API docs are available at `http://localhost:8000/docs`.
- Install node deps and run the dev server:

```bash
cd frontend
npm ci
npm run dev
```

- Open the dev URL shown by Vite (usually `http://localhost:5173`). Configure `VITE_API_URL` in `.env` or `package.json` as needed.
From the repo root, build and start all services:

```bash
# Build and run backend, frontend, and redis
docker compose up --build
```

- Backend: `http://localhost:8000`
- Frontend: `http://localhost/` (port 80)
- Redis: `localhost:6379`

To run just backend + redis:

```bash
docker compose up --build backend redis
```

To stop and remove containers:

```bash
docker compose down
```

- `docker-compose.yml` exposes services and injects env vars for production. There is an empty `docker-compose.dev.yml` placeholder that you can fill in for developer mounts (live code, hot-reload).
- Backend image uses Gunicorn + Uvicorn workers in production mode (see `Dockerfile.backend`).
- Frontend is built in a multi-stage image and served by Nginx (see `Dockerfile.frontend` and `nginx.conf`).
Use Docker logs or run services locally to get stack traces:

```bash
# Tail backend logs
docker compose logs -f backend

# Or run backend locally and watch stdout
uvicorn main:app --reload
```

- Redis not connected: The app falls back to an in-memory cache; check `REDIS_URL` and that the `redis` service is running.
- Missing env vars: Backend will warn if critical keys are not set (JWT, ENCRYPTION_KEY, LLM key). Provide them in `.env` or the host environment.
- Database connection issues: Use the `databases` endpoints in the UI to test a connection; server-side, check `backend/services/database_service.py` for host validation and connection errors.
- Enable verbose SQL/engine logs by setting `DEBUG=true` in the environment or by changing `echo` in `db/session.py`.
- Recreate and inspect the DB file if using SQLite (`data/text_to_sql.db` by default inside the `backend_data` volume).
- For step debugging in planning/execution, add temporary logging around:
  - `core/planner.py` - Query planning pipeline
  - `core/llm.py` - LLM prompt and response
  - `core/sql_builder.py` - SQL generation
  - `core/executor.py` - Query execution
  - `core/agent.py` - Error recovery flow
- Check the `query_plan` field in API responses to see the structured plan.
- Review `agent_flow` in error responses to see repair attempts.
- Enable debug logging to see LLM prompts and responses.
- There are no unit tests included in this repository by default. For safe changes, add tests around:
  - `core/sql_builder.py` - Ensure query plans map to safe SQL
  - `core/validator.py` - Validate query plans against schemas
  - `core/agent.py` - Test error recovery scenarios
  - `core/llm.py` - Verify few-shot examples produce correct plans
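As a starting point, tests for the builder might look like the sketch below. It uses a toy `build_sql`; swap in the real entry point from `core/sql_builder.py` (whose signature is not shown here) when wiring this up:

```python
# Hypothetical unit tests for deterministic SQL generation; build_sql is
# a toy stand-in for the real core/sql_builder entry point.
def build_sql(plan):
    sql = f"SELECT {', '.join(plan['columns'])} FROM {plan['table']}"
    if plan.get("limit") is not None:
        sql += f" LIMIT {int(plan['limit'])}"
    return sql

def test_select_is_read_only():
    sql = build_sql({"table": "users", "columns": ["id"], "limit": 10})
    assert sql.startswith("SELECT")
    assert ";" not in sql  # no statement chaining

def test_limit_is_applied():
    sql = build_sql({"table": "users", "columns": ["id"], "limit": 5})
    assert sql.endswith("LIMIT 5")

test_select_is_read_only()
test_limit_is_applied()
```

Run them with `pytest` once they target the real module; properties like "output always starts with SELECT" and "no semicolons" are cheap invariants that guard the safety story.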
- Structured Query Plans: The LLM outputs structured plans, not raw SQL
- Deterministic SQL Generation: Code builds SQL from validated plans
- Read-Only Queries: Only SELECT statements are allowed
- Query Validation: Plans are validated against the actual schema
- Encrypted Credentials: Database passwords are encrypted at rest
- Query Auditing: All queries are logged with metadata
- Timeout Enforcement: Queries time out after the configured limit
- Row Limits: Results are capped at `MAX_QUERY_ROWS` (1000)
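A defense-in-depth sketch of the read-only guarantee: even with deterministic generation, a final guard can reject anything that is not a single SELECT (or a CTE feeding one). This is an illustrative check, not the repo's actual validator:

```python
import re

# Illustrative last-line-of-defense check before execution.
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|TRUNCATE|GRANT)\b", re.I
)

def assert_read_only(sql: str) -> None:
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:
        raise ValueError("multiple statements are not allowed")
    if not stripped.upper().startswith(("SELECT", "WITH")):
        raise ValueError("only SELECT queries are allowed")
    if FORBIDDEN.search(stripped):
        raise ValueError("write keyword detected")

assert_read_only("SELECT id FROM users LIMIT 10")  # passes silently
```

Keyword filters alone are famously leaky, which is why this project treats them only as a backstop behind structured plans and deterministic generation.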
1. You ask a question in plain English
2. AI analyzes your question + database schema
3. AI generates a structured query plan (NOT raw SQL)
4. Plan is validated against your actual schema
5. Safe SQL is generated from the validated plan
6. Query is executed with timeouts and limits
7. If execution fails, repair agent attempts intelligent fixes
8. Results returned to you
Key insight: The LLM never writes SQL directly. It outputs structured intent, and deterministic code generates the SQL.
- `POST /auth/register` - Create account
- `POST /auth/login` - Get JWT token
- `POST /databases` - Connect your MySQL database
- `GET /schema/{database_id}` - View your schema
- `POST /query/{database_id}` - Ask questions!
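A hedged client-side sketch of calling the query endpoint. The JSON field name (`question`) is an assumption; check the live OpenAPI docs at `/docs` for the exact request schema:

```python
import json
import urllib.request

BASE = "http://localhost:8000"

def ask(question: str, database_id: int, token: str) -> dict:
    """POST a natural-language question; payload field names are assumed."""
    req = urllib.request.Request(
        f"{BASE}/query/{database_id}",
        data=json.dumps({"question": question}).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # JWT from POST /auth/login
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        # Response includes the results plus the structured query_plan field.
        return json.loads(resp.read())
```

With the stack running, `ask("how many users signed up last month?", 1, token)` should return rows alongside the plan the LLM produced.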
- Fork the repo, create a feature branch, and open a PR.
- Keep changes minimal and focused; add tests for any core logic changes.
- For high-level questions about the architecture or help running the stack, open an issue or reach out to the maintainer.