Ritesh17-stack/Agentic-Text-2-SQL-v2
Text-to-SQL v2

A production-ready Natural Language → SQL API and web UI that converts plain-English questions into safe, audited SQL queries. This repository contains a FastAPI backend, a React frontend, and Redis for schema caching.

✨ Key Features

  • 🧠 Intelligent Query Planning: LLM-powered query generation with comprehensive few-shot examples
  • 🔒 Safety First: Structured query plans (no raw SQL from LLM) + deterministic SQL generation
  • 🛠️ Smart Error Recovery: LLM-powered repair agent that analyzes errors and regenerates corrected queries
  • 📊 Complex Query Support: CTEs, subqueries, nested AND/OR filters, window functions
  • 🔗 Schema-Aware: Enhanced schema context with emphasized foreign key relationships
  • 📝 Full Auditing: All queries logged with detailed metadata and execution metrics

🎯 Key Principles

  • The LLM produces a structured QueryPlan (no raw SQL).
  • Deterministic code (sql_builder) generates escaped, read-only SQL.
  • Queries are validated against the actual schema and audited.
  • Intelligent error recovery with LLM-powered repair agent.

πŸ—οΈ Architecture (High-Level)

  1. User Input: User types a natural-language question in the UI.
  2. API Request: Frontend posts to /query/{database_id} on the API (backend/api/query.py).
  3. Schema Retrieval: Backend fetches the cached schema (services/database_service.py) with enhanced formatting emphasizing foreign keys.
  4. Query Planning: LLM service (core/llm.py) uses few-shot examples to generate a structured QueryPlan (models/query_plan.py).
    • Supports: CTEs, subqueries, nested AND/OR filters, aggregations, JOINs
  5. Validation: The plan is validated (core/validator.py) against the actual schema.
  6. SQL Generation: core/sql_builder.py deterministically builds safe SQL from the validated plan.
  7. Execution: core/executor.py executes the SQL with timeouts and limits.
  8. Error Recovery: If execution fails, the repair agent (core/agent.py) uses LLM to analyze the error and regenerate a corrected query plan.
  9. Logging: All queries are logged with metadata, execution time, and results.
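As a rough illustration, the planning-to-SQL part of the flow above could be wired together like this. The function names echo the modules listed (planner, validator, sql_builder), but every signature here is hypothetical, not the repository's actual API:

```python
# Illustrative pipeline sketch -- names mirror the modules described above,
# but all signatures and the plan shape are hypothetical.

def plan_query(question: str, schema: dict) -> dict:
    # In the real app this calls the LLM (core/llm.py) with few-shot examples;
    # here we return a canned structured plan for demonstration.
    return {"table": "orders", "columns": ["id", "total"], "limit": 10}

def validate_plan(plan: dict, schema: dict) -> None:
    # Step 5: reject plans that reference tables/columns not in the schema.
    table = plan["table"]
    if table not in schema:
        raise ValueError(f"unknown table: {table}")
    unknown = [c for c in plan["columns"] if c not in schema[table]]
    if unknown:
        raise ValueError(f"unknown columns: {unknown}")

def build_sql(plan: dict) -> str:
    # Step 6: deterministic SQL generation -- the LLM never writes SQL.
    cols = ", ".join(plan["columns"])
    return f'SELECT {cols} FROM {plan["table"]} LIMIT {int(plan["limit"])}'

def answer(question: str, schema: dict) -> str:
    plan = plan_query(question, schema)
    validate_plan(plan, schema)
    return build_sql(plan)

schema = {"orders": ["id", "total", "created_at"]}
print(answer("show ten orders", schema))
# SELECT id, total FROM orders LIMIT 10
```

The key property this sketch preserves: only validated, structured intent reaches the SQL string builder.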

πŸ“ Project Structure

  • backend/ — FastAPI application
    • core/ — Core business logic
      • llm.py — LLM service with enhanced few-shot examples
      • planner.py — Query planning orchestration
      • validator.py — Query plan validation (supports CTEs, subqueries, nested filters)
      • sql_builder.py — Deterministic SQL generation (supports all QueryPlan features)
      • executor.py — Safe query execution with timeouts
      • agent.py — LLM-powered repair agent for error recovery
    • api/ — API routes (auth, databases, schema, query)
    • models/ — Database models and QueryPlan schema
    • services/ — Business services (database, cache)
    • security/ — JWT and encryption
  • frontend/ — React UI (Vite) with centralized API client
  • docker-compose.yml — Full stack orchestration
  • Dockerfile.backend, Dockerfile.frontend, nginx.conf — Container configs

🚀 Recent Improvements

Enhanced Query Planning

  • Few-Shot Examples: Added 8 comprehensive examples covering simple queries, JOINs, aggregations, subqueries, and complex WHERE conditions
  • Better Schema Context: Foreign key relationships are now emphasized with visual indicators and JOIN hints
  • Extended QueryPlan Model: Support for CTEs, subqueries, nested AND/OR filter groups

Intelligent Error Recovery

  • LLM-Powered Repair Agent: Analyzes execution errors and regenerates corrected query plans
  • Context-Aware: Uses original question, query plan, and schema to provide intelligent fixes
  • Fallback Support: Falls back to heuristic fixes if LLM is unavailable

Complex Query Support

  • CTEs (WITH clauses): Support for Common Table Expressions
  • Subqueries: Full subquery support in WHERE/HAVING clauses
  • Nested Filters: AND/OR filter groups with arbitrary nesting
  • Subquery Shortcuts: Special handling for common patterns (e.g., "above average")

📊 Query Plan Features

The QueryPlan model supports:

  • ✅ Basic SELECT with aggregations (COUNT, SUM, AVG, MIN, MAX, COUNT_DISTINCT)
  • ✅ JOINs (INNER, LEFT, RIGHT)
  • ✅ WHERE conditions with all operators
  • ✅ GROUP BY and HAVING
  • ✅ ORDER BY with aliases
  • ✅ CTEs (WITH clauses)
  • ✅ Subqueries in filters
  • ✅ Nested AND/OR filter groups
  • ✅ Date macros (TODAY, LAST_MONTH, etc.)
  • ✅ LIMIT and OFFSET

Prerequisites

  • Docker & Docker Compose (recommended) OR
  • Python 3.11+, Node 20+, npm/yarn for local development

Environment Variables

Create a .env file or export variables for production. Important variables used in docker-compose.yml:

  • GROQ_API_KEY — LLM / external API key (required)
  • JWT_SECRET_KEY — JWT signing secret
  • ENCRYPTION_KEY — Key used to encrypt database credentials
  • DATABASE_URL — URL for the app database (default used in compose: SQLite)
  • REDIS_URL — Redis connection (e.g. redis://redis:6379/0)
  • ENVIRONMENT, DEBUG, HOST, PORT — runtime settings
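A sample .env might look like the following. All values are placeholders (generate your own secrets); adjust paths and URLs to your setup:

```bash
# Sample .env -- placeholder values only
GROQ_API_KEY=your-groq-api-key
JWT_SECRET_KEY=change-me-long-random-string
ENCRYPTION_KEY=change-me-random-key
DATABASE_URL=sqlite:///./data/text_to_sql.db
REDIS_URL=redis://redis:6379/0
ENVIRONMENT=production
DEBUG=false
HOST=0.0.0.0
PORT=8000
```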

Local Development (Backend)

  1. Create and activate a venv:

     ```bash
     python -m venv .venv
     # Windows
     .\.venv\Scripts\activate
     # macOS / Linux
     source .venv/bin/activate
     ```

  2. Install backend dependencies:

     ```bash
     pip install -r backend/requirements.txt
     ```

  3. Run the API with auto-reload (for development):

     ```bash
     cd backend
     uvicorn main:app --reload --port 8000
     ```

  4. API docs are available at http://localhost:8000/docs.

Local Development (Frontend)

  1. Install node deps and run the dev server:

     ```bash
     cd frontend
     npm ci
     npm run dev
     ```

  2. Open the dev URL shown by Vite (usually http://localhost:5173). Configure VITE_API_URL in a .env file as needed.

Run the Full Stack with Docker (Recommended)

From the repo root, build and start all services:

```bash
# Build and run backend, frontend, and redis
docker compose up --build
```

  • Backend: http://localhost:8000
  • Frontend: http://localhost/ (port 80)
  • Redis: localhost:6379

To run just backend + redis:

```bash
docker compose up --build backend redis
```

To stop and remove containers:

```bash
docker compose down
```

Notes about Docker Setup

  • docker-compose.yml exposes services and injects env vars for production. A docker-compose.dev.yml placeholder exists for developer overrides (live code mounts, hot reload); populate it as needed.
  • Backend image uses Gunicorn + Uvicorn workers in production mode (see Dockerfile.backend).
  • Frontend is built in a multi-stage image and served by Nginx (see Dockerfile.frontend and nginx.conf).

πŸ› Debugging & Troubleshooting

Logs

Use Docker logs or run services locally to get stack traces:

```bash
# Tail backend logs
docker compose logs -f backend

# Or run backend locally and watch stdout
uvicorn main:app --reload
```

Common Issues

  • Redis not connected: The app falls back to in-memory cache; check REDIS_URL and that the redis service is running.
  • Missing env vars: Backend will warn if critical keys are not set (JWT, ENCRYPTION_KEY, LLM key). Provide them in .env or the host environment.
  • Database connection issues: Use the databases endpoints in the UI to test a connection; server-side, check backend/services/database_service.py for host validation and connection errors.
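The "falls back to in-memory cache" behaviour described above follows a common pattern: try Redis, and degrade to a process-local dict if the server is unreachable. This is a sketch of the pattern, not the repository's actual services/ implementation:

```python
# Sketch of Redis-with-fallback caching. The real code lives in the
# services/ layer; names here are illustrative.
class InMemoryCache:
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def set(self, key, value):
        self._data[key] = value

def make_cache(redis_url: str):
    try:
        import redis  # optional dependency
        client = redis.Redis.from_url(redis_url)
        client.ping()  # raises if the server is unreachable
        return client
    except Exception:
        # Redis missing or unreachable: degrade to a process-local cache.
        # Note: this loses cross-process sharing and persistence.
        return InMemoryCache()

cache = make_cache("redis://localhost:1/0")  # unreachable on purpose
cache.set("schema:1", "{...}")
print(cache.get("schema:1"))
```

If queries work but schema caching seems stale or per-worker, this fallback is the first thing to check.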

Backend Debugging

  • Enable verbose SQL/engine logs by setting DEBUG=true in environment or by changing echo in db/session.py.
  • Recreate and inspect DB file if using SQLite (data/text_to_sql.db by default inside backend_data volume).
  • For step debugging in planning/execution, add temporary logging around:
    • core/planner.py — Query planning pipeline
    • core/llm.py — LLM prompt and response
    • core/sql_builder.py — SQL generation
    • core/executor.py — Query execution
    • core/agent.py — Error recovery flow

Understanding Query Plans

  • Check the query_plan field in API responses to see the structured plan
  • Review agent_flow in error responses to see repair attempts
  • Enable debug logging to see LLM prompts and responses

🧪 Testing

  • There are no unit tests included in this repository by default. For safe changes, add tests around:
    • core/sql_builder.py — Ensure query plans map to safe SQL
    • core/validator.py — Validate query plans against schemas
    • core/agent.py — Test error recovery scenarios
    • core/llm.py — Verify few-shot examples produce correct plans
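A starting point for such tests might look like this pytest-style sketch. In the real repo you would import from backend/core/sql_builder.py; here a tiny stub stands in so the example is self-contained:

```python
# Hypothetical test sketch -- build_sql is a local stub standing in for
# the real core/sql_builder, since this repo ships no test harness yet.

def build_sql(plan: dict) -> str:
    cols = ", ".join(plan["columns"]) or "*"
    return f'SELECT {cols} FROM {plan["table"]}'

def test_builder_emits_select_only():
    sql = build_sql({"table": "users", "columns": ["id", "email"]})
    assert sql.upper().startswith("SELECT")
    assert ";" not in sql  # single statement only

def test_builder_includes_requested_columns():
    sql = build_sql({"table": "users", "columns": ["id", "email"]})
    assert "id, email" in sql

test_builder_emits_select_only()
test_builder_includes_requested_columns()
print("ok")
```

Properties like "always a single SELECT" and "only columns from the plan appear" are good invariants to pin down before refactoring the builder.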

πŸ” Security Features

  1. Structured Query Plans — LLM outputs structured plans, not raw SQL
  2. Deterministic SQL Generation — Code builds SQL from validated plans
  3. Read-Only Queries — Only SELECT statements allowed
  4. Query Validation — Plans validated against actual schema
  5. Encrypted Credentials — Database passwords encrypted at rest
  6. Query Auditing — All queries logged with metadata
  7. Timeout Enforcement — Queries timeout after configured limit
  8. Row Limits — Results capped at MAX_QUERY_ROWS (1000)
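Two of the guards above (read-only enforcement and row capping) can be sketched in a few lines. The MAX_QUERY_ROWS value comes from this README; the function names are illustrative, not the repository's API:

```python
# Illustrative guards: single read-only statement, capped result rows.
MAX_QUERY_ROWS = 1000

def assert_read_only(sql: str) -> None:
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:
        raise ValueError("multiple statements are not allowed")
    first = stripped.split(None, 1)[0].upper()
    if first not in ("SELECT", "WITH"):  # WITH covers CTE-prefixed SELECTs
        raise ValueError(f"only read-only SELECT queries are allowed, got {first}")

def cap_rows(rows: list) -> list:
    return rows[:MAX_QUERY_ROWS]

assert_read_only("SELECT * FROM users")  # passes silently
try:
    assert_read_only("DROP TABLE users")
except ValueError as e:
    print(e)
# only read-only SELECT queries are allowed, got DROP
```

In this architecture such checks are a second line of defense: the primary guarantee is that SQL is generated deterministically from a validated plan in the first place.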

📖 How It Works

  1. You ask a question in plain English
  2. AI analyzes your question + database schema
  3. AI generates a structured query plan (NOT raw SQL)
  4. Plan is validated against your actual schema
  5. Safe SQL is generated from the validated plan
  6. Query is executed with timeouts and limits
  7. If execution fails, repair agent attempts intelligent fixes
  8. Results returned to you

Key insight: The LLM never writes SQL directly. It outputs structured intent, and deterministic code generates the SQL.

🚀 Quick Start

  1. POST /auth/register - Create account
  2. POST /auth/login - Get JWT token
  3. POST /databases - Connect your MySQL database
  4. GET /schema/{database_id} - View your schema
  5. POST /query/{database_id} - Ask questions!

Contributing

  • Fork the repo, create a feature branch, and open a PR.
  • Keep changes minimal and focused; add tests for any core logic changes.

License

Contact

  • For high-level questions about the architecture or help running the stack, open an issue or reach out to the maintainer.

About

An AI-powered application that converts natural language queries into SQL using an intelligent agentic workflow.
