LLM-driven SQL optimization and analysis tool for detecting performance issues, suggesting optimizations, and explaining query behavior.
SQL Auditor provides an intuitive web interface for analyzing SQL queries, viewing detected issues, and receiving optimization recommendations.
- Static Analysis: Detects 10 common SQL anti-patterns (SELECT *, unused joins, cartesian products, non-SARGable predicates, etc.)
- LLM Integration: Uses OpenAI GPT-4 to generate natural language explanations and optimized query rewrites
- Index Advisor: Recommends indexes based on query patterns (WHERE, JOIN, ORDER BY, GROUP BY)
- Cost Estimation: Heuristic-based cost scoring and improvement estimates
- Web UI: Clean React + TypeScript interface for interactive analysis
- Jupyter Notebooks: Prototyping and experimentation environment
- Docker Support: One-command deployment with docker-compose
```
┌─────────────┐
│  Frontend   │  React + TypeScript + Tailwind
│ (Port 5173) │
└──────┬──────┘
       │ HTTP
       ▼
┌─────────────┐
│   FastAPI   │  REST API
│ (Port 8000) │
└──────┬──────┘
       │
       ├──► Parser (SQLGlot)
       ├──► Rules Engine (10 rules)
       ├──► Cost Estimator
       ├──► Index Advisor
       └──► LLM Provider (OpenAI)
```
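The flow above could be wired together roughly as follows. Every function here is a toy stand-in for illustration only; the real orchestration lives in `backend/services/pipeline.py` and parses with SQLGlot:

```python
# Toy sketch of the pipeline wiring: parser -> rules engine -> index
# advisor. All names and logic are illustrative stand-ins, not the
# project's actual implementation.

def parse_sql(query: str) -> dict:
    # Stand-in for the SQLGlot parser: just normalize whitespace.
    return {"sql": " ".join(query.split())}

def run_rules(ast: dict) -> list[dict]:
    # Stand-in for the 10-rule engine; flags one toy issue (R001).
    if "SELECT *" in ast["sql"].upper():
        return [{"code": "R001", "severity": "warn"}]
    return []

def advise_indexes(ast: dict) -> list[dict]:
    # Stand-in for the index advisor (no-op here).
    return []

def audit(queries: list[str]) -> dict:
    issues, indexes = [], []
    for q in queries:
        ast = parse_sql(q)
        issues += run_rules(ast)
        indexes += advise_indexes(ast)
    return {"issues": issues, "indexes": indexes}
```

The real pipeline additionally produces rewrites, cost estimates, and optional LLM explanations, as described in the feature list above.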
- Python 3.11+
- Node.js 18+ (for frontend)
- Poetry (recommended) or pip
- SQLite3 (for demo database)
- OpenAI API key (optional, for LLM features)
1. Clone the repository

   ```bash
   git clone <repo-url>
   cd llm-sql-auditor
   ```

2. Set up environment

   ```bash
   cp .env.example .env
   # Edit .env and add your OPENAI_API_KEY if desired
   ```

3. Install Python dependencies

   ```bash
   poetry install
   # OR
   pip install -r requirements.txt
   ```

4. Seed demo database

   ```bash
   ./scripts/seed_demo.sh
   ```

5. Start development servers

   ```bash
   ./scripts/dev_up.sh
   ```

   Or manually:

   ```bash
   # Terminal 1: Backend
   poetry run uvicorn backend.app:app --reload --port 8000

   # Terminal 2: Frontend
   cd frontend && npm install && npm run dev
   ```

6. Access the application

   - Frontend: Available on the configured port (default: 5173)
   - API: Available on the configured port (default: 8000)
   - API Docs: Available at the `/docs` endpoint

Alternatively, deploy everything with docker-compose:

```bash
docker-compose up --build
```

- Open the frontend application in your browser
- Paste your schema DDL in the "Schema" textarea
- Enter SQL queries (separate multiple queries with `---`)
- Select SQL dialect (PostgreSQL or SQLite)
- Click "Analyze Queries"
- Review issues, rewrites, index suggestions, and LLM explanations
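For programmatic use, the same audit request can be built with only the Python standard library. The base URL below assumes the default backend port 8000, and `build_audit_request` is an illustrative helper, not part of the project:

```python
import json
import urllib.request

# Build a POST request for the audit endpoint. Hypothetical helper;
# the default base URL assumes the backend's default port (8000).
def build_audit_request(schema: str, queries: list[str],
                        dialect: str = "postgres",
                        base_url: str = "http://localhost:8000") -> urllib.request.Request:
    payload = json.dumps({
        "schema": schema,
        "queries": queries,
        "dialect": dialect,
    }).encode()
    return urllib.request.Request(
        f"{base_url}/api/audit",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To actually send it (requires the backend to be running):
# with urllib.request.urlopen(build_audit_request(ddl, ["SELECT * FROM users;"])) as resp:
#     report = json.load(resp)
```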
Audit queries:

```bash
curl -X POST <API_URL>/api/audit \
  -H "Content-Type: application/json" \
  -d '{
    "schema": "CREATE TABLE users (id INTEGER, email TEXT);",
    "queries": [
      "SELECT * FROM users;",
      "SELECT * FROM users WHERE LOWER(email) = '\''test@example.com'\'';"
    ],
    "dialect": "postgres"
  }'
```

Explain a single query:

```bash
curl -X POST <API_URL>/api/explain \
  -H "Content-Type: application/json" \
  -d '{
    "schema": "CREATE TABLE users (id INTEGER, email TEXT);",
    "query": "SELECT * FROM users WHERE LOWER(email) = '\''test@example.com'\'';",
    "dialect": "postgres"
  }'
```

To prototype in Jupyter:

```bash
cd notebooks
jupyter notebook 01_prototype.ipynb
```

The rules engine detects the following issues:
| Code | Rule | Severity | Description | Example |
|---|---|---|---|---|
| R001 | SELECT_STAR | warn | Avoid SELECT * in production | SELECT * FROM users; |
| R002 | UNUSED_JOIN | warn | Joins where columns aren't referenced | SELECT u.id FROM users u JOIN orders o ON ... |
| R003 | CARTESIAN_JOIN | error | Joins without ON predicate | SELECT * FROM users, orders; |
| R004 | NON_SARGABLE | warn | Functions on indexed columns in WHERE | WHERE LOWER(email) = 'test' |
| R005 | MISSING_PREDICATE | warn | Large table scans without WHERE | SELECT * FROM orders; (100k+ rows) |
| R006 | ORDER_BY_NO_INDEX | info | ORDER BY columns lacking index | ORDER BY created_at (no index) |
| R007 | DISTINCT_MISUSE | info | DISTINCT as de-dupe band-aid | SELECT DISTINCT ... with joins |
| R008 | N_PLUS_ONE_PATTERN | warn | Correlated subqueries | WHERE EXISTS (SELECT ... WHERE t.id = outer.id) |
| R009 | LIKE_PREFIX_WILDCARD | warn | LIKE with leading wildcard | WHERE name LIKE '%value' |
| R010 | AGG_NO_GROUPING_INDEX | info | Aggregations missing covering index | GROUP BY category (no index) |
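Two of these rules can be illustrated with a deliberately simplified, regex-based sketch. The real engine analyzes SQLGlot ASTs; `detect_issues` and its patterns below are hypothetical:

```python
import re

# Toy detectors for R004 (non-SARGable predicate) and R009 (LIKE with
# a leading wildcard). Regex heuristics for illustration only; the
# project's rules engine works on parsed SQLGlot ASTs instead.
FUNCS = r"(LOWER|UPPER|DATE|SUBSTR|CAST)"

def detect_issues(query: str) -> list[str]:
    codes = []
    q = query.upper()
    # Look only at the WHERE clause, where both rules apply.
    where = q.split("WHERE", 1)[1] if "WHERE" in q else ""
    # R004: a function wrapped around a column defeats index use.
    if re.search(FUNCS + r"\s*\(", where):
        codes.append("R004")
    # R009: a leading wildcard forces a full scan.
    if re.search(r"LIKE\s+'%", where):
        codes.append("R009")
    return codes
```

For example, `detect_issues("SELECT id FROM users WHERE LOWER(email) = 'a'")` flags `R004`, while a query without a WHERE clause returns an empty list.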
Original Query:

```sql
SELECT * FROM orders o
JOIN users u ON u.id = o.user_id
WHERE LOWER(u.email) = 'admin@example.com'
ORDER BY o.created_at DESC;
```

Detected Issues:

- R001: SELECT * usage
- R004: Non-SARGable predicate (LOWER function)
- R006: ORDER BY without supporting index

Optimized Query:

```sql
SELECT o.id, o.user_id, o.created_at, o.total_cents,
       u.id, u.email
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE u.email = 'admin@example.com' -- Removed LOWER()
ORDER BY o.created_at DESC;
```

Recommended Indexes:

```sql
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);
CREATE INDEX idx_users_email ON users(email);
```

Example API response:

```json
{
  "summary": {
    "totalIssues": 5,
    "highSeverity": 1,
    "estImprovement": "2-4x potential speedup - Issues: Full scan on large table 'orders' (100000 rows), Non-SARGable function in WHERE clause"
  },
  "issues": [
    {
      "code": "R001",
      "severity": "warn",
      "message": "Avoid SELECT * in production queries...",
      "rule": "SELECT_STAR"
    }
  ],
  "rewrites": [
    {
      "original": "SELECT * FROM orders...",
      "optimized": "SELECT o.id, o.user_id FROM orders o...",
      "rationale": "Project only needed columns..."
    }
  ],
  "indexes": [
    {
      "table": "orders",
      "columns": ["user_id", "created_at"],
      "type": "btree",
      "rationale": "Supports ORDER BY and join"
    }
  ],
  "llmExplain": "Your query scans entire orders due to SELECT *..."
}
```

Run the test suite:

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=backend --cov-report=html

# Run specific test file
pytest backend/tests/test_rules.py
```

Project structure:

sql-auditor/
├── backend/
│ ├── app.py # FastAPI entry point
│ ├── core/
│ │ ├── config.py # Configuration
│ │ ├── models.py # Pydantic models
│ │ └── dialects.py # SQL dialect handling
│ ├── services/
│ │ ├── analyzer/
│ │ │ ├── parser.py # SQL parsing
│ │ │ ├── rules_engine.py # 10 detection rules
│ │ │ ├── cost_estimator.py
│ │ │ └── index_advisor.py
│ │ ├── llm/
│ │ │ ├── provider.py # OpenAI integration
│ │ │ └── prompts.py # LLM prompts
│ │ └── pipeline.py # Main orchestration
│ ├── db/
│ │ ├── seed.sql # Demo schema
│ │ └── explain_helpers.py
│ └── tests/ # Test suite
├── frontend/
│ ├── src/
│ │ ├── App.tsx
│ │ ├── api.ts
│ │ └── components/
│ └── package.json
├── notebooks/
│ └── 01_prototype.ipynb
├── scripts/
│ ├── dev_up.sh
│ └── seed_demo.sh
├── docker-compose.yml
├── Dockerfile
└── README.md
Environment variables (see `.env.example`):

- `OPENAI_API_KEY`: OpenAI API key for LLM features (optional)
- `SQLAUDITOR_LOG_LEVEL`: Logging level (default: INFO)
- `SQLAUDITOR_DEMO_DB`: Path to demo SQLite database
- `DEFAULT_DIALECT`: Default SQL dialect (postgres/sqlite)
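A minimal sketch of how these variables might be read, assuming plain `os.environ` lookups with the defaults listed above. `load_config` is illustrative, not the actual contents of `config.py`:

```python
import os

# Illustrative settings loader mirroring the env-var list above.
# Defaults for log level and dialect follow the documented values.
def load_config() -> dict:
    return {
        "openai_api_key": os.environ.get("OPENAI_API_KEY"),  # optional
        "log_level": os.environ.get("SQLAUDITOR_LOG_LEVEL", "INFO"),
        "demo_db": os.environ.get("SQLAUDITOR_DEMO_DB"),
        "dialect": os.environ.get("DEFAULT_DIALECT", "postgres"),
    }
```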
- Heuristic-based: Cost estimation and index recommendations are heuristic, not based on actual EXPLAIN plans
- Static analysis only: Does not execute queries or connect to live databases by default (an optional PostgreSQL connection is supported)
- Limited dialect support: Currently supports PostgreSQL and SQLite
- LLM dependency: Advanced explanations require OpenAI API key
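To illustrate the first limitation, a heuristic cost score might multiply a base row count by per-issue penalty factors. The weights and function names below are invented for this sketch and are not the project's actual formula:

```python
# Toy heuristic cost model: start from the table's row count and
# apply a multiplicative penalty per detected issue. All weights are
# illustrative assumptions.
PENALTY = {"R001": 1.2, "R003": 10.0, "R004": 2.0, "R005": 3.0, "R009": 2.0}

def estimate_cost(row_count: int, issue_codes: list[str]) -> float:
    cost = float(row_count)
    for code in issue_codes:
        cost *= PENALTY.get(code, 1.0)
    return cost

def estimated_speedup(before: float, after: float) -> str:
    # Express the before/after cost ratio as a rough speedup estimate.
    ratio = before / max(after, 1.0)
    return f"{ratio:.1f}x potential speedup"
```

A real estimator would also account for join order, selectivity, and available indexes, which is exactly why EXPLAIN integration appears in the roadmap below.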
Planned enhancements:

- Real EXPLAIN integration with live database connections
- Multi-dialect cost models (MySQL, SQL Server, etc.)
- Query library for storing and re-running analyses
- Export reports as JSON/Markdown from UI
- Batch processing for large query sets
- CI/CD integration hooks
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass and coverage ≥ 80%
- Submit a pull request
MIT License - see LICENSE file for details.
- Built with FastAPI
- SQL parsing via SQLGlot
- LLM integration via OpenAI API
