Fast, intelligent SQL query optimization using adaptive machine learning
A full-stack approximate query engine that achieves 10-100x speedups on analytical queries while maintaining configurable error bounds with statistical confidence intervals.
Traditional databases execute queries exactly, which becomes slow on large datasets. This engine intelligently applies approximation techniques:
- Sampling: Run queries on a representative subset, scale results
- Sketches: Use probabilistic data structures for distinct counts
- Adaptive Learning: Learn from query history to improve strategy selection
| Feature | Description |
|---|---|
| Adaptive ML Optimizer | Multi-armed bandit strategy selection with real-time learning |
| Probabilistic Structures | HyperLogLog (distinct counts), Count-Min Sketch (frequencies) |
| Statistical Guarantees | Bootstrap confidence intervals, configurable error tolerance |
| Dual Execution Mode | Choose between exact and ML-optimized queries |
| Full-Stack App | React frontend + Go API backend |
| Docker Ready | One-command deployment with docker-compose |
┌──────────────────────────────────────────────────────────────────┐
│ React Frontend │
│ SQL Editor │ Run Exact │ Run ML Optimized │ Error Visualization │
└──────────────────────────────┬───────────────────────────────────┘
│ REST API
┌──────────────────────────────┴───────────────────────────────────┐
│ Go Backend │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │
│ │ ML Engine │ │ Planner │ │ Executor │ │
│ │ • Learning │ │ • Cost Model│ │ • Query Execution │ │
│ │ • Strategy │ │ • Plan Type │ │ • Result Scaling │ │
│ │ • Features │ │ │ │ • Bootstrap CI │ │
│ └─────────────┘ └─────────────┘ └─────────────────────────┘ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │
│ │ Sampler │ │ Sketches │ │ Estimator │ │
│ │ • Uniform │ │ • HyperLogLog│ │ • Confidence Intervals │ │
│ │ • Stratified│ │ • Count-Min │ │ • Error Bounds │ │
│ └─────────────┘ └─────────────┘ └─────────────────────────┘ │
└──────────────────────────────┬───────────────────────────────────┘
│
┌──────────────────────────────┴───────────────────────────────────┐
│ SQLite Database (WAL Mode) │
│ Application Tables │ Sample Tables │ ML Learning History │
└──────────────────────────────────────────────────────────────────┘
docker-compose up --build- Frontend: http://localhost:5173
- Backend API: http://localhost:8080
Backend:
cd cmd/aqe-server
go build -o aqe-server
./aqe-serverFrontend:
cd frontend
npm install
npm run dev- Open http://localhost:5173
- Enter SQL query (e.g.,
SELECT COUNT(*) FROM purchases) - Click Run ML Optimized for fast approximate results
- Click Run Exact for precise results
ML-Optimized Query:
curl -X POST http://localhost:8080/query \
-H "Content-Type: application/json" \
-d '{
"sql": "SELECT COUNT(*) FROM purchases",
"use_ml_optimization": true,
"max_rel_error": 0.05
}'Check Learning Stats:
curl http://localhost:8080/ml/stats├── cmd/
│ ├── aqe-server/ # Main server entry point
│ └── seed/ # Demo data generator
├── pkg/
│ ├── api/ # REST API handlers
│ ├── ml/ # ML optimization engine
│ │ ├── optimizer.go # Base optimizer with strategy selection
│ │ ├── learning.go # Adaptive learning system
│ │ ├── join_optimizer.go# JOIN query optimization
│ │ └── error_bounds.go # Statistical error estimation
│ ├── planner/ # Query planning and cost model
│ ├── executor/ # Query execution and result scaling
│ ├── sampler/ # Uniform and stratified sampling
│ ├── sketches/ # HyperLogLog, Count-Min Sketch
│ ├── estimator/ # Bootstrap confidence intervals
│ └── storage/ # Database metadata
├── frontend/ # React + TypeScript UI
├── scripts/ # Test and utility scripts
└── documentation/ # Detailed technical docs
Analyze query structure: table size, aggregations, GROUP BY, WHERE complexity
Multi-armed bandit chooses optimal strategy based on:
- Historical performance data
- Query features
- Error tolerance
Modify SQL based on strategy:
- Sample:
SELECT * FROM table ORDER BY RANDOM() LIMIT k - Sketch: Use pre-built HyperLogLog for distinct counts
Scale sample results: count × (1/sample_fraction)
Bootstrap resampling provides statistical bounds
Record actual error vs predicted → improve future predictions
| Query Type | Dataset | Speedup | Error | Strategy |
|---|---|---|---|---|
| COUNT(*) | 200K rows | 100x | ~2% | Sample |
| SUM(amount) | 200K rows | 100x | ~3% | Sample |
| COUNT(DISTINCT) | 200K rows | 50x | ~3% | Sketch |
| GROUP BY | 200K rows | 80x | ~5% | Stratified |
Backend: Go, SQLite, Gorilla Mux
Frontend: React, TypeScript, Vite, Recharts
DevOps: Docker, docker-compose
- Flow Diagram - Visual architecture overview
- Architecture Documentation
- Implementation Guide
- ML Optimization Details
Built for E6Data Hackathon