ML-Powered Approximate Query Engine

Fast, intelligent SQL query optimization using adaptive machine learning

A full-stack approximate query engine that achieves 10-100x speedups on analytical queries (see Performance) while keeping results within a configurable error bound and reporting statistical confidence intervals.

Overview

Traditional databases execute every query exactly, which becomes slow on large datasets. This engine instead picks an approximation technique per query:

Sampling: Run the query on a representative subset and scale the results
Sketches: Use probabilistic data structures for distinct counts and frequencies
Adaptive Learning: Learn from query history to improve strategy selection

Key Features

Feature	Description
Adaptive ML Optimizer	Multi-armed bandit strategy selection with real-time learning
Probabilistic Structures	HyperLogLog (distinct counts), Count-Min Sketch (frequencies)
Statistical Guarantees	Bootstrap confidence intervals, configurable error tolerance
Dual Execution Mode	Choose between exact and ML-optimized queries
Full-Stack App	React frontend + Go API backend
Docker Ready	One-command deployment with docker-compose

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                     React Frontend                                │
│  SQL Editor │ Run Exact │ Run ML Optimized │ Error Visualization │
└──────────────────────────────┬───────────────────────────────────┘
                               │ REST API
┌──────────────────────────────┴───────────────────────────────────┐
│                       Go Backend                                  │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐   │
│  │ ML Engine   │  │ Planner     │  │ Executor                │   │
│  │ • Learning  │  │ • Cost Model│  │ • Query Execution       │   │
│  │ • Strategy  │  │ • Plan Type │  │ • Result Scaling        │   │
│  │ • Features  │  │             │  │ • Bootstrap CI          │   │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘   │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐   │
│  │ Sampler     │  │ Sketches    │  │ Estimator               │   │
│  │ • Uniform   │  │ •HyperLogLog│  │ • Confidence Intervals  │   │
│  │ • Stratified│  │ • Count-Min │  │ • Error Bounds          │   │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘   │
└──────────────────────────────┬───────────────────────────────────┘
                               │
┌──────────────────────────────┴───────────────────────────────────┐
│                     SQLite Database (WAL Mode)                    │
│  Application Tables │ Sample Tables │ ML Learning History        │
└──────────────────────────────────────────────────────────────────┘

Quick Start

Option 1: Docker (Recommended)

docker-compose up --build

Frontend: http://localhost:5173
Backend API: http://localhost:8080

Option 2: Manual Setup

Backend:

cd cmd/aqe-server
go build -o aqe-server
./aqe-server

Frontend:

cd frontend
npm install
npm run dev

Usage

Web Interface

Open http://localhost:5173
Enter a SQL query (e.g., SELECT COUNT(*) FROM purchases)
Click Run ML Optimized for a fast approximate result
Click Run Exact for the precise result

API

ML-Optimized Query:

curl -X POST http://localhost:8080/query \
  -H "Content-Type: application/json" \
  -d '{
    "sql": "SELECT COUNT(*) FROM purchases",
    "use_ml_optimization": true,
    "max_rel_error": 0.05
  }'

Check Learning Stats:

curl http://localhost:8080/ml/stats

Project Structure

├── cmd/
│   ├── aqe-server/          # Main server entry point
│   └── seed/                # Demo data generator
├── pkg/
│   ├── api/                 # REST API handlers
│   ├── ml/                  # ML optimization engine
│   │   ├── optimizer.go     # Base optimizer with strategy selection
│   │   ├── learning.go      # Adaptive learning system
│   │   ├── join_optimizer.go # JOIN query optimization
│   │   └── error_bounds.go  # Statistical error estimation
│   ├── planner/             # Query planning and cost model
│   ├── executor/            # Query execution and result scaling
│   ├── sampler/             # Uniform and stratified sampling
│   ├── sketches/            # HyperLogLog, Count-Min Sketch
│   ├── estimator/           # Bootstrap confidence intervals
│   └── storage/             # Database metadata
├── frontend/                # React + TypeScript UI
├── scripts/                 # Test and utility scripts
└── documentation/           # Detailed technical docs

How It Works

1. Feature Extraction

Analyze the query's structure: table size, aggregate functions, GROUP BY clauses, and WHERE-clause complexity
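
As a rough illustration of this step, the sketch below pulls a few such features out of a SQL string with regular expressions. It is an assumption-laden stand-in, not the code in pkg/ml: the QueryFeatures type, its field names, and ExtractFeatures are hypothetical, and a real implementation would more likely work from a parsed query than from pattern matching.

```go
package main

import (
	"fmt"
	"regexp"
)

// QueryFeatures holds the signals listed above. All names are illustrative.
type QueryFeatures struct {
	TableRows       int64 // row count of the target table, supplied by the caller
	NumAggregates   int   // number of COUNT/SUM/AVG/MIN/MAX calls
	HasDistinct     bool
	HasGroupBy      bool
	WherePredicates int // 0 without WHERE, else 1 + number of AND/OR keywords
}

var (
	aggRe      = regexp.MustCompile(`(?i)\b(COUNT|SUM|AVG|MIN|MAX)\s*\(`)
	distinctRe = regexp.MustCompile(`(?i)\bDISTINCT\b`)
	groupByRe  = regexp.MustCompile(`(?i)\bGROUP\s+BY\b`)
	// Captures the text between WHERE and the next clause (or end of query).
	whereRe  = regexp.MustCompile(`(?is)\bWHERE\b(.*?)(?:\bGROUP\s+BY\b|\bORDER\s+BY\b|\bLIMIT\b|$)`)
	boolOpRe = regexp.MustCompile(`(?i)\b(AND|OR)\b`)
)

// ExtractFeatures is a heuristic: it ignores subqueries, string literals,
// and BETWEEN ... AND, all of which can skew the counts.
func ExtractFeatures(sql string, tableRows int64) QueryFeatures {
	f := QueryFeatures{
		TableRows:     tableRows,
		NumAggregates: len(aggRe.FindAllString(sql, -1)),
		HasDistinct:   distinctRe.MatchString(sql),
		HasGroupBy:    groupByRe.MatchString(sql),
	}
	if m := whereRe.FindStringSubmatch(sql); m != nil {
		f.WherePredicates = 1 + len(boolOpRe.FindAllString(m[1], -1))
	}
	return f
}

func main() {
	sql := "SELECT region, SUM(amount) FROM purchases WHERE amount > 10 GROUP BY region"
	fmt.Printf("%+v\n", ExtractFeatures(sql, 200000))
}
```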

2. Strategy Selection

A multi-armed bandit chooses the strategy expected to perform best, based on:

Historical performance data
Query features
Error tolerance
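
This README does not say which bandit algorithm the engine uses. Purely to make the idea concrete, here is an epsilon-greedy sketch: with probability Epsilon it explores a random strategy, otherwise it exploits the one with the best average reward so far. The Bandit type, the strategy names, and the notion of a numeric reward are assumptions, and unlike the engine described above this sketch ignores query features and error tolerance.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Bandit is a context-free epsilon-greedy selector (hypothetical type).
type Bandit struct {
	Epsilon float64 // probability of exploring a random arm
	arms    []string
	pulls   map[string]int
	mean    map[string]float64 // running mean reward per arm
	rng     *rand.Rand
}

func NewBandit(arms []string, epsilon float64, seed int64) *Bandit {
	return &Bandit{
		Epsilon: epsilon,
		arms:    arms,
		pulls:   make(map[string]int),
		mean:    make(map[string]float64),
		rng:     rand.New(rand.NewSource(seed)),
	}
}

// Select returns a random arm with probability Epsilon,
// otherwise the arm with the highest mean reward (ties go to the earliest arm).
func (b *Bandit) Select() string {
	if b.rng.Float64() < b.Epsilon {
		return b.arms[b.rng.Intn(len(b.arms))]
	}
	best := b.arms[0]
	for _, a := range b.arms[1:] {
		if b.mean[a] > b.mean[best] {
			best = a
		}
	}
	return best
}

// Update folds one observed reward into the arm's running mean.
func (b *Bandit) Update(arm string, reward float64) {
	b.pulls[arm]++
	b.mean[arm] += (reward - b.mean[arm]) / float64(b.pulls[arm])
}

func main() {
	b := NewBandit([]string{"exact", "sample", "sketch"}, 0.1, 1)
	// Hypothetical rewards, e.g. higher when a query was fast and within tolerance.
	b.Update("exact", 0.1)
	b.Update("sample", 0.8)
	b.Update("sketch", 0.6)
	fmt.Println("chosen strategy:", b.Select())
}
```

How the reward is defined drives everything; one plausible choice is the observed speedup when the error stayed within tolerance and zero otherwise.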

3. Query Transformation

Modify SQL based on strategy:

Sample: SELECT * FROM table ORDER BY RANDOM() LIMIT k
Sketch: Use pre-built HyperLogLog for distinct counts

4. Result Scaling

Scale sample results back to the full table: estimated count = sample count × (1 / sample_fraction)
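
A minimal sketch of that scaling rule, with the function name ScaleAggregate being hypothetical. It applies to additive aggregates such as COUNT and SUM; an AVG computed on a uniform sample is already an estimate of the full-table average and should not be scaled.

```go
package main

import (
	"errors"
	"fmt"
)

// ScaleAggregate extrapolates a COUNT or SUM measured on a sample
// to the full table by dividing by the sampled fraction.
func ScaleAggregate(sampleValue, sampleFraction float64) (float64, error) {
	if sampleFraction <= 0 || sampleFraction > 1 {
		return 0, errors.New("sample fraction must be in (0, 1]")
	}
	return sampleValue / sampleFraction, nil
}

func main() {
	// 2,000 matching rows found in a 1% sample.
	est, err := ScaleAggregate(2000, 0.01)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("estimated full-table count: %.0f\n", est)
}
```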

5. Confidence Intervals

Bootstrap resampling provides a confidence interval around each estimate

6. Learning Feedback

Record the actual error against the predicted error and use the difference to improve future predictions

Performance

Query Type	Dataset	Speedup	Error	Strategy
COUNT(*)	200K rows	100x	~2%	Sample
SUM(amount)	200K rows	100x	~3%	Sample
COUNT(DISTINCT)	200K rows	50x	~3%	Sketch
GROUP BY	200K rows	80x	~5%	Stratified

Technologies

Backend: Go, SQLite, Gorilla Mux
Frontend: React, TypeScript, Vite, Recharts
DevOps: Docker, docker-compose

Documentation

Flow Diagram - Visual architecture overview
Architecture Documentation
Implementation Guide
ML Optimization Details

Authors

Built for E6Data Hackathon

Repository: SahithiKokkula/Hackathon-E6Data