Skip to content

SahithiKokkula/Hackathon-E6Data

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ML-Powered Approximate Query Engine

Fast, intelligent SQL query optimization using adaptive machine learning

A full-stack approximate query engine that achieves 10-100x speedups on analytical queries while maintaining configurable error bounds with statistical confidence intervals.

Overview

Traditional databases execute queries exactly, which becomes slow on large datasets. This engine intelligently applies approximation techniques:

  • Sampling: Run queries on a representative subset, scale results
  • Sketches: Use probabilistic data structures for distinct counts
  • Adaptive Learning: Learn from query history to improve strategy selection

Key Features

Feature Description
Adaptive ML Optimizer Multi-armed bandit strategy selection with real-time learning
Probabilistic Structures HyperLogLog (distinct counts), Count-Min Sketch (frequencies)
Statistical Guarantees Bootstrap confidence intervals, configurable error tolerance
Dual Execution Mode Choose between exact and ML-optimized queries
Full-Stack App React frontend + Go API backend
Docker Ready One-command deployment with docker-compose

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                     React Frontend                                │
│  SQL Editor │ Run Exact │ Run ML Optimized │ Error Visualization │
└──────────────────────────────┬───────────────────────────────────┘
                               │ REST API
┌──────────────────────────────┴───────────────────────────────────┐
│                       Go Backend                                  │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐   │
│  │ ML Engine   │  │ Planner     │  │ Executor                │   │
│  │ • Learning  │  │ • Cost Model│  │ • Query Execution       │   │
│  │ • Strategy  │  │ • Plan Type │  │ • Result Scaling        │   │
│  │ • Features  │  │             │  │ • Bootstrap CI          │   │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘   │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐   │
│  │ Sampler     │  │ Sketches    │  │ Estimator               │   │
│  │ • Uniform   │  │ • HyperLogLog│ │ • Confidence Intervals │   │
│  │ • Stratified│  │ • Count-Min │  │ • Error Bounds          │   │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘   │
└──────────────────────────────┬───────────────────────────────────┘
                               │
┌──────────────────────────────┴───────────────────────────────────┐
│                     SQLite Database (WAL Mode)                    │
│  Application Tables │ Sample Tables │ ML Learning History        │
└──────────────────────────────────────────────────────────────────┘

Quick Start

Option 1: Docker (Recommended)

docker-compose up --build

Option 2: Manual Setup

Backend:

cd cmd/aqe-server
go build -o aqe-server
./aqe-server

Frontend:

cd frontend
npm install
npm run dev

Usage

Web Interface

  1. Open http://localhost:5173
  2. Enter SQL query (e.g., SELECT COUNT(*) FROM purchases)
  3. Click Run ML Optimized for fast approximate results
  4. Click Run Exact for precise results

API

ML-Optimized Query:

curl -X POST http://localhost:8080/query \
  -H "Content-Type: application/json" \
  -d '{
    "sql": "SELECT COUNT(*) FROM purchases",
    "use_ml_optimization": true,
    "max_rel_error": 0.05
  }'

Check Learning Stats:

curl http://localhost:8080/ml/stats

Project Structure

├── cmd/
│   ├── aqe-server/          # Main server entry point
│   └── seed/                # Demo data generator
├── pkg/
│   ├── api/                 # REST API handlers
│   ├── ml/                  # ML optimization engine
│   │   ├── optimizer.go     # Base optimizer with strategy selection
│   │   ├── learning.go      # Adaptive learning system
│   │   ├── join_optimizer.go# JOIN query optimization
│   │   └── error_bounds.go  # Statistical error estimation
│   ├── planner/             # Query planning and cost model
│   ├── executor/            # Query execution and result scaling
│   ├── sampler/             # Uniform and stratified sampling
│   ├── sketches/            # HyperLogLog, Count-Min Sketch
│   ├── estimator/           # Bootstrap confidence intervals
│   └── storage/             # Database metadata
├── frontend/                # React + TypeScript UI
├── scripts/                 # Test and utility scripts
└── documentation/           # Detailed technical docs

How It Works

1. Feature Extraction

Analyze query structure: table size, aggregations, GROUP BY, WHERE complexity

2. Strategy Selection

Multi-armed bandit chooses optimal strategy based on:

  • Historical performance data
  • Query features
  • Error tolerance

3. Query Transformation

Modify SQL based on strategy:

  • Sample: SELECT * FROM table ORDER BY RANDOM() LIMIT k
  • Sketch: Use pre-built HyperLogLog for distinct counts

4. Result Scaling

Scale sample results: count × (1/sample_fraction)

5. Confidence Intervals

Bootstrap resampling provides statistical bounds

6. Learning Feedback

Record actual error vs predicted → improve future predictions

Performance

Query Type Dataset Speedup Error Strategy
COUNT(*) 200K rows 100x ~2% Sample
SUM(amount) 200K rows 100x ~3% Sample
COUNT(DISTINCT) 200K rows 50x ~3% Sketch
GROUP BY 200K rows 80x ~5% Stratified

Technologies

Backend: Go, SQLite, Gorilla Mux
Frontend: React, TypeScript, Vite, Recharts
DevOps: Docker, docker-compose

Documentation

Authors


Built for E6Data Hackathon

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •