BurstDB

BurstDB is a generative data synthesis platform for organizations that need to keep production data inside their own infrastructure. It lets engineering teams create high-fidelity, schema-aware synthetic databases for development, testing, and analytics without exposing production data.

🏗️ Technical Architecture

BurstDB uses a distributed generation pipeline so that synthesis can scale across workers while preserving the constraints of the source schema:

1. The Synthesis Protocol

BurstDB operates on a Constraint-Aware Synthesis model. Unlike simple random data generators, the platform uses a ConstraintGraph engine that parses the source schema and preserves referential integrity, unique constraints, and joint distributions across heterogeneous data sources.
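To make the idea of constraint-aware generation concrete, here is a minimal sketch of one way to preserve referential integrity: order the tables so that parents are generated before children, then draw each foreign key from rows that already exist. This is not BurstDB's ConstraintGraph implementation; the `FOREIGN_KEYS` schema, the function names, and the column naming scheme are all assumptions made for illustration.

```python
from graphlib import TopologicalSorter
import random

# Hypothetical schema: each table maps to the parent tables it references.
FOREIGN_KEYS = {
    "customers": [],
    "orders": ["customers"],
    "order_items": ["orders"],
}


def generation_order(foreign_keys):
    """Return the tables ordered so that every parent precedes its children."""
    return list(TopologicalSorter(foreign_keys).static_order())


def synthesize(foreign_keys, rows_per_table, seed=0):
    """Generate rows whose foreign keys always point at existing parent rows."""
    rng = random.Random(seed)
    tables = {}
    for table in generation_order(foreign_keys):
        rows = []
        for pk in range(1, rows_per_table + 1):
            row = {"id": pk}  # sequential ids keep the primary key unique
            for parent in foreign_keys[table]:
                # Pick an id that is known to exist in the parent table.
                row[f"{parent}_id"] = rng.choice(tables[parent])["id"]
            rows.append(row)
        tables[table] = rows
    return tables


if __name__ == "__main__":
    print(generation_order(FOREIGN_KEYS))
    print(synthesize(FOREIGN_KEYS, rows_per_table=3)["orders"])
```

A topological sort raises `graphlib.CycleError` on circular foreign keys, so a real engine would need a separate strategy for self-referencing or mutually dependent tables.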

2. Generative ML Layer

HMA (Hierarchical Multi-table Analysis): Specifically designed for deeply nested relational schemas. It captures recursive foreign key dependencies and preserves cross-table statistical correlations.
CTGAN (Conditional GAN): Utilized for high-entropy tabular data. It leverages conditional generators to handle imbalanced categorical columns and continuous data with non-Gaussian distributions.
Apache Arrow Integration: All data interchange between the ML workers and the output buffers uses Apache Arrow formats (Feather/Parquet), which keeps serialization overhead low when moving columnar data.
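The CTGAN entry above mentions conditional generators for imbalanced categorical columns. One ingredient described in the CTGAN paper is training-by-sampling: the category used as the condition is drawn with probability proportional to the logarithm of its frequency, so rare categories are seen far more often than their raw share. The sketch below illustrates only that sampling step using the standard library; the function names and the example column are hypothetical, and it does not show how BurstDB configures its models.

```python
import math
import random
from collections import Counter


def log_frequency_weights(values):
    """Map each category to a probability proportional to log(1 + count)."""
    counts = Counter(values)
    raw = {cat: math.log(1 + n) for cat, n in counts.items()}
    total = sum(raw.values())
    return {cat: weight / total for cat, weight in raw.items()}


def sample_condition(values, rng):
    """Pick the category that a conditional generator would be asked to produce."""
    weights = log_frequency_weights(values)
    categories = list(weights)
    return rng.choices(categories, weights=[weights[c] for c in categories], k=1)[0]


if __name__ == "__main__":
    # Hypothetical imbalanced column: 1% of rows belong to the rare class.
    column = ["ok"] * 990 + ["fraud"] * 10
    print(log_frequency_weights(column))
    print(sample_condition(column, random.Random(0)))
```

With this example column the rare class makes up 1% of the rows but receives roughly a quarter of the sampling weight, which is the rebalancing effect the log-frequency rule is meant to provide.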

3. Distributed Orchestration

Queueing Engine: Redis-backed Celery clusters allow for horizontal scaling of synthesis nodes.
State Machine: Powered by FastAPI and SQLAlchemy, managing complex job lifecycles from initial schema ingestion to final ZIP packaging.
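The job lifecycle described above can be pictured as a small state machine with an explicit transition table. The following is a framework-free sketch, not the project's FastAPI or SQLAlchemy code: the state names, the `ALLOWED` table, and the `Job` class are assumptions chosen to mirror the stages named in this section (schema ingestion through ZIP packaging).

```python
from enum import Enum


class JobState(Enum):
    INGESTING = "ingesting"        # parsing the uploaded schema
    QUEUED = "queued"              # waiting for a synthesis worker
    SYNTHESIZING = "synthesizing"  # worker is generating rows
    PACKAGING = "packaging"        # building the ZIP archive
    DONE = "done"
    FAILED = "failed"


# Hypothetical transition table: each state lists the states it may move to.
ALLOWED = {
    JobState.INGESTING: {JobState.QUEUED, JobState.FAILED},
    JobState.QUEUED: {JobState.SYNTHESIZING, JobState.FAILED},
    JobState.SYNTHESIZING: {JobState.PACKAGING, JobState.FAILED},
    JobState.PACKAGING: {JobState.DONE, JobState.FAILED},
    JobState.DONE: set(),
    JobState.FAILED: set(),
}


class Job:
    """Tracks one synthesis job and rejects illegal state changes."""

    def __init__(self):
        self.state = JobState.INGESTING
        self.history = [self.state]

    def advance(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        self.state = new_state
        self.history.append(new_state)


if __name__ == "__main__":
    job = Job()
    for state in (JobState.QUEUED, JobState.SYNTHESIZING,
                  JobState.PACKAGING, JobState.DONE):
        job.advance(state)
    print([s.value for s in job.history])
```

Keeping the transitions in one table means an API handler and a background worker can both validate a state change the same way before persisting it.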

🚀 Quick Start

1. Infrastructure Requirements

BurstDB requires uv to manage the Python environment and dependencies, and Node.js to run the web interface.

# Clone the infrastructure
git clone https://github.com/ghoshsoham71/BurstDB.git
cd BurstDB

2. Backend Orchestration

cd api
uv sync
# Start the API Gateway (REST Interface)
uv run uvicorn main:app --reload
# Start the Synthesis Worker (Separate Terminal)
uv run celery -A src.celery_app worker --loglevel=info -P solo

3. Frontend Interface

# Run from the repository root (not from api/)
cd ui
npm install
npm run dev

🔒 Security & Privacy (Differential Privacy)

BurstDB is engineered for Data Sovereignty:

Mathematical Guarantees: Implements ε-differential privacy, which bounds how much any single production record can influence the synthetic output and therefore limits what the output can disclose about an individual.
Entity Identification (NER): Integrated PII discovery service that automatically masks and synthesizes sensitive markers during initial blueprinting.
Zero-Knowledge Architecture: All synthesis happens strictly within your infrastructure; no data ever leaves the VPC.
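As background for the ε-differential privacy guarantee listed above, the sketch below shows the textbook Laplace mechanism applied to a counting query: adding or removing one record changes a count by at most 1, so Laplace noise with scale 1/ε gives ε-differential privacy for that single release. This is a generic illustration, not BurstDB's privacy implementation; `laplace_noise`, `private_count`, and the example data are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Release a noisy count; a counting query has L1 sensitivity 1."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(7)
    ages = [23, 35, 41, 29, 52, 38, 61, 19]  # hypothetical records
    print(private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng))
```

Smaller values of ε add more noise and give stronger privacy. Repeated releases consume privacy budget, so a production system would also need to track the total ε spent.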

🌐 Project Links

Documentation
API Reference
Architecture Specs
