CodaCipher/iterabeast



Synth DataGen Engine
High-performance, UI-driven synthetic data generation for AI fine-tuning.



📡 OVERVIEW

IteraBeast is a modern, cyber-aesthetic web application designed to simplify and accelerate the process of generating large-scale synthetic datasets (.jsonl) for training and fine-tuning LLMs.

Engineered for AI researchers, data scientists, and LLM fine-tuners, this tool bridges the gap between raw prompt engineering and production-grade dataset creation. It turns ad hoc data synthesis into a streamlined, visual workflow suited to building RAG pipelines, fine-tuning adapters (LoRA/QLoRA), or generating evaluation benchmarks.

By leveraging a dual-node architecture (FastAPI backend + React frontend), it allows developers to batch-generate diverse, context-aware conversational data across multiple LLM providers simultaneously.


⚡ CORE CAPABILITIES

🧬 Multi-Provider Node Matrix

Integrates local (Ollama) and cloud (Groq, OpenRouter, DeepInfra) nodes into a unified generation grid. Switch providers instantly without breaking the workflow.

🔄 Advanced Distribution Routing

Offers Sequential, Round-Robin, and Hybrid workload-balancing strategies to maximize throughput and reduce the chance of hitting provider API rate limits.
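As a rough illustration of how two of these strategies differ, here is a minimal Python sketch. It is not taken from the IteraBeast source: the function names are hypothetical, and it assumes "Sequential" means each provider receives one contiguous block of jobs while "Round-Robin" alternates providers job by job. The Hybrid strategy is not described in this README, so it is left out.

```python
from itertools import cycle
from typing import Iterator, List


def sequential(providers: List[str], n_jobs: int) -> Iterator[str]:
    """Assumed meaning of 'Sequential': give each provider one contiguous block."""
    if not providers:
        raise ValueError("at least one provider is required")
    per_provider, extra = divmod(n_jobs, len(providers))
    for index, provider in enumerate(providers):
        # The first `extra` providers take one additional job each.
        for _ in range(per_provider + (1 if index < extra else 0)):
            yield provider


def round_robin(providers: List[str], n_jobs: int) -> Iterator[str]:
    """Alternate providers job by job, spreading requests across rate limits."""
    if not providers:
        raise ValueError("at least one provider is required")
    ring = cycle(providers)
    for _ in range(n_jobs):
        yield next(ring)


if __name__ == "__main__":
    nodes = ["ollama", "groq", "openrouter"]
    print(list(sequential(nodes, 5)))
    print(list(round_robin(nodes, 5)))
```

Round-robin spaces out consecutive requests to the same provider, which is the usual reason to prefer it when per-provider rate limits are the bottleneck.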

🛡️ Strict JSONL & Schema Enforcement

Implements a Post-Generation Validation Layer that checks every generated record for valid JSONL syntax. The engine automatically sanitizes output and escapes forbidden characters so that files ingest cleanly into training pipelines.
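A post-generation check of this kind can be sketched in a few lines of Python. This is an assumption about the general approach, not the engine's actual code: the helper names and the `messages` key used as a required field are hypothetical. It relies only on the standard `json` module, whose `json.dumps` escapes newlines and other control characters inside strings, so each record stays on one line.

```python
import json
from typing import List, Sequence, Tuple


def to_jsonl_line(record: dict) -> str:
    """Serialize one record; json.dumps escapes control characters for us."""
    return json.dumps(record, ensure_ascii=False)


def validate_jsonl(
    text: str, required_keys: Sequence[str] = ("messages",)
) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Return (valid lines, errors), where each error is (line number, reason)."""
    valid: List[str] = []
    errors: List[Tuple[int, str]] = []
    # Split on "\n" only: JSONL uses newline as the record separator.
    for lineno, raw in enumerate(text.split("\n"), start=1):
        if not raw.strip():
            continue  # skip blank lines
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(obj, dict):
            errors.append((lineno, "record is not a JSON object"))
            continue
        missing = [key for key in required_keys if key not in obj]
        if missing:
            errors.append((lineno, f"missing keys: {missing}"))
            continue
        valid.append(to_jsonl_line(obj))  # re-serialize to normalize output
    return valid, errors
```

Rejected lines are reported with their line numbers rather than silently dropped, which makes it easier to see which generations need to be retried.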

🗃️ Direct Stream Architecture

Avoids memory bottlenecks by streaming generated .jsonl chunks directly to local disk via the File System Access API, so large datasets never need to be held in browser memory.

🧠 Semantic Variation Injection

Reduces repetitive, low-diversity output by using MiniLM embeddings to analyze and inject dynamic context. The system automatically varies sentence structures to keep the generated data semantically diverse.
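This README does not describe how the embeddings are used internally. One common way to apply sentence embeddings to dataset diversity is to flag near-duplicate samples by cosine similarity, sketched below in Python. The function names and the 0.9 threshold are hypothetical, and the vectors are plain lists standing in for real MiniLM embeddings (which could come, for example, from a model such as `all-MiniLM-L6-v2`).

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_near_duplicates(
    embeddings: List[Sequence[float]], threshold: float = 0.9
) -> List[int]:
    """Return indices of samples to keep.

    A sample is kept only if its similarity to every already-kept
    sample is below `threshold` (a hypothetical cutoff).
    """
    kept: List[int] = []
    for index, vector in enumerate(embeddings):
        if all(cosine(vector, embeddings[j]) < threshold for j in kept):
            kept.append(index)
    return kept
```

Samples that are filtered out this way could be regenerated with altered context instead of being discarded, which matches the "inject dynamic context" idea described above.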

🎨 UNSTABLE_CORE Interface

A reactive, hardware-accelerated UI with real-time cost/token telemetry and interchangeable themes (MAGI / UNSTABLE_CORE), designed for high-velocity data operations.


[Screenshot: Multi-Provider Integration & Node Configuration]

[Screenshot: Semantic Variation System & Distribution Routing]

🛠️ QUICK START

1. Backend Service (FastAPI)

cd backend
python -m venv .venv
# Activate virtual environment:
.venv\Scripts\activate      # Windows (Command Prompt)
# .\.venv\Scripts\Activate.ps1 # Windows (PowerShell)
# source .venv/bin/activate    # Linux/Mac
pip install -r requirements.txt
python main.py

API runs on http://localhost:8000

2. Frontend Client (React)

cd frontend
npm install
npm run dev

Interface accessible at http://localhost:5173


📁 ARCHITECTURE

IteraBeast/
├── backend/                  # Async Server Node
│   ├── main.py               # API Endpoints & Generators
│   └── requirements.txt      # Dependencies
├── frontend/                 # Client UI Node
│   ├── src/
│   │   ├── components/       # Interface Elements & Terminal
│   │   ├── App.jsx           # State & Execution Logic
│   │   └── index.css         # Styling & Animations
│   └── package.json
└── README.md

⚙️ REQUIREMENTS

  • Python 3.9+
  • Node.js 18+
  • Chromium-based browser (Chrome/Edge) recommended for full FileSystemWritableFileStream support.


[ SYSTEM_STATUS: OPERATIONAL ]  |  [ CAPACITY: OPTIMAL ]

CodaCipher

END_OF_LINE_SEQUENCE

