TuneForge

Benchmark-first fine-tuning framework for local LLMs on hardware you own.

Overview

TuneForge is an open-source, audit-ready engineering framework for QLoRA fine-tuning, benchmarking, and governed model publishing. Built on the foundation of karpathy/autoresearch, it provides a complete pipeline from data preparation through training, evaluation, and export to Hugging Face and Ollama.

Designed for teams that run models on their own hardware — with full provenance tracking, reproducible benchmarks, and EU regulatory awareness built in from the start.

Current status: Technical Preview. Benchmark claims are scoped to documented hardware budgets. This is not legal advice and does not guarantee regulatory compliance.

Architecture

flowchart LR
    subgraph Input
        A[Training Data<br/>JSONL / Alpaca / ShareGPT]
        B[Base Model<br/>HuggingFace Hub]
    end

    subgraph Pipeline["TuneForge Pipeline"]
        direction LR
        C[Data Preparation<br/>Format Normalization<br/>Train/Eval Split]
        D[QLoRA Fine-Tuning<br/>PEFT + TRL / Unsloth<br/>4-bit Quantization]
        E[Evaluation<br/>Loss · Perplexity<br/>VRAM Tracking]
        F[Export & Publish<br/>LoRA Adapter<br/>GGUF · Modelfile]
    end

    subgraph Output
        G[HuggingFace Hub<br/>Model Card · Manifest]
        H[Ollama<br/>GGUF · Modelfile]
        I[Release Bundle<br/>Benchmarks · Attestation<br/>License Manifest]
    end

    A --> C
    B --> D
    C --> D
    D --> E
    E --> F
    F --> G
    F --> H
    F --> I

flowchart TB
    subgraph Agent["Agent Loop (autoresearch)"]
        J[LLM Provider<br/>Claude · OpenAI · Ollama<br/>OpenRouter · Kimi]
        K[Experiment Runner<br/>Propose → Train → Evaluate<br/>Keep or Discard]
    end

    subgraph Core["Core Runtime"]
        L[QLoRATrainer<br/>peft_trl / unsloth backend]
        M[Model Publisher<br/>Bundle · HF · GGUF]
        N[Validation Registry<br/>Hardware Tiers · Attestation]
    end

    J --> K
    K --> L
    L --> M
    L --> N

Features

Dual Backend — Switch between transformers + peft + trl and unsloth via config. Same interface, same metrics.
Hardware-Tiered Configs — Pre-tuned configurations for 8 GB, 12 GB, and 24 GB+ GPUs. No guesswork.
Autonomous Agent Loop — Provider-agnostic research loop (Claude, OpenAI, Ollama, OpenRouter) that proposes, trains, and evaluates automatically.
Governed Release Bundles — Every export includes model card, training manifest, benchmark summary, license manifest, environment snapshot, and tester attestation.
GGUF + Ollama Export — Convert adapters to GGUF and generate Modelfiles for local deployment.
Audit Trail — VRAM tracking, reproducible seeds, git SHA provenance, structured logging.
Bilingual Documentation — Full EN/DE documentation, governance templates, and compliance packs.

Quick Start

Local Setup

git clone https://github.com/AI-Engineerings-at/tuneforge.git
cd tuneforge

python -m venv .venv
source .venv/bin/activate        # Linux/macOS
# .venv\Scripts\activate          # Windows

pip install --upgrade pip
pip install -e ".[llm,finetune,dev]"

# Run tests
python -m pytest -q tests

Docker (NVIDIA GPU required)

# Fine-tuning pipeline
AUTORESEARCH_DOMAIN=sps-plc docker compose -f docker-compose.finetune.yml up --build

Run Fine-Tuning

# QLoRA training with YAML config
python -m finetune.trainer --config finetune/configs/your-domain.yaml --eval

# Agent loop (autonomous research)
python agent_loop.py --provider ollama --model qwen2.5-coder:7b

Canonical Docker Image

ghcr.io/ai-engineerings-at/tuneforge-studio:<semver>

Images are built and published by GitHub Actions. Never committed to git.

Configuration

Environment Variables

Variable	Description	Required
`ANTHROPIC_API_KEY`	API key for Claude provider	For Claude agent loop
`OPENROUTER_API_KEY`	API key for OpenRouter	For OpenRouter agent loop
`HF_TOKEN`	Hugging Face access token	For model publishing
`AUTORESEARCH_DOMAIN`	Target domain for training	For Docker pipeline
`NVIDIA_VISIBLE_DEVICES`	GPU device selection	Docker only

QLoRA Training Config (YAML)

Parameter	Default	Description
`base_model`	`Qwen/Qwen2.5-Coder-7B-Instruct`	HuggingFace model ID
`backend`	`peft_trl`	Training backend (`peft_trl` or `unsloth`)
`dataset_format`	`alpaca`	Input format (`alpaca`, `sharegpt`, etc.)
`bits`	`4`	Quantization bits (4-bit QLoRA)
`lora_r`	`16`	LoRA rank
`lora_alpha`	`32`	LoRA alpha scaling
`learning_rate`	`2e-4`	Training learning rate
`max_steps`	`1000`	Maximum training steps
`max_seq_length`	`2048`	Maximum sequence length
`per_device_train_batch_size`	`4`	Batch size per GPU
`gradient_accumulation_steps`	`4`	Gradient accumulation
`primary_metric`	`eval_loss`	Metric to optimize

Supported Models and GPU Tiers

GPU Tier Configs

Tier	VRAM	Dataset	Model Size	Seq Length	Batch Size	Use Case
Tier 1	6-8 GB	TinyStories	384d / 3L	256	16	Quick experiments, validation
Tier 2	10-12 GB	ClimbMix	512d / 5L	512	32	Mid-range training
Tier 3	16-24 GB	ClimbMix	768d / 8L	2048	8	Full training runs

QLoRA Base Models

Model	Parameters	Min VRAM (4-bit)	Status
Qwen2.5-Coder-7B-Instruct	7B	~8 GB	Default
Any HuggingFace CausalLM	Varies	Varies	Supported via config

Hardware Validation Tiers

Tier	Hardware	Status
Tier A	RTX 3090 (24 GB)	Validation target
Tier B	A100 / H100 / 48 GB+	Validation target
Unassigned	Other GPUs	Technical Preview

Project Structure

tuneforge/
├── train.py                  # autoresearch training loop
├── agent_loop.py             # Autonomous LLM agent for research
├── agent_config.py           # Agent configuration
├── providers.py              # LLM provider abstraction
├── finetune/
│   ├── trainer.py            # QLoRA training runtime
│   └── model_publisher.py    # Release bundle & HF publishing
├── datasets/
│   ├── data_formats.py       # Format normalization
│   └── synthetic_generator.py # Synthetic data generation
├── configs/                  # GPU tier configurations (JSON)
├── validation/               # Validation registry & runbooks
├── scripts/                  # CI checks & release validation
├── docs/                     # Architecture, SOPs, compliance
├── templates/                # Model card & governance templates
├── docker-compose.finetune.yml
├── Dockerfile.finetune
└── pyproject.toml

CI/CD

GitHub Actions pipelines:

Workflow	Purpose
`tuneforge-ci.yml`	Quality gates, repo hygiene, doc parity checks
`tuneforge-release.yml`	Docker image build and preview releases
`tuneforge-model-publish.yml`	Model bundle packaging and HuggingFace publishing

Release automation attaches SBOMs, checksums, validation registry snapshots, and release metadata. Secrets are stored in GitHub Secrets or an external vault — never in the repository.

Compliance

TuneForge is designed with EU regulatory awareness:

EU AI Act — Documentation structure supports Article 11 (Technical Documentation) and Article 13 (Transparency) requirements for engineering review and governance preparation.
DSGVO / GDPR — Training data provenance tracking, no personal data in default pipelines, privacy notes in model cards.
Audit Readiness — Structured logging, reproducible training runs, hardware attestation, and validation registry.

This is engineering preparation, not legal certification. Consult qualified legal counsel for compliance obligations. See COMPLIANCE_STATEMENT.md for details.

Attribution

Built with and on top of:

karpathy/autoresearch — Research loop foundation
transformers — Model loading and tokenization
peft — Parameter-efficient fine-tuning
trl — SFT training
unsloth — Optimized training backend
llama.cpp — GGUF conversion
Ollama — Local model deployment

Full attribution: THIRD_PARTY.md | FORK.md | docs/CREDITS.md

Contributing

Contributions are welcome. Please read CONTRIBUTING.md before submitting a pull request.

Security issues: SECURITY.md
Support: SUPPORT.md
Changelog: CHANGELOG.md

License

MIT License. See LICENSE for details.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
.bg-shell		.bg-shell
.github/workflows		.github/workflows
.gsd/audits		.gsd/audits
.mcp-debug-tools		.mcp-debug-tools
archive		archive
audit		audit
configs		configs
content		content
dashboard		dashboard
data_utils		data_utils
docs		docs
eval		eval
finetune		finetune
patches		patches
programs		programs
scripts		scripts
templates		templates
tests		tests
upstream		upstream
validation		validation
.coverage		.coverage
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
AGENTS.md		AGENTS.md
AUDIT_TUNEFORGE_ZEROTH.md		AUDIT_TUNEFORGE_ZEROTH.md
CHANGELOG-DE.md		CHANGELOG-DE.md
CHANGELOG.md		CHANGELOG.md
COMPLIANCE_STATEMENT-DE.md		COMPLIANCE_STATEMENT-DE.md
COMPLIANCE_STATEMENT.md		COMPLIANCE_STATEMENT.md
CONTRIBUTING-DE.md		CONTRIBUTING-DE.md
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
Dockerfile.finetune		Dockerfile.finetune
Dockerfile.turboquant		Dockerfile.turboquant
FORK.md		FORK.md
LICENSE		LICENSE
PLAN_v1.0.0_RELEASE.md		PLAN_v1.0.0_RELEASE.md
QUALITY_FIRST_ROADMAP.md		QUALITY_FIRST_ROADMAP.md
README-DE.md		README-DE.md
README.md		README.md
RELEASE_CHECKLIST.md		RELEASE_CHECKLIST.md
REPO_INDEX-DE.md		REPO_INDEX-DE.md
REPO_INDEX.md		REPO_INDEX.md
SECURITY-DE.md		SECURITY-DE.md
SECURITY.md		SECURITY.md
SUPPORT-DE.md		SUPPORT-DE.md
SUPPORT.md		SUPPORT.md
THIRD_PARTY.md		THIRD_PARTY.md
agent_config.py		agent_config.py
agent_loop.py		agent_loop.py
audit_run.py		audit_run.py
coverage.json		coverage.json
docker-compose.finetune.yml		docker-compose.finetune.yml
docker-compose.turboquant.yml		docker-compose.turboquant.yml
docker-compose.yml		docker-compose.yml
entrypoint-finetune.sh		entrypoint-finetune.sh
entrypoint.sh		entrypoint.sh
mkdocs.yml		mkdocs.yml
providers.py		providers.py
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini
requirements-finetune.txt		requirements-finetune.txt
run_patched.py		run_patched.py
train.py		train.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TuneForge

Table of Contents

Overview

Architecture

Features

Quick Start

Local Setup

Docker (NVIDIA GPU required)

Run Fine-Tuning

Canonical Docker Image

Configuration

Environment Variables

QLoRA Training Config (YAML)

Supported Models and GPU Tiers

GPU Tier Configs

QLoRA Base Models

Hardware Validation Tiers

Project Structure

CI/CD

Compliance

Attribution

Contributing

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

TuneForge

Table of Contents

Overview

Architecture

Features

Quick Start

Local Setup

Docker (NVIDIA GPU required)

Run Fine-Tuning

Canonical Docker Image

Configuration

Environment Variables

QLoRA Training Config (YAML)

Supported Models and GPU Tiers

GPU Tier Configs

QLoRA Base Models

Hardware Validation Tiers

Project Structure

CI/CD

Compliance

Attribution

Contributing

License

About

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages