A minimal, composable pipeline for multimodal data prep with single-responsibility steps and predictable chaining.
Transform raw, unstructured data into training-ready datasets.
- Domain-first steps: MeshTransform, MeshClean, MeshQC, ImageTransform, ImageClean, ImageQC, ImageLabel
- Clear I/O contracts via StepSignature: input modality, output modality, arity
- Pluggable providers (e.g., Blender) behind small interfaces
- Built-in AI labeling with DSPy for caption/prompt generation
This omniflow/ folder is the intended public OSS surface. Scripts under scripts/ are convenience runners for local development.
- Modality: the kind of content currently flowing through the pipeline
  - `3d_mesh`, `2d_image`, `text`, `metrics` (extensible)
- Arity: how many inputs a step consumes and outputs it produces
  - `one_to_one`, `one_to_many`, `many_to_one`
- Step types
- Transform: mutates content (e.g., render, convert mode)
- Clean: drops/fixes content based on rules
- QC: computes metrics/flags, does not mutate content
- Label: enriches metadata with captions/tags/labels (content unchanged)
Each step declares a StepSignature(input_modality, output_modality, arity) and is validated during Pipeline.compile().
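To make the contract concrete, here is a minimal sketch of the kind of chaining check `compile()` could perform. The names mirror the concepts above; this is illustrative, not the actual omniflow implementation.

```python
from dataclasses import dataclass

# Illustrative stand-in for omniflow's StepSignature (not the real API).
@dataclass(frozen=True)
class StepSignature:
    input_modality: str   # e.g. "3d_mesh"
    output_modality: str  # e.g. "2d_image"
    arity: str            # "one_to_one" | "one_to_many" | "many_to_one"

def validate_chain(signatures):
    """Each step's input modality must match the previous step's output."""
    for prev, nxt in zip(signatures, signatures[1:]):
        if prev.output_modality != nxt.input_modality:
            raise ValueError(
                f"cannot chain {prev.output_modality} -> {nxt.input_modality}"
            )

render = StepSignature("3d_mesh", "2d_image", "one_to_many")
label = StepSignature("2d_image", "2d_image", "one_to_one")
validate_chain([render, label])  # ok: 2d_image feeds 2d_image
```

A mismatched chain (e.g. a text step directly after a mesh step) fails at compile time rather than mid-run.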
- Streaming by default: steps consume and yield `Sample` objects lazily.
- Optional batching (default `batch_size=1`): `Pipeline.run(batch_size=N)` processes up to N items end-to-end, then releases them before loading the next batch. This bounds memory.
- Optional parallelism: `Pipeline.run(num_workers=W)` runs up to W batches concurrently end-to-end (threads; portable, good for I/O-bound work). In-flight items ≈ N * W.
- Fanout: one-to-many steps expand within the batch; downstream steps see the expanded items. Keep `views` (or fanout factors) modest, or use a smaller `batch_size`, to bound memory.
- Representations by modality (canonical, in-memory between steps):
  - `3d_mesh`: in-memory `trimesh.Trimesh` (default from sources). Providers that require files (e.g., Blender) transparently export a temporary `.glb` and auto-clean it.
  - `2d_image`: in-memory `PIL.Image`. Early downscale keeps memory low.
- Persistence: only writer steps (e.g., `DestinationLocal`) save artifacts. This provides natural progress saving. Stage checkpoints and resumability may be added later.
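The batching and fanout semantics above can be sketched in a few lines of plain Python. This is an illustrative model of the behavior, not the actual `Pipeline` internals: pull up to `batch_size` items from a lazy source, push the batch through every step (steps may fan out), release it, then load the next batch.

```python
from itertools import islice

def batched(iterable, batch_size):
    """Lazily yield lists of up to batch_size items."""
    it = iter(iterable)
    while batch := list(islice(it, batch_size)):
        yield batch

def run(source, steps, batch_size=1):
    """Toy model of batched end-to-end execution (each step maps one item
    to a list of outputs, so one-to-many fanout expands within the batch)."""
    results = []
    for batch in batched(source, batch_size):
        for step in steps:
            batch = [out for item in batch for out in step(item)]
        results.extend(batch)  # the batch is released before the next load
    return results

# A one-to-many "render" step: each mesh id expands into two views.
fanout = lambda item: [f"{item}-view0", f"{item}-view1"]
views = run(["meshA", "meshB", "meshC"], [fanout], batch_size=2)
```

With `batch_size=2`, at most two source items (and their fanned-out views) are in flight at once, which is what bounds memory.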
Omniflow uses a unified provider system for extensibility. Providers handle external services (rendering, AI models) with consistent interfaces.
- ProviderBase: base class with `alias`, `kind`, `capabilities`, and `preflight()` validation
- Capability protocols: `ProviderCaption` (image captioning), `ProviderMeshRender` (3D rendering)
- Registry: central `provider_registry` for discovery and validation
Mesh Rendering (kind=mesh_render)
- `blender`: headless Blender renderer for GLB/GLTF files
  - Requires: Blender 3.0+ (set `BLENDER_PATH` or install to default location)
  - Capabilities: `render`

AI Models (kind=model)
- `gemini`: Google Gemini vision models (gemini-2.5-flash-lite, etc.)
  - Requires: `dspy-ai` package, `GEMINI_API_KEY` environment variable
  - Capabilities: `caption`
- `openai`: OpenAI GPT-4 vision models (gpt-4o-mini, gpt-4.1-nano, etc.)
  - Requires: `dspy-ai` package, `OPENAI_API_KEY` environment variable
  - Capabilities: `caption`
| Kind | Alias | Capabilities | Requirements |
|---|---|---|---|
| mesh_render | blender | render | Blender 3.0+ (BLENDER_PATH opt) |
| model | gemini | caption | dspy-ai, GEMINI_API_KEY |
| model | openai | caption | dspy-ai, OPENAI_API_KEY |
Steps accept a provider parameter to select the backend:
# Mesh rendering
MeshScreenshots(provider="blender", ...)
# Caption generation
LabelVisionToText(provider="gemini", model="gemini-2.5-flash-lite", ...)
LabelVisionToText(provider="openai", model="gpt-4o-mini", ...)

from omniflow.provider_registry import provider_registry
# List caption-capable model providers
print(provider_registry.providers_for(kind="model", capability="caption"))
# List mesh render providers
print(provider_registry.providers_for(kind="mesh_render", capability="render"))

For local folders, you can choose what the source yields via representation:
# Default: in-memory objects (recommended)
source = SourceFolder(modality="3d_mesh", path="./MODELS", pattern="*.glb", representation="object")
# Power users: stream file paths and let the first transform load lazily
source = SourceFolder(modality="3d_mesh", path="./MODELS", pattern="*.glb", representation="path")

In both cases, streaming keeps memory bounded by batch_size; only the current batch is in memory.
The Pipeline.compile() method calls preflight() on each step, which validates:
- Provider is available and registered
- Required environment variables are set (API keys)
- Dependencies are installed (Blender, dspy-ai, etc.)
- I/O checks (where applicable):
- Writers create/delete a tiny sentinel in their target directories
- Render providers verify temporary directory create/write/delete
This ensures fail-fast behavior before pipeline execution.
- Missing API key: `GEMINI_API_KEY` / `OPENAI_API_KEY` not set
- Missing dependency: `dspy-ai` not installed (for model providers)
- Blender not found: `BLENDER_PATH` not set and auto-detect failed
Example error during compile():
PipelineConfigError: Preflight check failed for step 'label.vision_to_text': Caption provider 'gemini' failed preflight. Available providers: ['gemini', 'openai']. Error: GEMINI_API_KEY environment variable required for ProviderGemini.
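The fail-fast aggregation behind that error can be sketched in plain Python. This is illustrative only: `compile_pipeline` and the `(name, preflight)` tuples are stand-ins, not omniflow's real signatures.

```python
# Toy model of compile()-time preflight: call each step's preflight() and
# re-raise the first failure with the step name attached, before any data
# is processed. PipelineConfigError here is a stand-in class.
class PipelineConfigError(Exception):
    pass

def compile_pipeline(steps):
    for name, preflight in steps:
        try:
            preflight()
        except Exception as err:
            raise PipelineConfigError(
                f"Preflight check failed for step '{name}': {err}"
            ) from err

def missing_key_preflight():
    # Simulates a provider whose required API key is absent.
    raise RuntimeError("GEMINI_API_KEY environment variable required for ProviderGemini.")

try:
    compile_pipeline([("source.folder", lambda: None),
                      ("label.vision_to_text", missing_key_preflight)])
except PipelineConfigError as err:
    print(err)
```

The point of the pattern is that a misconfigured final writer or labeler surfaces before the first mesh is loaded.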
Mesh

- NormalizeMesh (transform): centers/scales a `trimesh.Trimesh` (accepts path or object; loads if path)
- MeshScreenshots (transform, one-to-many): renders views via provider
  - Args: `provider`, `elevation`, `azimuth`, `fov`, `size`, `bg`, `verbose`
  - Input: `3d_mesh` (object or path). If given a `trimesh.Trimesh`, the provider transparently exports a temporary `.glb` for rendering.
  - Output: `2d_image` samples with metadata: `parent.id`, `view.{elevation, azimuth, fov, size, bg, provider}`
Image
- SetBackground (transform): composites RGBA over a solid color to RGB
- SetMode (transform): converts image mode (e.g., RGB/RGBA)
- ResizeTo (transform): resizes images to target dimensions (with optional letterboxing)
- CenterCrop (transform): center-crops to target size
- FlipHorizontal / FlipVertical (transform): flips images for augmentation
- Rotate (transform): rotates by 90/180/270 degrees
Image QC
- ImageQCCroppedAtEdges (qc): flags if foreground touches any image border
  - RGBA: uses alpha; RGB: compares to provided BG color with tolerance
- Writes QC record shape:
metadata["qc"][name] = { ok: bool, cropped: bool, touching_edges: {...}, width, height }
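Writing and reading that record shape looks like this. The helper and the step name `"image.cropped_at_edges"` are hypothetical; only the field names follow the documented record shape.

```python
# Illustrative helper matching the documented QC record shape; the step
# name used as the key ("image.cropped_at_edges") is a made-up example.
def write_qc(metadata, name, ok, cropped, touching_edges, width, height):
    metadata.setdefault("qc", {})[name] = {
        "ok": ok, "cropped": cropped,
        "touching_edges": touching_edges,
        "width": width, "height": height,
    }

metadata = {}
write_qc(metadata, "image.cropped_at_edges",
         ok=True, cropped=False,
         touching_edges={"top": False, "bottom": False, "left": False, "right": False},
         width=512, height=512)

# A downstream Clean or assert step would read it back like so:
record = metadata["qc"]["image.cropped_at_edges"]
```

Because QC steps only annotate metadata, the record travels with the sample and can gate later Clean/assert steps.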
Image Labeling
- LabelVisionToText (label): generates captions using AI vision models
  - Args: `provider`, `model`, `profile`, `field`, `max_chars`, `max_tokens`, `overwrite`, `custom_prompt`
  - Providers: `gemini` (default), `openai`
  - Models: `gemini-2.5-flash-lite`, `gpt-4o-mini`, `gpt-4.1-nano`
  - Profiles: `sdxl` (220 chars, concise for SD/SDXL training), `generic` (256 chars, flexible)
  - Stores caption in `metadata[field]` (default: `labels.caption`)
  - Presets: `LabelVisionToTextStableDiffusion`, `LabelVisionToTextSDXL`
  - Requires: `dspy-ai` package and provider-specific API key
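To illustrate what the profile character budgets mean in practice, here is a hypothetical sketch of fitting a generated caption to a profile's `max_chars` (sdxl=220, generic=256, per the profiles above). The word-boundary truncation shown is an assumption, not omniflow's actual strategy.

```python
# Hypothetical caption-fitting helper; limits come from the documented
# profiles, but the truncation strategy here is illustrative only.
PROFILE_MAX_CHARS = {"sdxl": 220, "generic": 256}

def fit_caption(text, profile="sdxl"):
    limit = PROFILE_MAX_CHARS[profile]
    if len(text) <= limit:
        return text
    # Cut at the last word boundary that still fits the budget.
    return text[:limit].rsplit(" ", 1)[0]

short = fit_caption("a low-poly castle render", "sdxl")
long = fit_caption("a low-poly 3d block model of a castle " * 10, "sdxl")
```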
Destinations (Training Data Writers)
Omniflow provides flexible output writers for common training formats:
Core Writers
- DestinationLocal: simple image writer to a folder (basic checkpointing)
- DestinationJSONL: schema-driven JSONL writer with external media files
- DestinationFolderLayout: strict file/folder layouts (e.g., image.png + prompt.txt)
- DestinationHF: export to HuggingFace Datasets format
Preset Writers (Pre-configured for common tasks)
- DestinationJSONLImageText: SDXL/Diffusers text→image (JSONL + images)
- DestinationFolderPairsSD: SD/SDXL folder pairs (separate image + prompt folders)
- DestinationJSONLImage2Image: I2I/IP-Adapter (input + target images)
- DestinationJSONLChat: chat/conversation datasets for LLM tuning
- DestinationHFImageText: HuggingFace image-text export
Schema Types
- FileRef, ImageFileRef: define how media files are saved/referenced
- ImageTextRow, Image2ImageRow, ChatRow: Pydantic schemas for common formats
See "Output Formats" section below for detailed usage examples.
Use the development script in scripts/run_pipeline.py:
python -m scripts.run_pipeline \
--models ./path/to/glbs \
--out ./out \
--el 15 \
--az 0 45 90 \
--size 512 \
--fov 35 \
--bg "#d0d0d0" \
  --verbose

Outputs:
- PNGs in `--out`
- QC report: `--out/qc_report.json` (override with `--report`)
Notes:
- Set `BLENDER_PATH` if Blender isn't auto-detected
- Requires Pillow (`pip install pillow`)
from omniflow.pipeline import Pipeline
from omniflow.steps import (
MeshTransformNormalizeMesh,
MeshScreenshots,
ImageTransformSetBackground,
LabelVisionToTextStableDiffusion,
)
from omniflow.utils.sources import SourceFolder
from omniflow import DestinationJSONLImageText
# Load 3D models
source = SourceFolder(modality="3d_mesh", path="./models", pattern="*.glb")
# Build pipeline
pipe = (
Pipeline(source)
.add(MeshTransformNormalizeMesh())
.add(MeshScreenshots(
provider="blender", # Mesh render provider
elevation=15,
azimuth=[0, 45, 90, 135, 180, 225, 270, 315],
fov=35,
size=512
))
.add(ImageTransformSetBackground(color="#ffffff"))
.add(LabelVisionToTextStableDiffusion(
provider="gemini", # Caption provider
model="gemini-2.5-flash-lite",
context_hint="3D block model render"
))
.add(DestinationJSONLImageText(
out_path="./training/data.jsonl",
media_dir="./training"
))
)
# Preflight checks providers, API keys, dependencies
pipe.compile()
# Run with batching and parallelism
pipe.run(batch_size=4, num_workers=2)

Environment setup:
export BLENDER_PATH="/path/to/blender" # Optional if Blender is in PATH
export GEMINI_API_KEY="your-api-key-here"
# or
export OPENAI_API_KEY="your-api-key-here"

Output: JSONL file with image paths and captions, ready for SDXL/Diffusers training.
Create a minimal provider by subclassing ProviderBase. For example, a mock caption provider:
from omniflow.provider_registry import ProviderBase, provider_registry
class ProviderMyCaption(ProviderBase):
def __init__(self):
super().__init__(alias="mycaption", kind="model", capabilities={"caption"})
def preflight(self):
# Validate dependencies/env as needed
pass
def caption(self, image, model: str, guidance: str, max_tokens: int | None = None) -> str:
# Implement your captioning logic
return "custom caption"
# Register it at startup
provider_registry.register(ProviderMyCaption())
# Use it in a pipeline
# LabelVisionToText(provider="mycaption", model="ignored")

Omniflow provides flexible output writers to generate training data in the exact format your framework expects.
For text-to-image training with SDXL or Diffusers:
from omniflow import DestinationJSONLImageText
writer = DestinationJSONLImageText(
out_path="./training/data.jsonl",
media_dir="./training"
)
pipe.add(writer)

Output structure:
training/
data.jsonl # {"id": "001", "image_path": "images/001.png", "text": "caption", "meta": {...}}
images/
001.png
002.png
...
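Each line in `data.jsonl` is one JSON object in the shape shown above. A minimal sketch of emitting such a row (illustrative; `DestinationJSONLImageText` handles this, plus actually saving the image file):

```python
import json

# Build one JSONL line in the documented row shape:
# {"id": ..., "image_path": ..., "text": ..., "meta": {...}}
def image_text_row(sample_id, image_rel_path, caption, meta=None):
    return json.dumps({
        "id": sample_id,
        "image_path": image_rel_path,  # relative to the JSONL file
        "text": caption,
        "meta": meta or {},
    })

line = image_text_row("001", "images/001.png", "a red cube on a white background")
```

Keeping `image_path` relative to the JSONL file makes the training folder relocatable.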
For trainers that expect image.png + prompt.txt pairs:
from omniflow import DestinationFolderPairsSD
writer = DestinationFolderPairsSD(
root="./sd_training",
image_format="png"
)
pipe.add(writer)

Output structure:
sd_training/
images/
001.png
002.png
prompts/
001.txt # "caption text"
002.txt
Define your own schema with Pydantic:
from pydantic import BaseModel
from omniflow import DestinationJSONL, ImageFileRef
class MyTrainingRow(BaseModel):
sample_id: str
img: str
prompt: str
quality_score: float
writer = DestinationJSONL(
out_path="custom.jsonl",
media_dir="./data",
schema_model=MyTrainingRow,
mapping={
"sample_id": lambda s: s.id,
"img": lambda s: ImageFileRef(path_template="img/{id}.jpg", format="jpg"),
"prompt": lambda s: s.metadata["labels"]["caption"],
"quality_score": lambda s: s.metadata.get("qc", {}).get("score", 1.0),
}
)

Export directly to HF Datasets format:
from omniflow import DestinationHFImageText
writer = DestinationHFImageText(
dataset_name="username/my-dataset",
split="train",
push_to_hub=True # Requires HF_TOKEN
)
pipe.add(writer)For IP-Adapter, ControlNet, or I2I models:
from omniflow import DestinationJSONLImage2Image
writer = DestinationJSONLImage2Image(
out_path="i2i.jsonl",
media_dir="./i2i_data"
)
pipe.add(writer)

Output: JSONL with input_image_path, target_image_path, and optional text fields.
For LLM or multimodal chat tuning:
from omniflow import DestinationJSONLChat
writer = DestinationJSONLChat(
out_path="conversations.jsonl",
system_message="You are a helpful assistant."
)
pipe.add(writer)

Output: JSONL with a messages array containing system/user/assistant turns.
Default: QC steps are read-only and annotate metadata only.
- Recommended: QC measure → Clean → QC assert
  - Measure: e.g., `ImageQCCroppedAtEdges`
  - Clean (future): e.g., `ImageCleanDropIfCropped`
  - Assert (optional): QC step with `strict=True` to gate curated runs/CI
- Flexible order: you can do measure → clean → assert or measure → assert → clean.
- Writers: only writer steps persist artifacts; place them where you want outputs.
  - Common: write after the final assert.
  - For checkpoints: write after clean, then continue.
- Assert policies (planned): `fail_on_any` (default), `max_fail_rate`, `on_fail={raise|warn|annotate}`.
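A pure-Python sketch of the measure → clean → assert flow, with samples as plain dicts. All helper names here are hypothetical (the Clean and assert steps are planned, per the notes above); the point is that measure only annotates, clean drops flagged samples, and assert gates what remains.

```python
# Hypothetical generator steps modeling measure -> clean -> assert.
def measure(samples):
    # Read-only QC: annotate each sample's metadata, never drop.
    for s in samples:
        s.setdefault("qc", {})["cropped_at_edges"] = {"ok": not s.get("touches_edge", False)}
        yield s

def clean(samples):
    # Drop samples the QC measure flagged.
    for s in samples:
        if s["qc"]["cropped_at_edges"]["ok"]:
            yield s

def assert_ok(samples, strict=True):
    # Gate: with strict=True, any surviving failure aborts the run.
    for s in samples:
        if strict and not s["qc"]["cropped_at_edges"]["ok"]:
            raise AssertionError(f"sample {s['id']} failed QC")
        yield s

samples = [{"id": "a"}, {"id": "b", "touches_edge": True}]
survivors = list(assert_ok(clean(measure(samples))))  # only "a" survives
```

Because each stage is a generator, the chain streams sample-by-sample, consistent with the pipeline's lazy execution model.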
- Prefix by domain and intent: `MeshTransform*`, `MeshClean*`, `MeshQC*`; `ImageTransform*`, `ImageClean*`, `ImageQC*`
- Keep steps small, verb-focused, with clear args and docstrings
- Small provider interface and registry for pluggability
- Minimal Blender helper (engine autodetect, basic lighting)
- Canonical in-memory handoff between steps: sources yield in-memory objects by default (e.g., `trimesh.Trimesh`, `PIL.Image`), and steps operate on them. Providers that require files perform temporary export internally and auto-clean.
- Declarative UX: users describe what they want done; Omniflow internalizes representation adaptation and temp storage. Writers are the only persistent I/O.
- Preflight is fail-fast: dependencies, API keys, provider availability, and basic I/O permissions (sentinel writes) are checked in `compile()`.
- Broadcasting: single `elevation`/`azimuth` values broadcast across views
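The broadcasting rule can be sketched as follows: a scalar value is repeated to match the longer view list, so `elevation=15, azimuth=[0, 45, 90]` yields three views at the same elevation. The helper is illustrative, not omniflow's actual API.

```python
# Hypothetical sketch of scalar-to-list broadcasting for view parameters.
def broadcast_views(elevation, azimuth):
    el = elevation if isinstance(elevation, list) else [elevation]
    az = azimuth if isinstance(azimuth, list) else [azimuth]
    if len(el) == 1:
        el = el * len(az)   # repeat scalar elevation across all azimuths
    if len(az) == 1:
        az = az * len(el)   # and vice versa
    if len(el) != len(az):
        raise ValueError("elevation/azimuth lengths must match or broadcast")
    return list(zip(el, az))

views = broadcast_views(15, [0, 45, 90])  # [(15, 0), (15, 45), (15, 90)]
```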
- ✅ Vision-to-text labeling via DSPy (SDXL, generic profiles)
- ✅ Batching and parallelism for throughput
- ✅ Provenance tracking in metadata
- ✅ Multiple image transforms (resize, crop, flip, rotate)
- ✅ Flexible output writers with schema mapping (JSONL, HuggingFace datasets, folder layouts)
- Dataset-level operations (shuffle, split, deduplicate, sample)
- Camera fit: compute distance from bounds and FOV to avoid cropping
- Clean steps: drop/correct based on QC flags
- Text modality pipeline (cleaning, templating, QC)
- Additional labeling backends (BLIP-2, LLaVA, local models)
- Multi-view grouping/aggregation (select best, combine captions)
- Additional providers (e.g., nvdiffrast)
- Resumability and checkpointing
Core dependencies:
pip install pillow trimesh numpy

For labeling (caption generation):

pip install dspy-ai

For 3D rendering:
- Blender 3.0+ (set `BLENDER_PATH` or install to default location)
1. Install in development mode:

   cd omniflow
   python -m venv .venv
   source .venv/bin/activate
   pip install -r requirements.txt

2. Run tests:

   pytest tests/ -v --cov=omniflow

3. Provider test:

   python -m scripts.test_blender_provider --model ./some.glb --out ./tmp --transparent --verbose

4. Full pipeline (see Quickstart examples)
- Keep steps single-purpose with explicit signatures
- Favor pure transforms and metadata over hidden side effects
- Add concise docstrings (summary, args, I/O, arity, raises)