
Add native ComfyUI provider for image and video generation#29

Open
martimramos wants to merge 3 commits into calesthio:main from martimramos:comfyui-adapter

Conversation


@martimramos martimramos commented Apr 16, 2026

Summary

  • Adds comfyui_image and comfyui_video tools that delegate GPU work to a running ComfyUI server via its REST API
  • Includes a shared client (tools/_comfyui/client.py) and 3 bundled workflow templates (FLUX 2 txt2img, WAN 2.2 i2v 4-step, WAN 2.2 t2v 4-step)
  • Model discovery via ComfyUI's /object_info endpoint — tools check that required models are installed before generating, and give actionable error messages when they're missing
  • Actionable COMFYUI_SERVER_URL configuration guidance when server isn't reachable
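The submit/poll/download loop the shared client implements can be sketched against ComfyUI's stock HTTP API (`POST /prompt` returning a `prompt_id`, `GET /history/{prompt_id}` for results). The function names and the fixed output-node argument below are illustrative, not the shipped `client.py` interface:

```python
import json
import time
import urllib.request

SERVER = "http://localhost:8188"  # normally resolved from COMFYUI_SERVER_URL

def submit(workflow: dict) -> str:
    """POST the workflow graph to /prompt and return the queued prompt_id."""
    body = json.dumps({"prompt": workflow}).encode()
    req = urllib.request.Request(f"{SERVER}/prompt", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["prompt_id"]

def extract_outputs(history_entry: dict, node_id: str) -> list:
    """Pull the file records for one output node from a /history entry.
    Image nodes report under "images"; some video nodes use other keys."""
    return history_entry.get("outputs", {}).get(node_id, {}).get("images", [])

def wait_for_result(prompt_id: str, node_id: str, timeout: float = 600.0) -> list:
    """Poll /history until the prompt appears there, then return its outputs."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        with urllib.request.urlopen(f"{SERVER}/history/{prompt_id}") as resp:
            history = json.loads(resp.read())
        if prompt_id in history:
            return extract_outputs(history[prompt_id], node_id)
        time.sleep(2.0)
    raise TimeoutError(f"ComfyUI did not finish prompt {prompt_id} in {timeout}s")
```

Each returned record carries `filename`/`subfolder`/`type` fields that can be fed to `GET /view` to download the actual file.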

Why

OpenMontage's existing local GPU tools use HuggingFace diffusers directly. This breaks on hardware where the PyTorch ecosystem hasn't caught up — notably NVIDIA Blackwell / DGX Spark (aarch64, CUDA 13.0, sm_121) where there are no stable PyTorch wheels. ComfyUI already solves these compatibility issues and NVIDIA ships official optimized containers for it.

This adapter lets OpenMontage delegate GPU generation to ComfyUI, avoiding the need to install PyTorch/diffusers directly. Same models, better hardware portability.

What's included

| Component | File | Lines |
| --- | --- | --- |
| Shared REST client | `tools/_comfyui/client.py` | ~180 |
| Image generation | `tools/graphics/comfyui_image.py` | ~140 |
| Video generation (t2v + i2v) | `tools/video/comfyui_video.py` | ~190 |
| FLUX 2 Dev workflow | `tools/_comfyui/workflows/flux2-txt2img.json` | |
| WAN 2.2 I2V 4-step workflow | `tools/_comfyui/workflows/wan22-i2v-4step.json` | |
| WAN 2.2 T2V 4-step workflow | `tools/_comfyui/workflows/wan22-t2v-4step.json` | |
| Contract tests | `tests/contracts/test_comfyui_tools.py` | ~200 |
| Design document | `docs/comfyui-adapter-plan.md` | |

Zero changes to existing files — tools auto-register via the existing discovery mechanism. Selectors pick them up via capability match.
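The `/object_info` preflight described in the summary can be sketched as pure helpers over the endpoint's payload, where a loader node's model field exposes its installed choices as the first element of the input spec. The loader/field pairs each tool would declare are assumptions for illustration, not the shipped requirement lists:

```python
def installed_models(object_info: dict, loader: str, field: str) -> set:
    """Read the dropdown choices for a loader node's model field from the
    /object_info payload, e.g. CheckpointLoaderSimple's ckpt_name list."""
    try:
        return set(object_info[loader]["input"]["required"][field][0])
    except (KeyError, IndexError, TypeError):
        return set()

def missing_models(object_info: dict, required: dict) -> list:
    """Return the required model files the server does not report as installed.
    `required` maps (loader_class, field_name) -> set of expected filenames."""
    missing = []
    for (loader, field), names in required.items():
        have = installed_models(object_info, loader, field)
        missing.extend(sorted(names - have))
    return missing
```

Running this check on `execute()` is what lets the tools fail with a concrete "download these files" message instead of a mid-generation node error.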

What's NOT included (and why)

Music generation (comfyui_music) — We explored this with ACE-Step 3.5B. The model runs fine in ComfyUI, but the node interface isn't standardized. Different custom node packs use different class names (AceStepModelLoader vs native TextEncodeAceStepAudio), so a bundled workflow would break for most users. Documented in the plan doc as an open question. Happy to discuss approaches — maybe a workflow_json override-only pattern, or waiting for node convergence.

Tested on

  • NVIDIA DGX Spark (GB10 Blackwell, aarch64, CUDA 13.0, 128GB unified memory)
  • ComfyUI 0.19.1 on NGC PyTorch 25.10 container
  • FLUX 2 Dev NVFP4 image generation (~115s per 1024x1024)
  • WAN 2.2 14B I2V with LightX2V 4-step LoRA (~360s per 5s clip)
  • Full end-to-end test via Claude Code orchestration (preflight → image → i2v → output)

Test plan

  • 45 contract tests pass (no ComfyUI server required)
  • Full existing test suite passes (264 passed, 6 skipped)
  • Live image generation test (FLUX 2)
  • Live i2v video generation test (WAN 2.2)
  • Model discovery against running server
  • Error paths: server down, missing models, wrong URL
  • Test on consumer GPU (RTX 3090/4090, x86)

🤖 Generated with Claude Code

martimramos and others added 3 commits April 16, 2026 23:59
…ration

Adds three new BaseTool providers that delegate GPU work to a running
ComfyUI server via its REST API.  This avoids the need to install
PyTorch/diffusers directly, which is critical on hardware where the
ecosystem hasn't caught up (e.g. NVIDIA Blackwell / DGX Spark, aarch64
+ CUDA 13.0).

New files:
- tools/_comfyui/client.py — shared REST client (submit/poll/download)
- tools/_comfyui/workflows/ — 4 bundled workflow templates
- tools/graphics/comfyui_image.py — FLUX 2 Dev NVFP4 text-to-image
- tools/video/comfyui_video.py — WAN 2.2 14B t2v + i2v (4-step LightX2V)
- tools/audio/comfyui_music.py — ACE-Step 3.5B music generation
- tests/contracts/test_comfyui_tools.py — 41 contract tests
- docs/comfyui-adapter-plan.md — design document

Zero changes to existing tools, selectors, registry, or pipelines.
Tools are auto-discovered and selectors pick them up via capability match.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Client queries ComfyUI /object_info to discover installed models
  (checkpoints, diffusion models, VAE, CLIP, LoRAs)
- Each tool declares its required models and checks them on execute()
- get_status() returns DEGRADED when server is up but models are missing
- Clear error messages tell the user exactly which models to download
- When COMFYUI_SERVER_URL is not set, error message tells the user to
  configure it in .env instead of silently failing on localhost:8188
- 8 new tests covering URL config, error messages, and model requirements

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Removed comfyui_music and its workflow. The ACE-Step model runs in
ComfyUI but the node class names differ across custom node packs
(AceStepModelLoader vs native TextEncodeAceStepAudio, etc.), so a
bundled workflow would break for most users.

Documented the reasoning in the plan doc and listed it as an open
question for future work. Users with ACE-Step working can still use
the workflow_json override on any tool.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@martimramos martimramos requested a review from calesthio as a code owner April 16, 2026 23:35
@calesthio
Owner

This is a fantastic initiative and, directionally, I think it would be a very strong addition to OpenMontage.

The biggest win here is not just "another provider", but a much better local backend abstraction. Using ComfyUI as the execution layer makes a lot of sense for hardware portability, especially for setups where direct diffusers / PyTorch support is rough or lagging. It also fits the existing image/video selector architecture well, so the overall shape of the proposal feels right.

That said, after a technical + governance pass, I think there are a few issues worth addressing before merge:

  1. The workflow_json override is currently advertised as fully custom / drop-in, but the implementation still hardcodes output nodes.

    • comfyui_image still downloads from node 13
    • comfyui_video still forces the custom workflow path through output node 16
    • This means arbitrary community workflows will fail unless they happen to use the bundled node IDs.
    • I think this needs an explicit output_node input (or equivalent contract) if custom workflows are meant to be first-class.
  2. comfyui_video.get_status() overstates availability when only one mode is actually usable.

    • Right now the tool returns AVAILABLE as long as either T2V or I2V models exist.
    • In practice that means selector routing can surface ComfyUI for an image_to_video request even when only T2V is installed, and the failure only happens at execution time.
    • I'd recommend either reporting this more precisely or filtering/ranking by operation-specific readiness.
  3. The new tools currently publish agent_skills = [].

    • In OpenMontage this is more than metadata: selectors propagate those skills into the agent context, and AGENT_GUIDE.md explicitly expects Layer 3 skills to be read before generation tools are used.
    • So even if the provider works technically, this weakens the prompt/governance path compared with the other generation providers.
    • I think these tools should expose at least one relevant skill so they participate properly in the existing agent flow.
  4. Provenance becomes misleading when workflow_json is used.

    • The tools still report fixed model names (flux2-dev-nvfp4, wan2.2-14b-fp8-4step) even if the caller supplies a completely different custom workflow.
    • Since OpenMontage uses this metadata downstream in manifests / publishing / auditability, it would be better to either surface custom workflow provenance explicitly or mark the model/workflow as user-supplied.
  5. The RFC / doc currently drifts a bit from the repo reality.

    • It references music_selector, which doesn't exist in the current codebase.
    • It also describes some workflow/output behavior that doesn't fully match the shipped implementation.
    • I don't think that blocks the image/video adapter itself, but I would tighten the doc so it matches the actual OpenMontage architecture.
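For point 1, one possible shape of the stricter override contract is an explicit `output_node` input, with save-node auto-detection as a fallback so simple community workflows still work without it. The node class names and the function below are a proposal, not code from this PR:

```python
# Save-style node classes commonly used as workflow outputs; an assumption,
# not an exhaustive list (VHS_VideoCombine comes from a custom node pack).
DEFAULT_OUTPUT_CLASSES = {"SaveImage", "SaveAnimatedWEBP", "VHS_VideoCombine"}

def resolve_output_node(workflow: dict, output_node=None) -> str:
    """Pick the node id to download results from: an explicit output_node wins,
    otherwise require exactly one save-style node instead of assuming the
    bundled ids (13 for image, 16 for video)."""
    if output_node is not None:
        if output_node not in workflow:
            raise ValueError(f"output_node {output_node!r} not present in workflow_json")
        return output_node
    candidates = [nid for nid, node in workflow.items()
                  if node.get("class_type") in DEFAULT_OUTPUT_CLASSES]
    if len(candidates) != 1:
        raise ValueError("custom workflow_json needs an explicit output_node "
                         f"(found {len(candidates)} save-style nodes)")
    return candidates[0]
```

This keeps the bundled workflows working unchanged while making arbitrary `workflow_json` payloads first-class rather than accidentally coupled to the bundled node ids.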

On the open questions, my current take would be:

  • Workflow versioning: keep blessed workflows in-repo for reproducibility, but add a stricter override contract (workflow_json/workflow_path + output_node + optional provenance fields).
  • Async generation: polling is fine for merge; websocket progress can come later as a UX improvement.
  • Multi-server: worth supporting later via per-capability env vars like image/video server URLs, but not necessary for the first merge.
  • Music generation: I would revisit the doc language here. ComfyUI's ACE-Step story looks more mature now than the RFC currently suggests, so I'd frame this less as "wait indefinitely" and more as "follow up once we decide the OpenMontage music-selection integration shape."
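The multi-server idea could be as small as an env-var resolution chain that falls back to the shared URL. The `COMFYUI_<CAPABILITY>_SERVER_URL` naming is hypothetical, not something this PR or the codebase defines:

```python
import os

def server_url(capability: str) -> str:
    """Resolve the ComfyUI endpoint for one capability (e.g. "image", "video"),
    preferring a capability-specific variable over the shared COMFYUI_SERVER_URL."""
    specific = os.environ.get(f"COMFYUI_{capability.upper()}_SERVER_URL")
    if specific:
        return specific
    shared = os.environ.get("COMFYUI_SERVER_URL")
    if shared:
        return shared
    raise RuntimeError(
        "Set COMFYUI_SERVER_URL in .env (e.g. http://localhost:8188), or "
        f"COMFYUI_{capability.upper()}_SERVER_URL to route {capability} separately")
```

The fallback order means a single-server setup needs no new configuration, which is why this can safely wait until after the first merge.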

Overall: I'm very supportive of the direction. If the custom workflow contract, partial-availability reporting, and agent-skill/provenance integration are tightened up, I think this would be a genuinely valuable addition to the project.

