Add native ComfyUI provider for image and video generation #29

martimramos wants to merge 3 commits into calesthio:main
Conversation
…ration

Adds three new BaseTool providers that delegate GPU work to a running ComfyUI server via its REST API. This avoids the need to install PyTorch/diffusers directly, which is critical on hardware where the ecosystem hasn't caught up (e.g. NVIDIA Blackwell / DGX Spark, aarch64 + CUDA 13.0).

New files:

- tools/_comfyui/client.py — shared REST client (submit/poll/download)
- tools/_comfyui/workflows/ — 4 bundled workflow templates
- tools/graphics/comfyui_image.py — FLUX 2 Dev NVFP4 text-to-image
- tools/video/comfyui_video.py — WAN 2.2 14B t2v + i2v (4-step LightX2V)
- tools/audio/comfyui_music.py — ACE-Step 3.5B music generation
- tests/contracts/test_comfyui_tools.py — 41 contract tests
- docs/comfyui-adapter-plan.md — design document

Zero changes to existing tools, selectors, registry, or pipelines. Tools are auto-discovered and selectors pick them up via capability match.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
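The submit/poll/download flow the shared client handles could be sketched roughly as below, assuming the stock ComfyUI HTTP API (`POST /prompt`, `GET /history/{id}`, `GET /view`); the `ComfyUIClient` class and method names here are illustrative, not the PR's actual code.

```python
# Minimal sketch of a ComfyUI REST client (submit/poll/download).
# Assumes the stock ComfyUI HTTP API; names are illustrative only.
import json
import time
import urllib.request


class ComfyUIClient:
    def __init__(self, server_url: str):
        self.server_url = server_url.rstrip("/")

    def submit(self, workflow: dict) -> str:
        """POST the workflow graph to /prompt and return the prompt_id."""
        body = json.dumps({"prompt": workflow}).encode()
        req = urllib.request.Request(
            f"{self.server_url}/prompt", data=body,
            headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["prompt_id"]

    def poll(self, prompt_id: str, interval: float = 2.0) -> dict:
        """Poll /history/{id} until the job shows up, then return its entry."""
        while True:
            url = f"{self.server_url}/history/{prompt_id}"
            with urllib.request.urlopen(url) as resp:
                history = json.load(resp)
            if prompt_id in history:
                return history[prompt_id]
            time.sleep(interval)

    @staticmethod
    def output_files(entry: dict) -> list[dict]:
        """Flatten per-node outputs into a list of downloadable file records,
        each suitable for a GET /view?filename=...&subfolder=...&type=... call."""
        files = []
        for node_output in entry.get("outputs", {}).values():
            for kind in ("images", "gifs", "videos"):
                files.extend(node_output.get(kind, []))
        return files
```

The split between `poll` and `output_files` keeps the history-parsing logic pure and unit-testable without a running server.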
- Client queries ComfyUI /object_info to discover installed models (checkpoints, diffusion models, VAE, CLIP, LoRAs)
- Each tool declares its required models and checks them on execute()
- get_status() returns DEGRADED when server is up but models are missing
- Clear error messages tell the user exactly which models to download
- When COMFYUI_SERVER_URL is not set, the error message tells the user to configure it in .env instead of silently failing on localhost:8188
- 8 new tests covering URL config, error messages, and model requirements

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
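The model check described above could look roughly like this: `/object_info` exposes each loader node's valid dropdown choices, so a tool can diff its declared requirements against what the server reports. The loader node and field names follow stock ComfyUI loaders; the helper names (`missing_models`, `installed_models`) are illustrative, not the PR's actual code.

```python
# Hedged sketch of the "required models" check: read the choice lists that
# ComfyUI's /object_info exposes on loader nodes, and report which required
# model files are absent. Loader/field names are the stock ComfyUI ones.
LOADER_FIELDS = {
    "checkpoints": ("CheckpointLoaderSimple", "ckpt_name"),
    "diffusion_models": ("UNETLoader", "unet_name"),
    "vae": ("VAELoader", "vae_name"),
    "loras": ("LoraLoader", "lora_name"),
}


def installed_models(object_info: dict, kind: str) -> list[str]:
    """Extract the dropdown choices ComfyUI reports for a loader input."""
    node, field = LOADER_FIELDS[kind]
    try:
        # The first element of the input spec is the list of valid choices.
        return list(object_info[node]["input"]["required"][field][0])
    except (KeyError, IndexError, TypeError):
        return []


def missing_models(object_info: dict, required: dict[str, list[str]]) -> list[str]:
    """Return the required model files not present on the server."""
    missing = []
    for kind, names in required.items():
        have = set(installed_models(object_info, kind))
        missing.extend(n for n in names if n not in have)
    return missing
```

A tool's `get_status()` could then report DEGRADED whenever `missing_models(...)` is non-empty, and include the list in the error message so the user knows exactly what to download.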
Removed comfyui_music and its workflow. The ACE-Step model runs in ComfyUI but the node class names differ across custom node packs (AceStepModelLoader vs native TextEncodeAceStepAudio, etc.), so a bundled workflow would break for most users. Documented the reasoning in the plan doc and listed it as an open question for future work. Users with ACE-Step working can still use the workflow_json override on any tool.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
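The workflow_json override mentioned above could be sketched as follows: accept a user-supplied ComfyUI workflow (API format) and patch the text prompt into it before submission. The node ids and the choice of `CLIPTextEncode` as the patch target are assumptions about the user's graph, not part of this PR's actual API.

```python
# Illustrative sketch of a workflow_json override: load a user-supplied
# ComfyUI workflow (API/JSON format) and set the prompt text on every
# CLIPTextEncode node before the graph is submitted. The assumption that
# CLIPTextEncode is the right node to patch is ours, not the PR's.
import json


def apply_prompt(workflow_json: str, prompt: str) -> dict:
    """Set the `text` input on every CLIPTextEncode node in the graph."""
    graph = json.loads(workflow_json)
    for node in graph.values():
        if node.get("class_type") == "CLIPTextEncode":
            node.setdefault("inputs", {})["text"] = prompt
    return graph
```

This is why an override-only pattern still works for ACE-Step users: the tool never needs to know the pack-specific node class names, because the user's own workflow already wires them up.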
This is a fantastic initiative and, directionally, I think it would be a very strong addition to OpenMontage. The biggest win here is not just "another provider", but a much better local backend abstraction. Using ComfyUI as the execution layer makes a lot of sense for hardware portability, especially for setups where direct PyTorch/diffusers installs aren't viable. That said, after a technical + governance pass, I think there are a few issues worth addressing before merge:
On the open questions, my current take would be:
Overall: I'm very supportive of the direction. If the custom workflow contract, partial-availability reporting, and agent-skill/provenance integration are tightened up, I think this would be a genuinely valuable addition to the project.
Summary
- comfyui_image and comfyui_video tools that delegate GPU work to a running ComfyUI server via its REST API
- A shared REST client (tools/_comfyui/client.py) and 3 bundled workflow templates (FLUX 2 txt2img, WAN 2.2 i2v 4-step, WAN 2.2 t2v 4-step)
- Model discovery via the /object_info endpoint — tools check that required models are installed before generating, and give actionable error messages when they're missing
- COMFYUI_SERVER_URL configuration guidance when the server isn't reachable

Why
OpenMontage's existing local GPU tools use HuggingFace diffusers directly. This breaks on hardware where the PyTorch ecosystem hasn't caught up — notably NVIDIA Blackwell / DGX Spark (aarch64, CUDA 13.0, sm_121), where there are no stable PyTorch wheels. ComfyUI already solves these compatibility issues, and NVIDIA ships official optimized containers for it.

This adapter lets OpenMontage delegate GPU generation to ComfyUI, avoiding the need to install PyTorch/diffusers directly. Same models, better hardware portability.
What's included
- tools/_comfyui/client.py
- tools/graphics/comfyui_image.py
- tools/video/comfyui_video.py
- tools/_comfyui/workflows/flux2-txt2img.json
- tools/_comfyui/workflows/wan22-i2v-4step.json
- tools/_comfyui/workflows/wan22-t2v-4step.json
- tests/contracts/test_comfyui_tools.py
- docs/comfyui-adapter-plan.md

Zero changes to existing files — tools auto-register via the existing discovery mechanism. Selectors pick them up via capability match.
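One behavior from the summary — failing fast with guidance when COMFYUI_SERVER_URL is unset, rather than silently assuming localhost:8188 — could be sketched like this; the function name and exact message wording are illustrative, not the PR's actual code.

```python
# Hedged sketch of COMFYUI_SERVER_URL resolution: raise an actionable
# error when the variable is unset instead of silently defaulting to
# localhost:8188. Function name and message text are illustrative.
import os


def resolve_server_url() -> str:
    url = os.environ.get("COMFYUI_SERVER_URL", "").strip()
    if not url:
        raise RuntimeError(
            "COMFYUI_SERVER_URL is not set. Add it to your .env, e.g. "
            "COMFYUI_SERVER_URL=http://localhost:8188")
    return url.rstrip("/")
```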
What's NOT included (and why)
Music generation (comfyui_music) — We explored this with ACE-Step 3.5B. The model runs fine in ComfyUI, but the node interface isn't standardized. Different custom node packs use different class names (AceStepModelLoader vs native TextEncodeAceStepAudio), so a bundled workflow would break for most users. Documented in the plan doc as an open question. Happy to discuss approaches — maybe a workflow_json override-only pattern, or waiting for node convergence.

Tested on
Test plan
🤖 Generated with Claude Code