Official ComfyUI custom nodes for the Motif-Video 2B text-to-video and image-to-video diffusion transformer
Technical Report | Hugging Face | Project Page
ComfyUI-MotifVideo2B exposes Motif Technologies' Motif-Video 2B text-to-video and image-to-video diffusion transformer as a set of ComfyUI custom nodes, so the model plugs directly into the standard Load Diffusion Model → KSampler → VAE Decode graph.
Motif-Video 2B is a flow-matching diffusion transformer organized around a three-stage DDT-style backbone (dual-stream + single-stream + DDT decoder) with Shared Cross-Attention for long-context text alignment. The architectural derivation and full training recipe are in the Motif-Video 2B technical report; this repository ships the inference-time ComfyUI integration.
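For orientation only, here is a heavily simplified sketch of that three-stage layout. Everything below — block design, depths, dimensions, and where the shared cross-attention sits — is a placeholder for illustration, not the actual Motif-Video 2B implementation (which ships under `models/transformer/`):

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Placeholder pre-norm transformer block (self-attention + MLP)."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))

class ThreeStageSketch(nn.Module):
    """Dual-stream -> single-stream -> decoder, with one cross-attention
    module reused across the dual-stream depth as a simplified stand-in
    for Shared Cross-Attention."""
    def __init__(self, dim: int = 256, dual: int = 2, single: int = 2, dec: int = 2):
        super().__init__()
        self.video_blocks = nn.ModuleList([Block(dim) for _ in range(dual)])
        self.text_blocks = nn.ModuleList([Block(dim) for _ in range(dual)])
        self.shared_xattn = nn.MultiheadAttention(dim, 8, batch_first=True)
        self.single_blocks = nn.ModuleList([Block(dim) for _ in range(single)])
        self.decoder_blocks = nn.ModuleList([Block(dim) for _ in range(dec)])

    def forward(self, video_tokens, text_tokens):
        # Stage 1: separate streams; video attends to text through the
        # single shared cross-attention module at every depth.
        for vb, tb in zip(self.video_blocks, self.text_blocks):
            video_tokens, text_tokens = vb(video_tokens), tb(text_tokens)
            video_tokens = video_tokens + self.shared_xattn(
                video_tokens, text_tokens, text_tokens, need_weights=False)[0]
        # Stage 2: one joint sequence through single-stream blocks.
        x = torch.cat([text_tokens, video_tokens], dim=1)
        for blk in self.single_blocks:
            x = blk(x)
        # Stage 3: DDT-style decoder refines the video tokens only.
        video_tokens = x[:, text_tokens.shape[1]:]
        for blk in self.decoder_blocks:
            video_tokens = blk(video_tokens)
        return video_tokens
```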
Clone the repository and install dependencies:
```
cd /path/to/ComfyUI/custom_nodes
git clone https://github.com/MotifTechnologies/ComfyUI-MotifVideo2B.git
pip install -r ComfyUI-MotifVideo2B/requirements.txt
```

`motif_core` and `motif-pipelines` do not need to be installed separately: `MotifVideoTransformer3DModel` is bundled under `models/transformer/`, so the repository is self-contained.
Heads-up: an upstream `custom-node-list.json` registration PR is planned and will be opened shortly. This section will be revised after that PR merges.
ComfyUI-Manager exposes two install paths for this repository, both subject to its `security_level` setting:

- Custom Nodes Manager (search and install from the registry): not available yet. A `custom-node-list.json` registration PR to upstream Manager is planned but not yet submitted; until that PR is opened and merged, this repository will not appear in Manager's search results.
- Install via Git URL (paste a Git URL): under Manager's default `normal` security level, this path is rejected outright with `This action is not allowed with this security level configuration.` (the upstream policy requires `security_level = normal` with `--listen` on a local IP, or `security_level = middle`/`weak`). This is not specific to externally-exposed setups.

If neither Manager path works for you, fall back to the `git clone` method above. Whether the Custom Nodes Manager search-and-install path works under the default `normal` security level after the registration PR merges depends on upstream Manager and registry policy at that time; this repository cannot guarantee it. The Install via Git URL path remains gated by Manager's security policy regardless of registration.
When you use either of the two Manager paths above, Manager runs `install.py` automatically, which installs the entries in `requirements.txt`. (The manual `git clone` path documented earlier does not involve Manager; there you run `pip install -r requirements.txt` yourself.) After restarting and loading either example workflow, the automatic model-download dialog described in Automatic model download will pull the three weight files from Hugging Face on first use, regardless of how the custom node was installed.
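For reference, a Manager-run `install.py` for a node pack like this typically reduces to a pip invocation over the bundled `requirements.txt`. A minimal sketch — the script actually shipped in this repository may do more:

```python
# Minimal sketch of a Manager-invoked install.py: resolve requirements.txt
# next to this file and pip-install it with the interpreter running ComfyUI.
import pathlib
import subprocess
import sys

requirements = pathlib.Path(__file__).parent / "requirements.txt"
subprocess.check_call(
    [sys.executable, "-m", "pip", "install", "-r", str(requirements)]
)
```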
All weights live on the official Hugging Face repository:
Download the three files listed below and place them under ComfyUI's standard model directories. The filenames and target directories shown here are the ones the example workflows load by default — pick your own names if you prefer, but keep the target directory the same.
```
ComfyUI/
├── models/
│   ├── diffusion_models/
│   │   └── motifvideo_2b.safetensors        ← transformer/diffusion_pytorch_model.safetensors
│   ├── text_encoders/
│   │   └── motifvideo_t5gemma2.safetensors  ← text_encoder/model.safetensors (rename when you save it)
│   └── vae/
│       └── motifvideo_vae.safetensors       ← vae/diffusion_pytorch_model.safetensors
```
Use `huggingface-cli` to fetch the three files you need:

```
# Fetch only the three files the workflows load
huggingface-cli download Motif-Technologies/Motif-Video-2B \
  transformer/diffusion_pytorch_model.safetensors \
  text_encoder/model.safetensors \
  vae/diffusion_pytorch_model.safetensors \
  --local-dir /tmp/motif-video-2b
```

After download, rename the files to match the local naming convention above: `transformer/diffusion_pytorch_model.safetensors` → `motifvideo_2b.safetensors`, `text_encoder/model.safetensors` → `motifvideo_t5gemma2.safetensors`, and `vae/diffusion_pytorch_model.safetensors` → `motifvideo_vae.safetensors`. Then place each in its target directory shown above.
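If you prefer staying in Python, the same three files can be fetched and renamed in one go with `huggingface_hub`. A sketch, assuming a ComfyUI install at `COMFYUI_DIR` (adjust the path for your setup):

```python
# Download the three Motif-Video 2B weight files and copy them into
# ComfyUI's model directories under the names the example workflows expect.
import shutil
from pathlib import Path

from huggingface_hub import hf_hub_download

COMFYUI_DIR = Path("/path/to/ComfyUI")  # assumption: point at your install

FILES = {
    "transformer/diffusion_pytorch_model.safetensors": "diffusion_models/motifvideo_2b.safetensors",
    "text_encoder/model.safetensors": "text_encoders/motifvideo_t5gemma2.safetensors",
    "vae/diffusion_pytorch_model.safetensors": "vae/motifvideo_vae.safetensors",
}

for remote_name, local_name in FILES.items():
    cached = hf_hub_download("Motif-Technologies/Motif-Video-2B", remote_name)
    target = COMFYUI_DIR / "models" / local_name
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(cached, target)  # copy out of the HF cache under the local name
```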
The VAE is in diffusers layout; its state_dict keys are remapped to ComfyUI's WAN VAE at load time, so no manual conversion is needed.
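The remapping itself is just a prefix rewrite over checkpoint keys. An illustrative sketch only — the real prefix table ships with this repository's VAE loader, and the mapping entries below are invented for the example:

```python
# Rewrite diffusers-style state_dict keys into a target naming scheme.
def remap_keys(state_dict, prefix_map):
    remapped = {}
    for key, tensor in state_dict.items():
        for old_prefix, new_prefix in prefix_map.items():
            if key.startswith(old_prefix):
                key = new_prefix + key[len(old_prefix):]
                break
        remapped[key] = tensor
    return remapped

# Hypothetical usage; this is NOT the actual diffusers -> WAN VAE table:
# wan_sd = remap_keys(diffusers_sd, {"encoder.down_blocks.": "encoder.downsamples."})
```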
Recent ComfyUI versions read the models manifest embedded in Motif-2B_T2V_example.json / Motif-2B_I2V_example.json and offer a one-click dialog to pull the three weight files straight from Hugging Face the first time you open the workflow. Older ComfyUI installs ignore the manifest and fall back to the manual huggingface-cli path above.
If the download dialog does not appear right away on the first drop, drop the workflow once more — ComfyUI's model directory scan can still be initializing.
Tested on ComfyUI v0.18.0 (frontend v1.25.3).
On a host with enough VRAM (an H200 or similar), use `--highvram`:

```
python main.py --highvram --listen 0.0.0.0 --port 8188
```

- Without `--highvram` (default `NORMAL_VRAM`): a bf16 workflow runs at roughly 222 s/step, because the transformer is placed on the "staged" path and weights are dispatched on every forward.
- With `--highvram`: 30 s/step, since all weights stay resident on the GPU.
Why: on hosts where ComfyUI's `comfy_aimdo` (DynamicVRAM) is active, models whose leaves use `comfy.ops.*` are automatically routed to the staged path under `NORMAL_VRAM`, so weights are dispatched on every forward. This repository's transformer deliberately uses `comfy.ops.*` end-to-end so that fp8/manual_cast paths work, which means the staging cannot be disabled at the model level; the engine-side workaround is `--highvram`.
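In code terms, the trade-off looks like the following sketch, which assumes ComfyUI's `comfy.ops` module (the real transformer blocks in this repository are more involved):

```python
import torch.nn as nn
import comfy.ops

# comfy.ops.manual_cast is the variant that also casts dtypes at runtime.
ops = comfy.ops.disable_weight_init

class FeedForward(nn.Module):
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        # Using comfy.ops layers instead of plain nn.Linear lets the engine
        # manage the weights for fp8 / manual_cast execution; the flip side
        # is that such models become eligible for the staged path under
        # NORMAL_VRAM, which --highvram avoids.
        self.fc1 = ops.Linear(dim, hidden)
        self.fc2 = ops.Linear(hidden, dim)
        self.act = nn.GELU()

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))
```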
| Node | Inputs | Outputs | Description |
|---|---|---|---|
| Load MotifVideo Text Encoder | clip_name, dtype | CLIP | Loads the T5Gemma2 encoder and exposes it as a ComfyUI CLIP |
| MotifVideo Text Encode | CLIP, text, negative_prompt | CONDITIONING × 2 | Encodes positive and negative prompts in a single node |
| Empty MotifVideo Latent | width, height, num_frames, batch_size | LATENT | Empty video latent sized for the Wan-family VAE |
| Load MotifVideo VAE | vae_name | VAE | Loads the 3D VAE in diffusers layout with automatic key remapping |
| MotifVideo Image Encode | positive, negative, VAE, IMAGE | CONDITIONING × 2 | I2V: VAE-encodes the input image and injects it into the conditioning |
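All five nodes follow ComfyUI's standard custom-node interface (`INPUT_TYPES` / `RETURN_TYPES` / `FUNCTION`). As a sketch of that shape, here is roughly what an Empty MotifVideo Latent node looks like; the latent channel count and the 8×/4× compression factors below are assumptions for illustration, not necessarily this repository's values:

```python
import torch

class EmptyMotifLatentSketch:
    """Illustrative only: the standard ComfyUI node interface."""

    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {
            "width": ("INT", {"default": 1280, "min": 16, "step": 16}),
            "height": ("INT", {"default": 736, "min": 16, "step": 16}),
            "num_frames": ("INT", {"default": 121, "min": 1, "step": 4}),
            "batch_size": ("INT", {"default": 1, "min": 1}),
        }}

    RETURN_TYPES = ("LATENT",)
    FUNCTION = "generate"
    CATEGORY = "MotifVideo"

    def generate(self, width, height, num_frames, batch_size):
        # Wan-family 3D VAE assumed here: 16 latent channels, 8x spatial
        # and 4x temporal compression (hence (n - 1) // 4 + 1 frames).
        frames = (num_frames - 1) // 4 + 1
        latent = torch.zeros([batch_size, 16, frames, height // 8, width // 8])
        return ({"samples": latent},)
```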
Measured on a single H200 with the default 1280×736, 121-frame workflow:
| Setup | VRAM peak | s/step | Notes |
|---|---|---|---|
| bf16 + `--highvram` | ~29.28 GB | 30 s | Baseline |
| fp8_e4m3fn + `--highvram` | ~28.68 GB | ~31 s | Recommended production path |
| fp8 + `NORMAL_VRAM` | — | — | Avoid: staged path and the earlier fallback regression can re-emerge |
T2V is the default sampling path: MotifVideo Text Encode feeds KSampler directly, with no image-conditioning branch. The full wiring is in workflows/Motif-2B_T2V_example.json, shipped as a reusable ComfyUI subgraph.
Recommended parameters, as shipped in Motif-2B_T2V_example.json:
- `ModelSamplingSD3`: shift = 15 (see the shift sketch after this list)
- `APG`: eta = 0, norm_threshold = 12, momentum = 0.1 (Adaptive Projected Guidance, between ModelSamplingSD3 and KSampler)
- `KSampler`: cfg = 8.0, steps = 50, sampler = `dpmpp_2m`, scheduler = `simple`
- `EmptyMotifLatent`: 1280×736, 33, 65, or 121 frames
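For intuition on the shift value: ComfyUI's flow-model sampling applies a discrete-flow time shift to each sigma, so a large shift concentrates the 50 steps at high noise levels. The formula below is the commonly used one; consult ComfyUI's model_sampling code for the authoritative version:

```python
# Discrete-flow time shift: sigma' = shift * sigma / (1 + (shift - 1) * sigma)
def shift_sigma(sigma: float, shift: float) -> float:
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)

print(shift_sigma(0.5, 15.0))  # 0.9375: mid-schedule moves far toward pure noise
print(shift_sigma(0.5, 8.0))   # ~0.889: the I2V workflow's gentler shift
```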
For I2V, MotifVideo Image Encode sits between MotifVideo Text Encode and KSampler: it VAE-encodes the input image and injects it into the conditioning as `concat_latent_image`, so downstream nodes continue to see a normal CONDITIONING pair. The full wiring is in `workflows/Motif-2B_I2V_example.json`.
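A sketch of that injection pattern, using ComfyUI's `node_helpers` utility (the real node in this repository may differ in details):

```python
import node_helpers  # ComfyUI utility for updating conditioning entries

def encode(self, positive, negative, vae, image):
    # VAE-encode the reference image, then attach the latent to both
    # conditioning lists under the concat_latent_image key so downstream
    # nodes still see an ordinary CONDITIONING pair.
    concat_latent = vae.encode(image[:, :, :, :3])
    positive = node_helpers.conditioning_set_values(
        positive, {"concat_latent_image": concat_latent})
    negative = node_helpers.conditioning_set_values(
        negative, {"concat_latent_image": concat_latent})
    return (positive, negative)
```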
The shipped I2V workflow's LoadImage node points at i2v_sample.jpg. Copy assets/i2v_sample.jpg into ComfyUI's input/ folder before running it, or load any other image of your own.
Recommended parameters, as shipped in Motif-2B_I2V_example.json:
- `ModelSamplingSD3`: shift = 8
- `APG`: eta = 0, norm_threshold = 12, momentum = 0.1 (Adaptive Projected Guidance, between ModelSamplingSD3 and KSampler)
- `KSampler`: cfg = 8.0, steps = 50, sampler = `dpmpp_2m`, scheduler = `simple`
- `EmptyMotifLatent`: 1280×736, 33, 65, or 121 frames
Switch back to T2V by removing the MotifVideo Image Encode node and wiring MotifVideo Text Encode straight into KSampler — or just load the T2V workflow instead.
The standard MotifVideo sampling graph flows UNETLoader → ModelSamplingSD3 → KSampler → VAE Decode → Create Video → Save Video, with MotifTextEncoderLoader + MotifTextEncode feeding the KSampler conditioning and EmptyMotifLatent seeding the latent. Load the example JSON through ComfyUI's Load menu rather than rebuilding the graph by hand:
- `workflows/Motif-2B_T2V_example.json`: Text-to-Video graph
- `workflows/Motif-2B_I2V_example.json`: Image-to-Video graph
Make sure the model files described in the Installation section are in place first, and that the UNETLoader / text-encoder / VAE selections inside the loaded workflow match your local filenames.
If you use Motif-Video 2B in your research, please cite the technical report:
```bibtex
@techreport{motifvideo2b2026,
  title       = {Motif-Video 2B: Technical Report},
  author      = {Motif Technologies},
  year        = {2026},
  institution = {Motif Technologies},
  url         = {https://arxiv.org/abs/2604.16503}
}
```

This repository is released under the Apache 2.0 License. See LICENSE for details.

