
Releases: ussoewwin/ComfyUI-DistorchMemoryManager

RES4LYF Flash Attention Fix — Full Technical Note

28 Mar 23:00


This document records what failed, why, the underlying issue, what changed, and what each change means for the fix applied to sd/attention.py shipped with the custom node RES4LYF.


1. Summary

  • Symptom: Logs showed Flash Attention failed, using default SDPA: followed repeatedly by PyTorch internals such as
    schema_.has_value() INTERNAL ASSERT FAILED (near ATen/core/dispatch/OperatorEntry.h).
  • Typical environment: PyTorch 2.11 + CUDA 13.x (e.g. 2.11.0+cu130), ComfyUI with --use-flash-attention, workflows that use attention code paths through RES4LYF.
  • Fix: In RES4LYF’s attention_flash implementation, remove the extra torch.library.custom_op wrapper and remove the SDPA fallback; call flash_attn_func directly instead.

2. What the error looked like (log output)

A typical sequence was:

  1. Warning
    Flash Attention failed, using default SDPA: ...

  2. PyTorch internal assert (example)
    schema_.has_value() INTERNAL ASSERT FAILED at "...OperatorEntry.h":84
    Tried to access the schema for which doesn't have a schema registered yet
    please report a bug to PyTorch.

Exact wording can vary by build/version, but the nature is the same: dispatcher / operator schema resolution failure.


3. Cause (what the code was doing)

Before the fix, RES4LYF attention.py roughly had a two-stage structure.

3.1 Stage 1: Another custom op layered on top of flash_attn_func

Following the same pattern as ComfyUI core, it wrapped flash-attn’s flash_attn_func like this:

@torch.library.custom_op("flash_attention::flash_attn", mutates_args=())
def flash_attn_wrapper(q, k, v, dropout_p=0.0, causal=False):
    return flash_attn_func(q, k, v, dropout_p=dropout_p, causal=causal)

That registers a new PyTorch custom op name: flash_attention::flash_attn.

Meanwhile, the flash-attn package itself (see flash_attn_interface.py) uses torch.library.custom_op and related APIs for PyTorch 2.4+ integration.

So the same computation was routed through two custom-op layers: flash-attn’s internal registration plus the RES4/ComfyUI-style alias flash_attention::flash_attn.

3.2 Stage 2: Fallback to torch.nn.functional.scaled_dot_product_attention on exception

try:
    out = flash_attn_wrapper(...)
except Exception as e:
    logging.warning(f"Flash Attention failed, using default SDPA: {e}")
    out = torch.nn.functional.scaled_dot_product_attention(...)

If stage 1 raised or hit dispatcher inconsistency, execution fell through to SDPA after the warning. That path could hit dispatcher issues of the same family, which is why logs looked like warning + internal assert in succession.


4. The underlying issue (design level)

  • Duplicate abstraction: flash_attn_func is already the supported entry point for integrating flash-attn with PyTorch. Stacking another ComfyUI-style custom op on top is prone to dispatcher clashes on some versions.
  • Fallback target: SDPA also goes through PyTorch's attention backends and can trigger another failure on the same dispatcher, so it is not "a safe escape hatch."
  • Confusion with a "bad build": Internal PyTorch paths in the log push people toward ABI / wheel theories; in this incident the main driver was how this node's bundled attention code called flash-attn.

The core problem: deviation from the intended API (which is to call flash_attn_func directly), compounded by routing failures into SDPA.


5. Why it didn’t show on PyTorch 2.10 but did on 2.11

  • With the same RES4 code, differences in OperatorEntry / schema resolution mean 2.10 often did not reach the same internal asserts (or behaved differently on the same path).
  • 2.11 changed dispatcher behavior, so the double wrapper + fallback combination surfaced there.

This does not mean “2.10 was correct”—it means latent risk that didn’t surface on 2.10.


6. File changed

  • Path: ComfyUI/custom_nodes/RES4LYF/sd/attention.py
  • Function: attention_flash
  • Assumption: from flash_attn import flash_attn_func succeeds near the top of the file (when using --use-flash-attention).

7. What changed (code details)

7.1 Removed

  1. The entire flash_attn_wrapper block

    • @torch.library.custom_op("flash_attention::flash_attn", ...)
    • @flash_attn_wrapper.register_fake
    • The AttributeError stub flash_attn_wrapper
  2. The try / except inside attention_flash

    • Success path: flash_attn_wrapper(q.transpose(1, 2), ...)
    • Failure path: logging.warning(...) + torch.nn.functional.scaled_dot_product_attention(...)

7.2 Core of attention_flash after the fix (excerpt)

def attention_flash(q, k, v, heads, mask=None, attn_precision=None, skip_reshape=False, skip_output_reshape=False):
    if skip_reshape:
        b, _, _, dim_head = q.shape
    else:
        b, _, dim_head = q.shape
        dim_head //= heads
        q, k, v = map(
            lambda t: t.view(b, -1, heads, dim_head).transpose(1, 2),
            (q, k, v),
        )

    if mask is not None:
        if mask.ndim == 2:
            mask = mask.unsqueeze(0)
        if mask.ndim == 3:
            mask = mask.unsqueeze(1)

    assert mask is None
    # Match ComfyUI core: call flash_attn_func directly (avoid duplicate custom_op + broken SDPA fallback on torch 2.11+).
    out = flash_attn_func(
        q.transpose(1, 2),
        k.transpose(1, 2),
        v.transpose(1, 2),
        dropout_p=0.0,
        causal=False,
    ).transpose(1, 2)
    if not skip_output_reshape:
        out = (
            out.transpose(1, 2).reshape(b, -1, heads * dim_head)
        )
    return out

(The real file continues with optimized_attention wiring as before.)


8. Meaning of each change

  • Remove flash_attn_wrapper: Stops registering flash_attention::flash_attn as a second custom op, avoiding dispatcher mismatches common on PyTorch 2.11.x.
  • Call flash_attn_func directly: Restores the supported API path; custom-op registration is left to the flash-attn package.
  • Remove try / SDPA fallback: Avoids falling into the broken SDPA path on failure; exceptions propagate, breaking the "FA fails → SDPA hits a second failure" chain.
  • assert mask is None: Makes explicit that this path does not support masks (aligned with the old RuntimeError inside the try block).
  • Comment: Explains why the ComfyUI-style wrapper is not kept.

9. Relationship to ComfyUI core

  • ComfyUI core comfy/ldm/modules/attention.py may still contain the same pattern (flash_attn_wrapper + SDPA fallback) historically.
  • In this incident, logs pointed to RES4LYF’s sd/attention.py, and symptoms cleared with only the RES4 fix—meaning that workflow actually used RES4’s attention implementation.
  • If you avoid touching core, other workflows that hit core’s same pattern could still show issues; mitigations include runtime patching from another custom node so updates don’t overwrite edits.
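
For illustration, a minimal runtime-patch sketch, assuming ComfyUI core still exposes attention_flash on comfy.ldm.modules.attention and that the replacement function has the same signature as the fixed attention_flash in section 7.2 (_install_flash_attention_patch is a hypothetical name):

import comfy.ldm.modules.attention as comfy_attention

def _install_flash_attention_patch(fixed_attention_flash):
    # Swap core's attention_flash at load time instead of editing the file on disk
    if getattr(comfy_attention, "_flash_direct_patched", False):
        return  # avoid double patching across node reloads
    comfy_attention.attention_flash = fixed_attention_flash
    comfy_attention._flash_direct_patched = True
    print("[custom-node] Patched attention_flash at runtime")

References bound before the patch runs (for example optimized_attention) may also need to be swapped, as the sage-attention guard described in the next release note does.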

10. Operational notes

  1. Updating RES4LYF may overwrite sd/attention.py and remove this fix. Re-check diffs after upgrades using this note.
  2. --use-flash-attention and an installed flash-attn remain prerequisites.
  3. After the fix, if Flash Attention truly fails (e.g. OOM), it no longer auto-falls back to SDPA; you’ll see an exception instead. That trades “silent double failure” for easier debugging.

11. One-sentence summary

RES4LYF wrapped flash_attn_func in an extra custom_op and fell back to SDPA on failure, which conflicted with PyTorch 2.11’s dispatcher and produced internal asserts in logs; calling flash_attn_func directly and removing the SDPA fallback removes that underlying risk.


Document note: Based on facts at investigation/fix time. Behavior may vary with PyTorch / ComfyUI / RES4LYF versions.

ComfyUI-DistorchMemoryManager Patch Explanation

27 Mar 04:29


1. Why this fix was needed

In production usage, the following message was repeatedly emitted and became major log noise:

  • Error running sage attention: Unsupported head_dim: 160, using pytorch attention instead.

This is not a fatal stop condition. It is a known recoverable path where execution continues via PyTorch attention fallback.
The real operational issue was that the same recoverable case was logged as error over and over.


2. Root cause

When SageAttention is called with unsupported head_dim=160 combinations, it can raise an exception.
The implementation already has a fallback path, so generation can continue.

The problem was the logging policy: a known recoverable fallback was recorded as error every time.


3. Why this was fixed in the custom node

Direct patches to ComfyUI core files are overwritten by updates and are hard to maintain.
So this project uses the following policy:

  • No dependency on direct core-file edits
  • Apply runtime patching when the custom node is loaded
  • Keep behavior stable across ComfyUI updates by reapplying from the custom node side

In short, this is an external runtime patch strategy optimized for long-term maintainability.


4. Files changed and what they mean

4.1 ComfyUI/custom_nodes/ComfyUI-DistorchMemoryManager/nodes/sa.py

Change

The exception handling in attention_sage() was adjusted so only Unsupported head_dim: 160 is suppressed.

  • Known exception (Unsupported head_dim: 160) logs once as info
  • Other exceptions remain error (unchanged behavior)
  • Fallback target stays attention_pytorch (unchanged behavior)

Meaning

This suppresses only known noise while preserving visibility of unknown failures.
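
In outline, the adjusted handler looks something like this (a sketch; run_sage_attention is a placeholder for the actual SageAttention call, and attention_pytorch is the existing fallback):

import logging

_warned_head_dim_160 = False

def attention_sage(q, k, v, heads, mask=None, **kwargs):
    global _warned_head_dim_160
    try:
        return run_sage_attention(q, k, v, heads, mask=mask, **kwargs)
    except Exception as e:
        if "Unsupported head_dim: 160" in str(e):
            # Known recoverable case: log once as info, then stay quiet
            if not _warned_head_dim_160:
                logging.info(f"Sage attention: {e}, using pytorch attention instead")
                _warned_head_dim_160 = True
        else:
            # Unknown failures keep their original error-level visibility
            logging.error(f"Error running sage attention: {e}, using pytorch attention instead")
        return attention_pytorch(q, k, v, heads, mask=mask, **kwargs)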

4.2 ComfyUI/custom_nodes/ComfyUI-DistorchMemoryManager/__init__.py

Change (core of this patch)

At startup, _install_sage_attention_noise_guard() runs and replaces ComfyUI's attention_sage externally via runtime patching.

Implementation highlights:

  • Imports comfy.ldm.modules.attention
  • Gets the current attention_sage
  • Defines wrapped attention_sage_guarded with @wrap_attn and swaps it in
  • Also swaps these references when needed:
    • optimized_attention
    • optimized_attention_masked
    • REGISTERED_ATTENTION_FUNCTIONS["sage"]
  • Prevents double patching via _dm_sage160_guard
  • Limits suppression strictly to "Unsupported head_dim: 160"

Meaning

Behavior is corrected at runtime without editing core files directly, making it resilient to ComfyUI updates.
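
A condensed sketch of what the installer might look like (module attribute names other than those listed above are illustrative):

import comfy.ldm.modules.attention as attention_module

def _install_sage_attention_noise_guard():
    original = getattr(attention_module, "attention_sage", None)
    if original is None or getattr(original, "_dm_sage160_guard", False):
        return  # nothing to wrap, or guard already installed

    def attention_sage_guarded(q, k, v, heads, mask=None, **kwargs):
        try:
            return original(q, k, v, heads, mask=mask, **kwargs)
        except Exception as e:
            if "Unsupported head_dim: 160" not in str(e):
                raise  # unknown failures stay visible, as in 4.1
            return attention_module.attention_pytorch(q, k, v, heads, mask=mask, **kwargs)

    attention_sage_guarded._dm_sage160_guard = True
    attention_module.attention_sage = attention_sage_guarded
    # Re-point references that may already be bound to the old function
    if getattr(attention_module, "optimized_attention", None) is original:
        attention_module.optimized_attention = attention_sage_guarded
    registered = getattr(attention_module, "REGISTERED_ATTENTION_FUNCTIONS", {})
    if registered.get("sage") is original:
        registered["sage"] = attention_sage_guarded
    print("[ComfyUI-DistorchMemoryManager] Installed external sage-attention head_dim=160 noise guard")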


5. Scope and safety

This suppression is narrowly scoped, not blanket suppression.

  • Match target: "Unsupported head_dim: 160"
  • Known case: first occurrence logged as info
  • All other exceptions: still logged as error

As a result:

  • Reduces known SD1.5-style log spam
  • Preserves observability for other models and unknown errors

6. Verification checklist

  1. After restarting ComfyUI, confirm this startup log:
    • [ComfyUI-DistorchMemoryManager] Installed external sage-attention head_dim=160 noise guard
  2. During SD1.5 runs, repeated error logs for Unsupported head_dim: 160 no longer appear
  3. Fallback still works and generation continues
  4. SA/FA behavior for other models (e.g., SDXL) remains functional
  5. Non-known SageAttention exceptions are still visible as error

7. Future improvements (developer ideas)

  • Make suppression targets configurable (e.g., list of unsupported head dimensions)
  • Add UI toggle to enable/disable suppression
  • Explicitly document runtime patch policy in README/CHANGELOG
  • Improve startup logs to show patch apply status in more detail

8. Summary

This patch suppresses only the known log spam caused by the head_dim=160 fallback path, while keeping unknown error visibility intact.
By implementing it as a custom-node runtime patch instead of direct core modification, it improves long-term maintainability.

v2.3.6 - Enhanced SageAttention3 (SA3) Integration

23 Jan 18:42


Enhanced SageAttention3 (SA3) Integration

This release enhances the Patch Sage Attention DM node with improved SageAttention3 (SA3) support, including version detection, constraint handling, and automatic fallback mechanisms.

Key Features

1. SA3 Version Detection

  • Added dedicated get_sage_attention3_info() function for SA3 version detection
  • Detects Blackwell GPU support (FP4 kernel availability)
  • Returns version information, availability status, and Blackwell support flag

2. Improved Logging

  • SA2 version logs are now skipped when SA3 modes are selected
  • SA3-specific logging with version information (e.g., "SageAttention3 3.0.0.b1 (Blackwell FP4)")
  • Clear distinction between SA2 and SA3 mode logging

3. Constraint Handling and Fallback

  • Automatic fallback to PyTorch SDPA when SA3 constraints are not met:
    • headdim >= 256: SA3 FP4 kernel does not support head dimensions >= 256
    • attn_mask != None: SA3 does not support attention masks
  • Seamless fallback ensures compatibility with all model configurations

4. Tensor Layout Conversion

  • Automatic conversion from ComfyUI's default NHD layout [batch, seq_len, heads, dim] to SA3's expected HND layout [batch, heads, seq_len, dim]
  • Proper layout restoration after SA3 processing
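
Put together, the constraint check and layout handling described above might look like this (a sketch; sageattn3 stands for the SA3 entry point and its exact signature is assumed):

import torch.nn.functional as F

def run_sa3(q, k, v, mask=None):
    # Inputs arrive in ComfyUI's NHD layout: [batch, seq_len, heads, dim]
    head_dim = q.shape[-1]
    q_hnd, k_hnd, v_hnd = (t.transpose(1, 2) for t in (q, k, v))  # NHD -> HND
    if head_dim >= 256 or mask is not None:
        # SA3 FP4 kernel constraints not met: fall back to PyTorch SDPA
        out = F.scaled_dot_product_attention(q_hnd, k_hnd, v_hnd, attn_mask=mask)
    else:
        out = sageattn3(q_hnd, k_hnd, v_hnd)  # assumed SA3 entry point
    return out.transpose(1, 2)  # restore NHD layout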

5. Per-Block Mean Support

  • Support for both sageattn3 (standard mode, accuracy-focused) and sageattn3_per_block_mean (fast mode, Triton-optimized)
  • Per-block mean processing uses 128-token blocks with Triton kernels for improved performance

Technical Details

Modified File: nodes/sa.py

Changes:

  • Lines 102-125: Added get_sage_attention3_info() function
  • Lines 151-152: Added SA2 log skip condition for SA3 modes
  • Lines 184-188: Added SA3-specific version detection and logging
  • Lines 195-200: Clarified tensor layout conversion
  • Lines 202-210: Added fallback detection logic
  • Lines 211-218: Improved fallback processing and SA3 call branching

Usage

  1. Start ComfyUI with --use-sage-attention flag:

    python main.py --use-sage-attention
  2. Add the Patch Sage Attention DM node to your workflow

  3. Select SA3 mode from the sage_attention dropdown:

    • sageattn3: Standard SA3 mode (accuracy-focused)
    • sageattn3_per_block_mean: Fast SA3 mode (Triton-optimized)

Expected Log Output

When SA2 is selected:

Patching comfy attention to use SageAttention 2.1.1+cu128torch2.7

When SA3 is selected:

Patching comfy attention to use SageAttention3 3.0.0.b1 (Blackwell FP4)

Compatibility

  • Requires sageattn3 package (v3.0.0.b1 or higher recommended)
  • Optimized for Blackwell architecture GPUs (RTX 5060 Ti 16GB and similar)
  • Automatic fallback ensures compatibility with all model configurations
  • Compatible with ComfyUI's attention function format via wrap_attn decorator

Notes

  • SA3 constraints (headdim < 256, no attention mask) are automatically handled with fallback to PyTorch SDPA
  • The node dynamically patches attention on each model execution and automatically cleans up afterward
  • Version information is logged on every generation for debugging and verification

v2.3.3 - Fix Node Import Paths (Issue #3)

12 Jan 17:44


Release Notes v2.3.3 - Fix Node Import Paths (Issue #3)

Overview

This release fixes Issue #3, which caused only one node (Model Patch Memory Cleaner) to be visible in ComfyUI after updating, despite the repository containing four nodes.

Problem Description

Symptom

After updating the ComfyUI-DistorchMemoryManager custom node, users reported that only one node (Model Patch Memory Cleaner) was visible in ComfyUI's node palette, even though the repository contains four nodes:

  1. Memory Manager
  2. Safe Memory Manager
  3. Purge VRAM V2 (DisTorchPurgeVRAMV2)
  4. Patch Sage Attention DM

Root Causes

The issue had two root causes:

1. Incorrect Import Paths

After refactoring the project structure to move node files into a nodes/ subdirectory, the import statements in __init__.py were not updated to reflect the new directory structure.

Before refactoring:

ComfyUI-DistorchMemoryManager/
├── __init__.py
├── memory_manager.py
├── purge_vram.py
└── sa.py

After refactoring:

ComfyUI-DistorchMemoryManager/
├── __init__.py
└── nodes/
    ├── memory_manager.py
    ├── purge_vram.py
    └── sa.py

The code was still trying to import from the root directory:

# ❌ INCORRECT (Before Fix)
from .memory_manager import MemoryManager, SafeMemoryManager, any
from .purge_vram import DisTorchPurgeVRAMV2
from .sa import PatchSageAttentionDM

2. Missing __init__.py in nodes/ Directory

Even after fixing the import paths, the nodes/ directory was missing an __init__.py file, which is required for Python to recognize it as a package. Without this file, relative imports (like from .nodes.memory_manager import ...) fail silently.

Solution

Fix 1: Corrected Import Paths

Updated all import statements to point to the nodes/ subdirectory:

# ✅ CORRECT (After Fix)
from .nodes.memory_manager import MemoryManager, SafeMemoryManager, any
from .nodes.purge_vram import DisTorchPurgeVRAMV2
from .nodes.sa import PatchSageAttentionDM

Fix 2: Added nodes/__init__.py

Created nodes/__init__.py to make the nodes/ directory a proper Python package:

# nodes/__init__.py
# Nodes package for ComfyUI-DistorchMemoryManager
# This file makes the nodes directory a Python package

Fix 3: Added Debug Logging

Added comprehensive debug logging to help diagnose import issues:

  • Import Success Logging: Logs when imports succeed
  • Import Failure Logging: Logs detailed error messages when imports fail
  • Node Registration Logging: Logs which nodes are successfully registered
  • Registration Summary: Logs the complete list of registered nodes

Example debug output:

[ComfyUI-DistorchMemoryManager] Successfully imported MemoryManager and SafeMemoryManager from .nodes.memory_manager
[ComfyUI-DistorchMemoryManager] Successfully imported DisTorchPurgeVRAMV2 from .nodes.purge_vram
[ComfyUI-DistorchMemoryManager] Successfully imported PatchSageAttentionDM from .nodes.sa
[ComfyUI-DistorchMemoryManager] Registered MemoryManager node
[ComfyUI-DistorchMemoryManager] Registered SafeMemoryManager node
[ComfyUI-DistorchMemoryManager] Registered DisTorchPurgeVRAMV2 node
[ComfyUI-DistorchMemoryManager] Registered PatchSageAttentionDM node
[ComfyUI-DistorchMemoryManager] Total registered nodes: ['ModelPatchMemoryCleaner', 'MemoryManager', 'SafeMemoryManager', 'DisTorchPurgeVRAMV2', 'PatchSageAttentionDM']

Technical Details

Why __init__.py is Required

In Python, a directory must contain an __init__.py file to be recognized as a package. Without it:

  • Relative imports (.nodes.memory_manager) fail
  • The directory is treated as a namespace package (Python 3.3+) or not recognized at all (Python < 3.3)
  • Import errors occur silently in some cases

Import Mechanism

The code uses a two-level try-except pattern for robustness:

try:
    # Relative import (preferred when the node is loaded as a package)
    from .nodes.memory_manager import MemoryManager, SafeMemoryManager
except ImportError:
    try:
        # Absolute import (fallback when the nodes directory is reachable via sys.path)
        from nodes.memory_manager import MemoryManager, SafeMemoryManager
    except ImportError:
        # Set to None if both fail so registration can be skipped without a NameError
        MemoryManager = None
        SafeMemoryManager = None

Why two levels?

  1. First attempt (relative): Works when the package is properly installed and imported as a module
  2. Second attempt (absolute): Works in edge cases where the relative import fails but the absolute path is in sys.path
  3. Fallback: Prevents NameError if both imports fail, allowing the code to continue executing

Changes Made

Files Modified

  1. __init__.py

    • Updated import paths from .memory_manager to .nodes.memory_manager
    • Updated import paths from .purge_vram to .nodes.purge_vram
    • Updated import paths from .sa to .nodes.sa
    • Added comprehensive debug logging for imports and node registration
  2. nodes/__init__.py (NEW FILE)

    • Created to make nodes/ directory a proper Python package
    • Required for relative imports to work correctly

Verification

How to Verify the Fix

  1. Check ComfyUI Console Output

    • Start ComfyUI and check the console for debug messages
    • Should see "Successfully imported" messages for all four nodes
    • Should see "Registered" messages for all four nodes
    • Should see the complete list of registered nodes
  2. Check ComfyUI Node Palette

    • Open ComfyUI
    • Search for "Memory" category
    • Should see all 4 nodes listed:
      • Memory Manager
      • Safe Memory Manager
      • LayerUtility: Purge VRAM V2
      • Patch Sage Attention DM
    • Model Patch Memory Cleaner should also be visible
  3. Test Node Functionality

    • Add each node to a workflow
    • Verify that all nodes function correctly
    • Check that no import errors appear in the console

Expected Behavior After Fix

  • All 4 nodes appear in ComfyUI's node palette under the "Memory" category
  • ✅ No import errors in ComfyUI console
  • ✅ All nodes function correctly when used in workflows
  • ✅ Debug logging provides clear information about import and registration status

Impact

Before Fix

  • ❌ Only 1 node visible (Model Patch Memory Cleaner)
  • ❌ 4 nodes missing (Memory Manager, Safe Memory Manager, Purge VRAM V2, Patch Sage Attention DM)
  • ❌ Users unable to use missing nodes
  • ❌ Silent failure (no error messages)
  • ❌ Difficult to diagnose the problem

After Fix

  • ✅ All 4 nodes visible
  • ✅ All nodes functional
  • ✅ Users can access all features
  • ✅ Proper error handling maintained
  • ✅ Debug logging helps diagnose future issues

Related Information

  • Issue: GitHub Issue #3
  • Fixed Files:
    • __init__.py (import paths and debug logging)
    • nodes/__init__.py (new file)
  • Documentation: See ISSUE_3_FIX_DOCUMENTATION.md for complete technical details

Summary

This release fixes a critical bug that prevented four of the five custom nodes from being visible in ComfyUI. The fix involved:

  1. Correcting import paths to reflect the new nodes/ directory structure
  2. Adding nodes/__init__.py to make the directory a proper Python package
  3. Adding debug logging to help diagnose future import issues

Key Takeaway: When refactoring project structure, always:

  • Update import paths immediately
  • Ensure all subdirectories have __init__.py files if they contain Python modules
  • Add debug logging to help diagnose issues
  • Test thoroughly after structural changes

v2.3.0 - Flash-Attention Auto-Load and Version Detection Features

10 Jan 09:59

Choose a tag to compare

Overview

Added independent Flash-Attention auto-load and version detection features. These features are completely independent from ComfyUI's model_management module, eliminating the need to manually modify model_management.py after ComfyUI updates.

Key Features

Independent Version Detection

  • Flash-Attention version detection: Completely independent version detection that doesn't rely on model_management module
  • FA-2/FA-3 type detection: Automatically detects Flash-Attention 2 or 3 based on version number
  • SageAttention version detection: Independent SageAttention version detection with CUDA/PyTorch information
  • Version information logging: Version information is logged on every generation

Flash-Attention Auto-Load

  • No CLI options required: Flash-Attention is automatically loaded when SageAttention is set to disabled without requiring --use-flash-attention CLI option
  • Direct loading: Uses optimized_attention_override to directly load Flash-Attention
  • Package import detection: Detects Flash-Attention availability based on package import capability, not CLI options

Dynamic Patching and Logging

  • ON_PRE_RUN callback: Loads Flash-Attention when disabled, applies SageAttention when enabled, and outputs version logs
  • ON_CLEANUP callback: Always outputs Flash-Attention logs even when SageAttention was active (ComfyUI resets to optimal kernel on cleanup)
  • Per-generation logging: Version information is logged on every generation

Technical Details

Core Mechanism

The implementation uses ON_PRE_RUN and ON_CLEANUP callbacks for dynamic patching and logging:

  1. Generation start (ON_PRE_RUN callback):

    • If SageAttention is enabled: Get SageAttention function, set optimized_attention_override to apply patch, output SageAttention version log
    • If SageAttention is disabled: if the Flash-Attention package is available, set optimized_attention_override to comfy_attention.attention_flash so Flash-Attention is loaded directly, then output the Flash-Attention version log
  2. Generation end (ON_CLEANUP callback):

    • Delete optimized_attention_override to execute ComfyUI's kernel reset
    • ComfyUI automatically selects optimal kernel (Flash-Attention) as initial state
    • Output Flash-Attention version log at this point (even when SageAttention was active)

Flash-Attention Version Detection Function

def get_flash_attention_info():
    """
    Get Flash-Attention version and type information.
    Returns: (is_available, version, type)
    """

Process flow:

  1. Check package import capability using import flash_attn
  2. Attempt to get version string from flash_attn.__version__
  3. Fallback to importlib.metadata.version("flash-attn") if version doesn't exist
  4. Split version string and get major version number
  5. Determine FA-2 or FA-3: major_version >= 3 → "FA-3", otherwise → "FA-2"
  6. Return tuple: (is_available, version, type)

Important point: Determines based on package import capability only, regardless of args.use_flash_attention value. This allows accurate detection of actual Flash-Attention availability.
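
Following that flow, a sketch of the detection function might look like this (the returned tuple matches the docstring above; note that args.use_flash_attention is deliberately not consulted):

def get_flash_attention_info():
    """Return (is_available, version, type) for Flash-Attention, based only on importability."""
    try:
        import flash_attn
    except ImportError:
        return (False, None, None)
    version = getattr(flash_attn, "__version__", None)
    if version is None:
        # Fall back to package metadata when the module exposes no __version__
        from importlib.metadata import version as pkg_version
        version = pkg_version("flash-attn")
    major_version = int(version.split(".")[0])
    fa_type = "FA-3" if major_version >= 3 else "FA-2"
    return (True, version, fa_type)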

SageAttention Version Detection Function

def get_sage_attention_info():
    """
    Get SageAttention version information.
    Returns: (version, cuda_version, torch_version)
    """

Process flow:

  1. Import sageattention package
  2. Attempt to get version from sageattention.__version__
  3. Fallback to importlib.metadata.version("sageattention") if version doesn't exist
  4. Get CUDA version from torch.version.cuda
  5. Get PyTorch version from torch.version.__version__
  6. Return tuple: (version, cuda_version, torch_version)
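
A sketch following the same pattern, with the CUDA and PyTorch versions taken from torch:

def get_sage_attention_info():
    """Return (version, cuda_version, torch_version) for SageAttention."""
    import torch
    try:
        import sageattention
        version = getattr(sageattention, "__version__", None)
        if version is None:
            from importlib.metadata import version as pkg_version
            version = pkg_version("sageattention")
    except ImportError:
        version = None
    return (version, torch.version.cuda, torch.__version__)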

Flash-Attention Auto-Load Mechanism

When SageAttention is set to disabled, Flash-Attention is loaded using optimized_attention_override:

import comfy.ldm.modules.attention as comfy_attention  # assumed import alias used in nodes/sa.py

def attention_override_flash(func, *args, **kwargs):
    return comfy_attention.attention_flash(*args, **kwargs)  # ignore the kernel ComfyUI selected
model.model_options["transformer_options"]["optimized_attention_override"] = attention_override_flash

This bypasses ComfyUI's default attention selection logic and directly uses the attention_flash function.

Important Points

1. Complete Independence

Completely independent from model_management module. This allows version information to always be retrieved regardless of ComfyUI updates.

2. No CLI Options Required

Flash-Attention can be loaded using only the node's disabled setting, without requiring --use-flash-attention CLI option. Determines based on package import capability only.

3. Per-Generation Logging

  • ON_PRE_RUN (before generation): Outputs SageAttention log if enabled, Flash-Attention log if disabled
  • ON_CLEANUP (after generation): Always checks and outputs Flash-Attention state (even when SageAttention is enabled, ComfyUI's kernel reset causes Flash-Attention to be selected as initial state)

4. Error Handling

All critical operations are wrapped in try-except blocks to ensure safe operation even when errors occur.

5. Kernel Reset Understanding

ComfyUI performs kernel reset on every generation, returning to initial state (optimal kernel). This initial state automatically selects Flash-Attention when available. Therefore, even when SageAttention is enabled, Flash-Attention logs are output during cleanup after generation ends.

Benefits

  • No manual modifications needed: Eliminates need to manually modify model_management.py after ComfyUI updates
  • Always available version info: Version information can always be retrieved regardless of ComfyUI updates
  • Convenient Flash-Attention loading: Flash-Attention can be loaded without CLI options
  • Better visibility: Version information is logged on every generation, providing better visibility into attention mechanism state

Files Modified

  • nodes/sa.py: Added independent version detection functions and Flash-Attention auto-load functionality

Summary

This implementation adds completely independent version detection and Flash-Attention auto-load features to the ComfyUI-DistorchMemoryManager node. By completely eliminating dependencies on model_management, version information for Flash-Attention and SageAttention can always be retrieved regardless of ComfyUI updates. Additionally, Flash-Attention (FA-2/FA-3) is automatically loaded when disabled, operating without CLI options. This eliminates the hassle of manually modifying model_management.py after each ComfyUI update.

v2.2.0 - SageAttention Patch Feature

10 Jan 09:59


Overview

Added Patch Sage Attention DM node for patching ComfyUI's attention mechanism to use SageAttention. This feature allows replacing ComfyUI's standard attention mechanism with SageAttention, providing improved memory efficiency and performance.

Key Features

Patch Sage Attention DM Node

  • Dynamic patching: Uses ComfyUI's callback system (ON_PRE_RUN and ON_CLEANUP) for dynamic attention patching
  • Multiple SageAttention modes: Supports auto, CUDA, Triton, and SageAttention 3 implementations
  • Version detection: Automatically detects and logs SageAttention version with CUDA/PyTorch information
  • Flash-Attention state logging: Logs Flash-Attention state when SageAttention is disabled
  • ComfyUI compatibility: Compatible with ComfyUI's attention function format using wrap_attn decorator

Supported SageAttention Modes

  • disabled: Disable SageAttention and restore original attention mechanism
  • auto: Automatic SageAttention implementation selection
  • sageattn_qk_int8_pv_fp16_cuda: CUDA implementation (QK int8, PV FP16)
  • sageattn_qk_int8_pv_fp16_triton: Triton implementation (QK int8, PV FP16)
  • sageattn_qk_int8_pv_fp8_cuda: CUDA implementation (QK int8, PV FP8)
  • sageattn_qk_int8_pv_fp8_cuda++: CUDA implementation (QK int8, PV FP8, optimized)
  • sageattn3: SageAttention 3 implementation (Blackwell support)
  • sageattn3_per_block_mean: SageAttention 3 implementation (per-block mean version)

Version Detection and Logging

  • SageAttention version: Detects the version via the __version__ attribute or importlib.metadata
  • CUDA/PyTorch versions: Includes CUDA and PyTorch version information in logs
  • Flash-Attention state: Checks and logs Flash-Attention state from model_management module
  • Dynamic logging: Logs are output on every model execution via callbacks

Technical Details

Implementation Architecture

The implementation uses ComfyUI's callback system to dynamically patch attention:

  • ON_PRE_RUN callback: Patches attention before each model execution
  • ON_CLEANUP callback: Cleans up and logs Flash-Attention state after each execution

Attention Function Wrapping

The implementation uses wrap_attn decorator to wrap SageAttention functions in ComfyUI's attention format:

  • Converts tensor shapes from ComfyUI format (q, k, v, heads) to SageAttention format
  • Handles FP32 to FP16 conversion (SageAttention primarily operates on FP16)
  • Adjusts mask dimensions (adds batch and heads dimensions)
  • Restores output to original data type and shape
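
As a rough illustration of that conversion (a sketch; sage_attn_fn stands in for whichever SageAttention kernel was selected, and its keyword arguments are assumed):

import torch

def wrap_for_comfy(sage_attn_fn):
    def attention(q, k, v, heads, mask=None, **kwargs):
        b, seq, inner = q.shape
        dim_head = inner // heads
        orig_dtype = q.dtype
        # ComfyUI format [b, seq, heads*dim_head] -> per-head layout [b, heads, seq, dim_head]
        q, k, v = (t.view(b, -1, heads, dim_head).transpose(1, 2) for t in (q, k, v))
        # SageAttention primarily operates on FP16
        if orig_dtype == torch.float32:
            q, k, v = q.half(), k.half(), v.half()
        if mask is not None and mask.ndim == 2:
            mask = mask.unsqueeze(0).unsqueeze(0)  # add batch and heads dimensions
        out = sage_attn_fn(q, k, v, attn_mask=mask, **kwargs)
        # Restore original dtype and ComfyUI's [b, seq, heads*dim_head] shape
        return out.to(orig_dtype).transpose(1, 2).reshape(b, -1, heads * dim_head)
    return attention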

torch.compile Control

  • allow_compile option: Optional boolean parameter to enable torch.compile
  • Default disabled: torch.compile is disabled by default to avoid compilation overhead
  • Requirement: torch.compile requires sageattn 2.2.0 or higher

Error Handling

All critical operations are wrapped in try-except blocks:

  • Version detection failures are handled gracefully
  • Flash-Attention state checks are safe
  • Fallback messages are provided when information cannot be retrieved

Usage

  1. Add node in ComfyUI: Add "Patch Sage Attention DM" node from "Memory" category
  2. Connect model: Connect model from CheckpointLoader or similar
  3. Select SageAttention mode: Choose desired mode from dropdown
  4. Configure options: Enable allow_compile if desired (requires sageattn 2.2.0+)
  5. Execute: Model execution will output logs to console

Log Output Examples

When SageAttention is Enabled

Patching comfy attention to use SageAttention 2.2.0+cu121torch2.3.0

When SageAttention is Disabled (Flash-Attention Enabled)

Restoring initial comfy attention
[ComfyUI] Using FA-3 (Flash-Attention 3.0.0) direct

When SageAttention is Disabled (Flash-Attention Disabled)

Restoring initial comfy attention

Important Notes

  1. Patching occurs on every execution: Due to callback usage, SageAttention is applied before each execution and cleaned up after
  2. Use 'disabled' to restore: To disable SageAttention, run the node again with sage_attention set to disabled
  3. Logs output on every execution: Console output increases as logs are output on every model execution
  4. allow_compile requirement: torch.compile requires sageattn 2.2.0 or higher

Files Modified

  • nodes/sa.py: Added PatchSageAttentionDM class with SageAttention patching functionality

Compatibility

This implementation replicates the functionality of comfyui-kjnodes' PatchSageAttentionKJ node:

  • Supports the same SageAttention modes
  • Uses the same log format
  • Uses the same callback mechanism

v2.0.0 - Qwen3-VL and Nunchaku Model Purging Support

01 Jan 12:28


ComfyUI-VRAM-Manager Modification Notes

Purpose of Modifications

Added purge functionality for Qwen3-VL models loaded by qwen3-vl-comfy-ui and Nunchaku models (FLUX/Z-Image/Qwen-Image) loaded by ComfyUI-nunchaku to the purgevram2 node in ComfyUI-VRAM-Manager (formerly ComfyUI-DistorchMemoryManager). These models are not managed by ComfyUI's standard model_management, so dedicated purge processing was required. This was implemented to prevent OOM errors during upscale processing.

Modified Files

ComfyUI/custom_nodes/ComfyUI-DistorchMemoryManager/__init__.py
pyproject.toml

Overview of Modifications

  1. Added Qwen3-VL model purge functionality
  2. Added Nunchaku model purge functionality (with CPU offload support)
  3. Added detailed debug logging
  4. Enhanced processing to ensure reliable unloading from memory
  5. Fixed any() function name collision with AnyType
  6. Changed display name to ComfyUI-VRAM-Manager

1. Parameter Addition to INPUT_TYPES

Added the following two boolean parameters to the INPUT_TYPES of the DisTorchPurgeVRAMV2 class:

"purge_qwen3vl_models": ("BOOLEAN", {"default": False, "tooltip": "Clear Qwen3-VL models from GPU memory"}),
"purge_nunchaku_models": ("BOOLEAN", {"default": False, "tooltip": "Clear Nunchaku models (FLUX/Z-Image/Qwen-Image) from GPU memory"}),

This allows users to individually purge Qwen3-VL models and Nunchaku models using the purgevram2 node.

2. Qwen3-VL Model Purge Functionality

2.1 Model Type Import

Dynamically imports Qwen3VLForConditionalGeneration from the transformers library. If the import fails, the purge process is skipped.

from transformers import Qwen3VLForConditionalGeneration
qwen3vl_model_type = Qwen3VLForConditionalGeneration

2.2 Method 1: sys.modules Search

Iterates through all modules in sys.modules and searches for Qwen3-VL models stored as module attributes. When a model is found, the following processing is executed:

2.2.1 hf_device_map Processing

Qwen3-VL models may be loaded with device_map="auto", which can result in the model being distributed across multiple devices. The hf_device_map dictionary stores parameter names and their device information.

if hasattr(attr, 'hf_device_map'):
    hf_device_map = attr.hf_device_map
    for param_name, device in hf_device_map.items():
        device_str = str(device) if device is not None else ''
        if device_str.startswith('cuda') or (isinstance(device, int) and device >= 0):
            # Get module path and move to CPU
            submodule = attr
            if param_name:
                for part in param_name.split('.'):
                    submodule = getattr(submodule, part)
            if hasattr(submodule, 'to'):
                submodule.to('cpu')

The device can be either a string (e.g., 'cuda:0') or an integer (device ID), so both formats are supported.

2.2.2 Model CPU Transfer

Moves the entire model to CPU. If .to('cpu') fails, a fallback process is implemented to move parameters individually to CPU.

if hasattr(attr, 'to'):
    attr.to('cpu')
elif hasattr(attr, 'cpu'):
    attr.cpu()

2.2.3 Internal State Clearing

To ensure reliable unloading from memory, parameters and buffers are explicitly moved to CPU, and the _modules dictionary is cleared.

if hasattr(attr, 'named_parameters'):
    for name, param in list(attr.named_parameters()):
        if param is not None and hasattr(param, 'data'):
            if param.data is not None:
                param.data = param.data.detach().cpu()
if hasattr(attr, 'named_buffers'):
    for name, buffer in list(attr.named_buffers()):
        if buffer is not None and hasattr(buffer, 'data'):
            if buffer.data is not None:
                buffer.data = buffer.data.detach().cpu()
if hasattr(attr, '_modules'):
    attr._modules.clear()

Using detach().cpu() detaches from the computation graph before moving to CPU, ensuring references are reliably released.

2.2.4 Reference Deletion

Deletes the model reference from module attributes and also deletes the object itself.

if hasattr(module, attr_name):
    delattr(module, attr_name)
del attr

2.3 Method 2: gc.get_objects() Search

Searches for models not found in sys.modules from all objects tracked by the garbage collector. The same processing as Method 1 is executed for found models.

for obj in gc.get_objects():
    if isinstance(obj, qwen3vl_model_type):
        # hf_device_map processing, CPU transfer, internal state clearing, deletion

2.4 Garbage Collection and CUDA Cache Clearing

After model purging, garbage collection is executed twice, and caches for all CUDA devices are cleared.

gc.collect()
gc.collect()  # Execute twice to ensure cleanup
for device_idx in range(torch.cuda.device_count()):
    with torch.cuda.device(device_idx):
        torch.cuda.empty_cache()
        torch.cuda.ipc_collect()
torch.cuda.synchronize()

3. Nunchaku Model Purge Functionality

3.1 Model Type Import

Nunchaku models have multiple types, so each is imported individually.

  1. NunchakuFluxTransformer2dModel (for FLUX)
  2. NunchakuZImageTransformer2DModel (for Z-Image)
  3. NunchakuT5EncoderModel (for text encoder, import failure is acceptable)
  4. NunchakuQwenImageTransformer2DModel (for Qwen-Image, searched from multiple paths)

from nunchaku import NunchakuFluxTransformer2dModel
from nunchaku.models.transformers.transformer_zimage import NunchakuZImageTransformer2DModel
from comfyui_nunchaku.models.qwenimage import NunchakuQwenImageTransformer2DModel

3.2 CPU Offload Disabling

Nunchaku models may use CPU offload functionality. If offload is enabled, it is disabled using set_offload(False), and offload_manager is set to None to ensure offloaded memory is reliably released.

if hasattr(attr, 'set_offload'):
    if hasattr(attr, 'offload') and attr.offload:
        attr.set_offload(False)
if hasattr(attr, 'offload_manager') and attr.offload_manager is not None:
    attr.offload_manager = None

3.3 Method 1: sys.modules Search

Iterates through all modules in sys.modules and searches for Nunchaku models. Models may be found in the following structures:

3.3.1 Direct Reference

When the model is stored directly as a module attribute.

if isinstance(attr, model_type):
    # Disable CPU offload, move to CPU, clear internal state, delete

3.3.2 Dictionary (transformer key)

When the model is stored in the transformer key of a dictionary.

if isinstance(attr, dict) and 'transformer' in attr:
    transformer_obj = attr.get('transformer')
    if isinstance(transformer_obj, model_type):
        # Similar processing

3.3.3 Nested Structure (ModelPatcher structure)

For nested structures like ComfyFluxWrapper, the model is searched by following the path dict.model.diffusion_model.model.

if 'model' in attr:
    model_obj = attr.get('model')
    if hasattr(model_obj, 'diffusion_model'):
        diffusion_model = model_obj.diffusion_model
        if hasattr(diffusion_model, 'model'):
            transformer_obj = diffusion_model.model
            if isinstance(transformer_obj, model_type):
                # Similar processing

3.4 Method 2: ComfyUI model_management Search

Iterates through ComfyUI's current_loaded_models and searches for Nunchaku models within the ModelPatcher structure.

for loaded_model in current_loaded_models:
    if hasattr(loaded_model, "model"):
        model = loaded_model.model
        if hasattr(model, "diffusion_model"):
            diffusion_model = model.diffusion_model
            if hasattr(diffusion_model, "model"):
                transformer = diffusion_model.model
                if isinstance(transformer, model_type):
                    # Disable CPU offload, move to CPU, clear internal state
                    loaded_model.currently_used = False
                    loaded_model.model_unload()

Calling model_unload() ensures the model is reliably unloaded from ComfyUI's management.

3.5 Method 3: gc.get_objects() Search

Searches for models not found in Method 1 and Method 2 from all objects tracked by the garbage collector.

for obj in gc.get_objects():
    for model_type in nunchaku_model_types:
        if isinstance(obj, model_type):
            # Disable CPU offload, move to CPU, clear internal state

4. Debug Logging Addition

Added detailed debug logging to the Qwen3-VL and Nunchaku purge processes. The following types of logs are output at each processing step:

4.1 Start Message

Explicitly indicates the start of the purge process.

print("Qwen3-VL: Starting purge process...")
print("Nunchaku: Starting purge process...")

4.2 Import Results

Records import success/failure for each model type.

print("Qwen3-VL: Successfully imported Qwen3VLForConditionalGeneration")
print("Nunchaku: Successfully imported NunchakuFluxTransformer2dModel")
print(f"Nunchaku: Failed to import NunchakuT5EncoderModel: {e}")

4.3 Method Start/Completion Messages

Explicitly indicates the start and completion of each Method, and records the number of objects checked and models detected.

print("Qwen3-VL: Method 1 - Searching sys.modules for models...")
print(f"Qwen3-VL: Method 1 complete - checked {modules_checked} modules")
print(f"Qwen3-VL: Method 2 complete - checked {objects_checked} objects, found {models_found_in_gc} models")

4.4 Detailed Information When Models Are Detected

When a model is found, records the type name, ID, and location.

pr...

v1.3.1 - Improved SeedVR2 Cache Detection and Messaging

11 Dec 22:40


Release Notes v1.3.1

Improved SeedVR2 Cache Detection and Messaging

This release focuses on improving user experience when working with SeedVR2 models, providing clearer feedback about cache state and better debugging information.

Key Improvements

1. Enhanced Debug Information

  • Import Method Tracking: Now displays which method was used to access SeedVR2's GlobalModelCache (Method 1: direct import, Method 2: via seedvr2_videoupscaler, Method 3: via sys.modules)
  • Detailed Cache State: Shows exact counts of DiT models, VAE models, and runner templates in the cache
  • Attribute Verification: Indicates whether cache dictionaries exist but are empty, helping users understand the cache state

2. Clearer User Messages

  • Explicit Explanation: Messages now clearly explain that cache_model=False (default) means models are never cached in GlobalModelCache
  • Actionable Guidance: Provides instructions on how to enable caching (cache_model=True in SeedVR2 nodes)
  • Technical Accuracy: Correctly describes SeedVR2's automatic cleanup behavior after processing completes

3. Removed Duplicate Messages

  • Eliminated redundant console output that was confusing users
  • Streamlined message flow for better readability

Technical Details

SeedVR2 Caching Behavior

cache_model=False (Default):

  • Models are never added to GlobalModelCache
  • After processing completes, models are automatically deleted from memory
  • This is expected behavior, not a bug

cache_model=True:

  • Models are stored in GlobalModelCache._dit_models and _vae_models
  • Models remain in memory after processing
  • Can be cleared via Purge VRAM V2's purge_seedvr2_models option

Example Output

SeedVR2: Cache accessed via Method 1 (direct import)
SeedVR2: Checking cache (DiT: 0, VAE: 0, Runners: 0)
SeedVR2: DiT models dictionary exists but is empty
SeedVR2: VAE models dictionary exists but is empty
SeedVR2: Cache is empty - cache_model option is disabled (False by default). Enable cache_model=True in SeedVR2 nodes to cache models in GlobalModelCache.

Impact

  • Better User Understanding: Users now know why the cache is empty and what they can do about it
  • Easier Debugging: Import method and cache state information help troubleshoot integration issues
  • Reduced Confusion: Eliminates misleading messages that suggested models were "already cleared"

Files Modified

  • __init__.py: Enhanced DisTorchPurgeVRAMV2.purge_vram() method's SeedVR2 handling section

v1.3.0

11 Dec 03:48


Release Notes v1.3.0

Overview

Version 1.3.0 introduces SeedVR2 model purging support to the DisTorchPurgeVRAMV2 node, along with critical bug fixes for cleanup_models() errors and CPU device handling. This release significantly improves memory management compatibility with SeedVR2 workflows and resolves stability issues.

New Features

SeedVR2 Model Purging Support

Added comprehensive support for purging SeedVR2's DiT (base) and VAE models from the GlobalModelCache.

Purpose

The SeedVR2 custom node uses an independent model caching system (GlobalModelCache) that stores DiT and VAE models separately from ComfyUI's standard model management. This release adds the ability to clear these cached models through the DisTorchPurgeVRAMV2 node.

Implementation Details

New Option Added:

  • purge_seedvr2_models: Boolean option (default: False) in DisTorchPurgeVRAMV2 node
  • When enabled, clears all cached SeedVR2 DiT and VAE models from GlobalModelCache

Path Detection:
The implementation uses multiple methods to locate the SeedVR2 custom node, ensuring compatibility across different user environments:

  1. Method 1: Import from already loaded modules (most reliable)
  2. Method 2: Relative path from current file (same custom_nodes directory)
  3. Method 3: Search in sys.path for seedvr2_videoupscaler
  4. Method 4: Parse path structure to find custom_nodes directory

This multi-method approach ensures the feature works regardless of where ComfyUI is installed on the user's system.

Model Clearing Process:

  • Accesses SeedVR2's GlobalModelCache via get_global_cache()
  • Iterates through _dit_models dictionary and removes each DiT model using remove_dit()
  • Iterates through _vae_models dictionary and removes each VAE model using remove_vae()
  • Clears _runner_templates dictionary
  • Properly releases model memory using SeedVR2's release_model_memory() function
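
In outline, that clearing step could look like this (get_global_cache, remove_dit, remove_vae and the cache attributes are the SeedVR2 APIs named above; error handling around each removal is omitted):

cache = get_global_cache()
for name in list(cache._dit_models.keys()):
    cache.remove_dit(name)    # release each cached DiT model
for name in list(cache._vae_models.keys()):
    cache.remove_vae(name)    # release each cached VAE model
cache._runner_templates.clear()  # drop cached runner templates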

Error Handling:

  • Gracefully handles cases where SeedVR2 is not installed
  • Does not display errors if SeedVR2 custom node is not found (normal behavior)
  • Catches and reports errors for individual model removal failures

Usage

Enable purge_seedvr2_models: True in the DisTorchPurgeVRAMV2 node to clear all cached SeedVR2 models. This is particularly useful when:

  • Switching between different SeedVR2 models
  • Experiencing memory issues with SeedVR2 workflows
  • Need to force reload SeedVR2 models

Bug Fixes

Fixed 'NoneType' object is not callable Error in cleanup_models()

Problem

The cleanup_models() function in ComfyUI's model_management module would fail with 'NoneType' object is not callable errors when attempting to call real_model() on models where real_model was None or not callable.

Solution

Implemented pre-cleanup logic that removes problematic models before calling cleanup_models():

Pre-cleanup Checks:

  1. Checks if real_model is None - removes model immediately
  2. Checks if real_model is not callable - removes model immediately
  3. Attempts to call real_model() - if it fails or returns None, removes model

Implementation Details:

  • Pre-cleanup is performed before both cleanup_models() calls:
    • Before the initial cleanup (after marking models as unused)
    • Before the second cleanup (after model unloading)
  • Iterates from end to start to prevent index shifting during removal
  • Logs the number of pre-cleaned models for debugging

Code Location:

  • Lines 227-262: First pre-cleanup before initial cleanup_models() call
  • Lines 312-337: Second pre-cleanup before second cleanup_models() call

This fix prevents the error from occurring and ensures stable model cleanup operations.
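
A sketch of that pre-cleanup pass (names are illustrative; the actual code sits at the line ranges listed above and runs before each cleanup_models() call):

import comfy.model_management as mm

def precleanup_loaded_models():
    loaded = mm.current_loaded_models
    removed = 0
    # Walk from the end so pop() does not shift indices still to be visited
    for i in range(len(loaded) - 1, -1, -1):
        real_model = getattr(loaded[i], "real_model", None)
        broken = real_model is None or not callable(real_model)
        if not broken:
            try:
                broken = real_model() is None  # a dead reference resolves to None
            except Exception:
                broken = True
        if broken:
            loaded.pop(i)
            removed += 1
    if removed:
        print(f"Pre-cleaned {removed} model(s) before cleanup_models()")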

Fixed CPU Device Error in Virtual Memory Reset

Problem

Calling comfy.model_management.free_memory(0, 'cpu') would cause an error: Expected a cuda device, but got: cpu. This affected MemoryManager, SafeMemoryManager, and MemoryCleaner nodes.

Solution

Removed CPU device calls from virtual memory reset operations. The fix:

  • Only calls free_memory(0, 'cuda:0') when CUDA is available
  • Skips CPU device calls entirely
  • Wraps CUDA calls in try-except for additional safety

Modified Nodes:

  • MemoryManager: Removed CPU free_memory() call
  • SafeMemoryManager: Removed CPU free_memory() call
  • MemoryCleaner: Removed CPU free_memory() call

This ensures virtual memory reset operations complete without errors.
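
After the change, the reset amounts to roughly this (a sketch, assuming free_memory accepts a torch.device):

import torch
import comfy.model_management

if torch.cuda.is_available():
    try:
        # Only the CUDA device is reset; the former free_memory(0, 'cpu') call is gone
        comfy.model_management.free_memory(0, torch.device('cuda:0'))
    except Exception as e:
        print(f"free_memory on cuda:0 failed: {e}")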

Technical Improvements

Improved Path Detection for SeedVR2

The SeedVR2 path detection system was redesigned to be user-environment independent:

Key Improvements:

  • No hardcoded absolute paths
  • Uses relative path detection from current file location
  • Leverages Python's module import system
  • Parses path structure dynamically

Benefits:

  • Works regardless of where ComfyUI is installed
  • Compatible with different operating systems
  • Handles various directory structures
  • More robust and maintainable

Summary

Version 1.3.0 enhances the DisTorchPurgeVRAMV2 node with SeedVR2 support and fixes critical stability issues. The release improves memory management for SeedVR2 workflows and resolves errors that could occur during model cleanup operations.

Key Benefits:

  • Clear SeedVR2 cached models through DisTorchPurgeVRAMV2 node
  • Eliminated 'NoneType' object is not callable errors
  • Fixed CPU device errors in virtual memory reset
  • Improved compatibility across different user environments
  • More stable and reliable memory management

v1.2.0

06 Dec 04:50


Release Notes v1.2.0

Overview

Version 1.2.0 introduces the Model Patch Memory Cleaner node, a dedicated memory management solution for ModelPatchLoader model patches. This release also includes significant enhancements to the DisTorchPurgeVRAMV2 node with more aggressive model unloading capabilities and improved error handling.

New Features

Model Patch Memory Cleaner Node

A new dedicated node for clearing model patches loaded via ModelPatchLoader to prevent OOM (Out of Memory) errors during upscaling operations.

Purpose

The ModelPatchMemoryCleaner node is designed to explicitly clear model patches (such as Z-Image ControlNet, QwenImage BlockWise ControlNet, SigLIP MultiFeat Proj) loaded via ModelPatchLoader from VRAM. This prevents OOM errors during upscaling operations by freeing memory occupied by unused model patches.

Problem Background

Model patches loaded via ModelPatchLoader are managed differently from standard models in ComfyUI's memory system. These patches (stored in ModelPatcher's additional_models or attachments) can remain in VRAM even after use, causing OOM errors during subsequent operations like upscaling. Existing memory cleaning nodes cannot properly detect and clear these model patches, necessitating a dedicated solution.

Implementation Details

File Created/Modified:

  • ComfyUI/custom_nodes/ComfyUI-DistorchMemoryManager/__init__.py

Complete Code:

class ModelPatchMemoryCleaner:
    """
    Memory cleaner specifically for ModelPatcher loaded model patches.
    Clears model patches loaded via ModelPatchLoader to prevent OOM during upscaling.
    """

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "anything": (any, {}),
                "clear_model_patches": ("BOOLEAN", {"default": True, "tooltip": "Clear model patches loaded via ModelPatchLoader"}),
                "clean_gpu": ("BOOLEAN", {"default": True}),
                "force_gc": ("BOOLEAN", {"default": True}),
            }
        }

    RETURN_TYPES = (any,)
    RETURN_NAMES = ("any",)
    FUNCTION = "clear_model_patches"
    CATEGORY = "Memory"

    def clear_model_patches(self, anything, clear_model_patches, clean_gpu, force_gc):
        try:
            if clear_model_patches:
                import comfy.model_management
                import comfy.model_patcher
                
                # Get current loaded models
                if hasattr(comfy.model_management, "current_loaded_models"):
                    current_loaded_models = comfy.model_management.current_loaded_models
                    
                    # Find and unload model patches
                    unloaded_count = 0
                    for i in range(len(current_loaded_models) - 1, -1, -1):
                        loaded_model = current_loaded_models[i]
                        if loaded_model is not None and hasattr(loaded_model, "model"):
                            model = loaded_model.model
                            # Check if this is a ModelPatcher with additional_models (model patches stored here)
                            if isinstance(model, comfy.model_patcher.ModelPatcher):
                                # Check for additional_models (model patches stored here)
                                if hasattr(model, "additional_models") and model.additional_models:
                                    # Mark as not currently used
                                    loaded_model.currently_used = False
                                    # Unload the model
                                    if hasattr(loaded_model, "model_unload"):
                                        loaded_model.model_unload()
                                    # Remove from current_loaded_models
                                    current_loaded_models.pop(i)
                                    unloaded_count += 1
                                    print(f"Unloaded model patch: {type(model.model).__name__ if hasattr(model, 'model') else 'ModelPatcher'}")
                                # Also check attachments for model patches
                                elif hasattr(model, "attachments") and model.attachments:
                                    # Mark as not currently used
                                    loaded_model.currently_used = False
                                    # Unload the model
                                    if hasattr(loaded_model, "model_unload"):
                                        loaded_model.model_unload()
                                    # Remove from current_loaded_models
                                    current_loaded_models.pop(i)
                                    unloaded_count += 1
                                    print(f"Unloaded model patch from attachments: {type(model.model).__name__ if hasattr(model, 'model') else 'ModelPatcher'}")
                    
                    if unloaded_count > 0:
                        print(f"Cleared {unloaded_count} model patch(es)")
                    
                    # Cleanup models GC
                    if hasattr(comfy.model_management, "cleanup_models_gc"):
                        comfy.model_management.cleanup_models_gc()
            
            if clean_gpu and torch.cuda.is_available():
                torch.cuda.empty_cache()
                torch.cuda.synchronize()
                print("GPU memory cleared")
            
            if force_gc:
                gc.collect()
                print("Garbage collection completed")
            
            print("Model patch memory cleanup completed")
            
        except Exception as e:
            print(f"Model patch memory cleanup error: {e}")
        
        return (anything,)

Code Explanation

1. Class Definition and Documentation

The ModelPatchMemoryCleaner class is a dedicated memory cleaning node for model patches loaded via ModelPatcher. It was created to prevent OOM errors during upscaling operations.

2. INPUT_TYPES Method (Node Input Definition)

The INPUT_TYPES method defines the node's input parameters in ComfyUI:

  • anything: AnyType input that accepts any data type and passes it through to the output. This is a passthrough input for data flow in ComfyUI workflows.
  • clear_model_patches: Boolean value (default: True). Controls whether to clear model patches loaded via ModelPatchLoader. When True, detects and unloads model patches.
  • clean_gpu: Boolean value (default: True). Controls whether to clear GPU memory. When True, executes torch.cuda.empty_cache() and torch.cuda.synchronize().
  • force_gc: Boolean value (default: True). Controls whether to force garbage collection. When True, executes gc.collect().

3. RETURN_TYPES and RETURN_NAMES (Node Output Definition)

  • RETURN_TYPES: Defines the node's output type. Returns one any type.
  • RETURN_NAMES: Defines the output name. Output is named "any".
  • FUNCTION: Specifies the method name to execute. The clear_model_patches method is called.
  • CATEGORY: Node category. The node appears in the "Memory" category in ComfyUI's node menu.

4. clear_model_patches Method (Main Processing)

The main processing method that accepts four parameters.

4.1. Model Patch Clearing Process

When clear_model_patches is True, the model patch clearing process is executed:

  • Imports comfy.model_management and comfy.model_patcher, which are ComfyUI core modules providing model management and ModelPatcher functionality.
  • current_loaded_models is a list of models currently loaded in memory, managed by ComfyUI's model_management module.

4.2. Model Patch Detection and Unloading

The code iterates through the list from back to front to prevent index shifting when removing elements:

  • Checks if each loaded_model is not None and has a model attribute.
  • Verifies if the model is a ModelPatcher instance. Model patches loaded via ModelPatchLoader are wrapped in ModelPatcher.
  • Checks the additional_models attribute. ModelPatcher stores additional models (model patches) in the additional_models dictionary. If this dictionary is not empty, it means model patches are loaded.
  • For model patches found:
    • Sets currently_used to False, marking the model as "not in use" in ComfyUI's memory management system.
    • Calls model_unload() to unload the model, moving it from VRAM to CPU memory or disk.
    • Removes the model from the current_loaded_models list using pop(i).
    • Increments unloaded_count to track the number of unloaded model patches.
    • Prints the type name of the unloaded model patch for debugging.
  • Also checks the attachments attribute, as ModelPatcher may store model patches in attachments as well.

4.3. Cleanup Process

  • Prints the number of unloaded model patches.
  • Calls cleanup_models_gc() to perform garbage collection, cleaning up references to deleted models and preventing memory leaks.

4.4. GPU Memory Clearing

When clean_gpu is True and CUDA is available:

  • torch.cuda.empty_cache(): Clears PyTorch's CUDA cache, freeing unused GPU memory.
  • torch.cuda.synchronize(): Waits for CUDA operations to complete, ensuring memory clearing is fully completed.

4.5. Garbage Collection

When force_gc is True:

  • gc.collect(): Executes Python's garbage collector, reclaiming unused objects including circular references.

4.6. Error Handling

All processing is wrapped in a try-except block to prevent node crashes even if errors occur:

  • If an error occurs, an error message is printed.
  • Finally, returns the anything input as-is to the output, allowing data to continue flowing through the workflow.

**5. Node Registrat...
