Releases: ussoewwin/ComfyUI-DistorchMemoryManager
RES4LYF Flash Attention Fix — Full Technical Note
This document records what failed, why, the underlying issue, what changed, and what each change means for the fix applied to sd/attention.py shipped with the custom node RES4LYF.
1. Summary
- Symptom: logs showed `Flash Attention failed, using default SDPA:` followed repeatedly by PyTorch internals such as `schema_.has_value() INTERNAL ASSERT FAILED` (near `ATen/core/dispatch/OperatorEntry.h`).
- Typical environment: PyTorch 2.11 + CUDA 13.x (e.g. `2.11.0+cu130`), ComfyUI with `--use-flash-attention`, workflows that use attention code paths through RES4LYF.
- Fix: in RES4LYF's `attention_flash` implementation, remove the extra `torch.library.custom_op` wrapper and remove the SDPA fallback; call `flash_attn_func` directly instead.
2. What the error looked like (log output)
A typical sequence was:
- Warning:

  ```
  Flash Attention failed, using default SDPA: ...
  ```

- PyTorch internal assert (example):

  ```
  schema_.has_value() INTERNAL ASSERT FAILED at "...OperatorEntry.h":84
  Tried to access the schema for which doesn't have a schema registered yet
  please report a bug to PyTorch.
  ```
Exact wording can vary by build/version, but the nature is the same: dispatcher / operator schema resolution failure.
3. Cause (what the code was doing)
Before the fix, RES4LYF attention.py roughly had a two-stage structure.
3.1 Stage 1: Another custom op layered on top of flash_attn_func
Following the same pattern as ComfyUI core, it wrapped flash-attn’s flash_attn_func like this:
```python
@torch.library.custom_op("flash_attention::flash_attn", mutates_args=())
def flash_attn_wrapper(q, k, v, dropout_p=0.0, causal=False):
    return flash_attn_func(q, k, v, dropout_p=dropout_p, causal=causal)
```

That registers a new PyTorch custom op name: `flash_attention::flash_attn`.
Meanwhile, the flash-attn package itself (see flash_attn_interface.py) uses torch.library.custom_op and related APIs for PyTorch 2.4+ integration.
So the same computation was routed through two custom-op layers: flash-attn’s internal registration plus the RES4/ComfyUI-style alias flash_attention::flash_attn.
3.2 Stage 2: Fallback to torch.nn.functional.scaled_dot_product_attention on exception
```python
try:
    out = flash_attn_wrapper(...)
except Exception as e:
    logging.warning(f"Flash Attention failed, using default SDPA: {e}")
    out = torch.nn.functional.scaled_dot_product_attention(...)
```

If stage 1 raised or hit a dispatcher inconsistency, execution fell through to SDPA after the warning. That fallback path could hit dispatcher issues of the same family, which is why the logs showed the warning and the internal assert in succession.
4. The underlying issue (design level)
| Aspect | Issue |
|---|---|
| Duplicate abstraction | flash_attn_func is already the supported entry point for integrating flash-attn with PyTorch. Stacking another ComfyUI-style custom op on top is prone to dispatcher clashes on some versions. |
| Fallback target | SDPA also goes through PyTorch’s attention backends and can trigger another failure on the same dispatcher, not “a safe escape hatch.” |
| Confusion with “bad build” | Internal PyTorch paths in the log push people toward ABI / wheel theories; in this incident the main driver was how this node’s bundled attention code called flash-attn. |
The core problem was a deviation from the intended API (calling `flash_attn_func` directly) combined with routing failures into SDPA.
5. Why it didn’t show on PyTorch 2.10 but did on 2.11
- With the same RES4 code, differences in `OperatorEntry` / schema resolution mean 2.10 often did not reach the same internal asserts (or behaved differently on the same path).
- 2.11 changed dispatcher behavior, so the double wrapper + fallback combination surfaced there.
This does not mean "2.10 was correct"; it means the latent risk simply did not surface on 2.10.
6. File changed
| Field | Value |
|---|---|
| Path | ComfyUI/custom_nodes/RES4LYF/sd/attention.py |
| Function | attention_flash |
| Assumption | from flash_attn import flash_attn_func succeeds near the top of the file (when using --use-flash-attention). |
7. What changed (code details)
7.1 Removed
- The entire `flash_attn_wrapper` block:
  - `@torch.library.custom_op("flash_attention::flash_attn", ...)`
  - `@flash_attn_wrapper.register_fake`
  - The `AttributeError` stub for `flash_attn_wrapper`
- The `try`/`except` inside `attention_flash`:
  - Success path: `flash_attn_wrapper(q.transpose(1, 2), ...)`
  - Failure path: `logging.warning(...)` + `torch.nn.functional.scaled_dot_product_attention(...)`
7.2 Core of attention_flash after the fix (excerpt)
```python
def attention_flash(q, k, v, heads, mask=None, attn_precision=None, skip_reshape=False, skip_output_reshape=False):
    if skip_reshape:
        b, _, _, dim_head = q.shape
    else:
        b, _, dim_head = q.shape
        dim_head //= heads
        q, k, v = map(
            lambda t: t.view(b, -1, heads, dim_head).transpose(1, 2),
            (q, k, v),
        )

    if mask is not None:
        if mask.ndim == 2:
            mask = mask.unsqueeze(0)
        if mask.ndim == 3:
            mask = mask.unsqueeze(1)
    assert mask is None

    # Match ComfyUI core: call flash_attn_func directly (avoid duplicate
    # custom_op + broken SDPA fallback on torch 2.11+).
    out = flash_attn_func(
        q.transpose(1, 2),
        k.transpose(1, 2),
        v.transpose(1, 2),
        dropout_p=0.0,
        causal=False,
    ).transpose(1, 2)

    if not skip_output_reshape:
        out = out.transpose(1, 2).reshape(b, -1, heads * dim_head)
    return out
```

(The real file continues with `optimized_attention` wiring as before.)
8. Meaning of each change
| Change | Meaning |
|---|---|
| Remove `flash_attn_wrapper` | Stops registering `flash_attention::flash_attn` as a second custom op, avoiding dispatcher mismatches common on PyTorch 2.11.x. |
| Call `flash_attn_func` directly | Restores the supported API path; custom-op registration is left to the flash-attn package. |
| Remove `try` / SDPA fallback | Avoids falling into the broken SDPA path on failure; exceptions propagate, breaking the "FA fails → SDPA hits a second failure" chain. |
| `assert mask is None` | Makes explicit that this path does not support masks (aligned with the old `RuntimeError` inside the `try`). |
| Comment | Explains why the ComfyUI-style wrapper is not kept. |
9. Relationship to ComfyUI core
- ComfyUI core `comfy/ldm/modules/attention.py` may still historically contain the same pattern (`flash_attn_wrapper` + SDPA fallback).
- In this incident, logs pointed to RES4LYF's `sd/attention.py`, and symptoms cleared with only the RES4 fix, meaning that workflow actually used RES4's attention implementation.
- If you avoid touching core, other workflows that hit the same pattern in core could still show issues; mitigations include runtime patching from another custom node so updates don't overwrite edits.
10. Operational notes
- Updating RES4LYF may overwrite `sd/attention.py` and remove this fix. Re-check diffs after upgrades using this note.
- `--use-flash-attention` and an installed `flash-attn` package remain prerequisites.
- After the fix, if Flash Attention truly fails (e.g. OOM), it no longer auto-falls back to SDPA; you'll see an exception instead. That trades "silent double failure" for easier debugging.
11. One-sentence summary
RES4LYF wrapped flash_attn_func in an extra custom_op and fell back to SDPA on failure, which conflicted with PyTorch 2.11’s dispatcher and produced internal asserts in logs; calling flash_attn_func directly and removing the SDPA fallback removes that underlying risk.
Document note: Based on facts at investigation/fix time. Behavior may vary with PyTorch / ComfyUI / RES4LYF versions.
ComfyUI-DistorchMemoryManager Patch Explanation
1. Why this fix was needed
In production usage, the following message was repeatedly emitted and became major log noise:
```
Error running sage attention: Unsupported head_dim: 160, using pytorch attention instead.
```
This is not a fatal stop condition. It is a known recoverable path where execution continues via PyTorch attention fallback.
The real operational issue was that the same recoverable case was logged as error over and over.
2. Root cause
When SageAttention is called with unsupported head_dim=160 combinations, it can raise an exception.
The implementation already has a fallback path, so generation can continue.
The problem was the logging policy: a known recoverable fallback was recorded as error every time.
3. Why this was fixed in the custom node
Direct patches to ComfyUI core files are overwritten by updates and are hard to maintain.
So this project uses the following policy:
- No dependency on direct core-file edits
- Apply runtime patching when the custom node is loaded
- Keep behavior stable across ComfyUI updates by reapplying from the custom node side
In short, this is an external runtime patch strategy optimized for long-term maintainability.
4. Files changed and what they mean
4.1 ComfyUI/custom_nodes/ComfyUI-DistorchMemoryManager/nodes/sa.py
Change
The exception handling in attention_sage() was adjusted so only Unsupported head_dim: 160 is suppressed.
- Known exception (`Unsupported head_dim: 160`) logs once as `info`
- Other exceptions remain `error` (unchanged behavior)
- Fallback target stays `attention_pytorch` (unchanged behavior)
Meaning
This suppresses only known noise while preserving visibility of unknown failures.
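As a rough illustration of the policy above (a sketch only, not the node's actual code; `sage_fn`, `fallback_fn`, and the module-level flag are hypothetical names):

```python
import logging

_logged_known = False  # log the known head_dim=160 case only once

def guarded_sage_call(sage_fn, fallback_fn, *args, **kwargs):
    """Run sage attention; suppress only the known head_dim=160 failure."""
    global _logged_known
    try:
        return sage_fn(*args, **kwargs)
    except Exception as e:
        if "Unsupported head_dim: 160" in str(e):
            # Known recoverable case: log once at info level, then stay quiet.
            if not _logged_known:
                logging.info("SageAttention: head_dim=160 unsupported, using PyTorch attention")
                _logged_known = True
        else:
            # Unknown failures keep full error-level visibility.
            logging.error("Error running sage attention: %s", e)
        return fallback_fn(*args, **kwargs)
```

The key property is that the match is on the exception message, so any other SageAttention failure still reaches the `error` branch.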
4.2 ComfyUI/custom_nodes/ComfyUI-DistorchMemoryManager/__init__.py
Change (core of this patch)
At startup, _install_sage_attention_noise_guard() runs and replaces ComfyUI's attention_sage externally via runtime patching.
Implementation highlights:
- Imports `comfy.ldm.modules.attention`
- Gets the current `attention_sage`
- Defines a wrapped `attention_sage_guarded` with `@wrap_attn` and swaps it in
- Also swaps these references when needed: `optimized_attention`, `optimized_attention_masked`, `REGISTERED_ATTENTION_FUNCTIONS["sage"]`
- Prevents double patching via `_dm_sage160_guard`
- Limits suppression strictly to `"Unsupported head_dim: 160"`
Meaning
Behavior is corrected at runtime without editing core files directly, making it resilient to ComfyUI updates.
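The runtime-patch pattern can be sketched as follows (hypothetical, simplified code: `install_noise_guard` and `attn_module` are stand-ins; the real installer targets `comfy.ldm.modules.attention` and also updates `optimized_attention` and `REGISTERED_ATTENTION_FUNCTIONS["sage"]`):

```python
def install_noise_guard(attn_module):
    """Wrap the module's attention_sage and swap the wrapper in,
    guarding against double patching with a marker attribute."""
    original = attn_module.attention_sage
    if getattr(original, "_dm_sage160_guard", False):
        return False  # already patched once; do not wrap again

    def attention_sage_guarded(*args, **kwargs):
        try:
            return original(*args, **kwargs)
        except Exception as e:
            # Suppress strictly the known message; re-raise anything else.
            if "Unsupported head_dim: 160" not in str(e):
                raise
            return attn_module.attention_pytorch(*args, **kwargs)

    attention_sage_guarded._dm_sage160_guard = True
    attn_module.attention_sage = attention_sage_guarded
    return True
```

Because the patch is applied to module attributes at load time, a ComfyUI update that replaces the core file is re-patched automatically on the next startup.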
5. Scope and safety
This suppression is narrowly scoped, not blanket suppression.
- Match target: `"Unsupported head_dim: 160"`
- Known case: first occurrence logged as `info`
- All other exceptions: still logged as `error`
As a result:
- Reduces known SD1.5-style log spam
- Preserves observability for other models and unknown errors
6. Verification checklist
- After restarting ComfyUI, confirm this startup log:

  ```
  [ComfyUI-DistorchMemoryManager] Installed external sage-attention head_dim=160 noise guard
  ```

- During SD1.5 runs, repeated `error` logs for `Unsupported head_dim: 160` no longer appear
- Fallback still works and generation continues
- SA/FA behavior for other models (e.g., SDXL) remains functional
- Non-known SageAttention exceptions are still visible as `error`
7. Future improvements (developer ideas)
- Make suppression targets configurable (e.g., list of unsupported head dimensions)
- Add UI toggle to enable/disable suppression
- Explicitly document runtime patch policy in README/CHANGELOG
- Improve startup logs to show patch apply status in more detail
8. Summary
This patch suppresses only the known log spam caused by the head_dim=160 fallback path, while keeping unknown error visibility intact.
By implementing it as a custom-node runtime patch instead of direct core modification, it improves long-term maintainability.
v2.3.6 - Enhanced SageAttention3 (SA3) Integration
Enhanced SageAttention3 (SA3) Integration
This release enhances the Patch Sage Attention DM node with improved SageAttention3 (SA3) support, including version detection, constraint handling, and automatic fallback mechanisms.
Key Features
1. SA3 Version Detection
- Added a dedicated `get_sage_attention3_info()` function for SA3 version detection
- Detects Blackwell GPU support (FP4 kernel availability)
- Returns version information, availability status, and a Blackwell support flag
2. Improved Logging
- SA2 version logs are now skipped when SA3 modes are selected
- SA3-specific logging with version information (e.g., "SageAttention3 3.0.0.b1 (Blackwell FP4)")
- Clear distinction between SA2 and SA3 mode logging
3. Constraint Handling and Fallback
- Automatic fallback to PyTorch SDPA when SA3 constraints are not met:
  - `headdim >= 256`: the SA3 FP4 kernel does not support head dimensions >= 256
  - `attn_mask != None`: SA3 does not support attention masks
- Seamless fallback ensures compatibility with all model configurations
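A minimal sketch of the constraint gate implied above (the function name is hypothetical):

```python
def sa3_supported(head_dim, attn_mask):
    """Return True only when the SA3 FP4 kernel constraints are satisfied:
    head dimension below 256 and no attention mask."""
    return head_dim < 256 and attn_mask is None
```

When this returns False, the call would be routed to `torch.nn.functional.scaled_dot_product_attention` instead.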
4. Tensor Layout Conversion
- Automatic conversion from ComfyUI's default NHD layout `[batch, seq_len, heads, dim]` to SA3's expected HND layout `[batch, heads, seq_len, dim]`
- Proper layout restoration after SA3 processing
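The NHD/HND conversion is a single transpose of the heads and sequence axes; for example (shapes chosen arbitrarily for illustration):

```python
import torch

# ComfyUI NHD layout: [batch, seq_len, heads, dim]
x_nhd = torch.zeros(2, 64, 8, 40)

# SA3 HND layout: [batch, heads, seq_len, dim]
x_hnd = x_nhd.transpose(1, 2)

# Restoring the original layout is the same transpose again.
x_back = x_hnd.transpose(1, 2)
```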
5. Per-Block Mean Support
- Support for both `sageattn3` (standard mode, accuracy-focused) and `sageattn3_per_block_mean` (fast mode, Triton-optimized)
- Per-block mean processing uses 128-token blocks with Triton kernels for improved performance
Technical Details
Modified File: nodes/sa.py
Changes:
- Lines 102-125: Added `get_sage_attention3_info()` function
- Lines 151-152: Added SA2 log skip condition for SA3 modes
- Lines 184-188: Added SA3-specific version detection and logging
- Lines 195-200: Clarified tensor layout conversion
- Lines 202-210: Added fallback detection logic
- Lines 211-218: Improved fallback processing and SA3 call branching
Usage
1. Start ComfyUI with the `--use-sage-attention` flag: `python main.py --use-sage-attention`
2. Add the Patch Sage Attention DM node to your workflow
3. Select an SA3 mode from the `sage_attention` dropdown:
   - `sageattn3`: Standard SA3 mode (accuracy-focused)
   - `sageattn3_per_block_mean`: Fast SA3 mode (Triton-optimized)
Expected Log Output
When SA2 is selected:

```
Patching comfy attention to use SageAttention 2.1.1+cu128torch2.7
```

When SA3 is selected:

```
Patching comfy attention to use SageAttention3 3.0.0.b1 (Blackwell FP4)
```
Compatibility
- Requires the `sageattn3` package (v3.0.0.b1 or higher recommended)
- Optimized for Blackwell architecture GPUs (RTX 5060 Ti 16GB and similar)
- Automatic fallback ensures compatibility with all model configurations
- Compatible with ComfyUI's attention function format via the `wrap_attn` decorator
Notes
- SA3 constraints (headdim < 256, no attention mask) are automatically handled with fallback to PyTorch SDPA
- The node dynamically patches attention on each model execution and automatically cleans up afterward
- Version information is logged on every generation for debugging and verification
v2.3.3 - Fix Node Import Paths (Issue #3)
Release Notes v2.3.3 - Fix Node Import Paths (Issue #3)
Overview
This release fixes Issue #3, which caused only one node (Model Patch Memory Cleaner) to be visible in ComfyUI after updating, despite the repository containing four nodes.
Problem Description
Symptom
After updating the ComfyUI-DistorchMemoryManager custom node, users reported that only one node (Model Patch Memory Cleaner) was visible in ComfyUI's node palette, even though the repository contains four nodes:
- Memory Manager
- Safe Memory Manager
- Purge VRAM V2 (DisTorchPurgeVRAMV2)
- Patch Sage Attention DM
Root Causes
The issue had two root causes:
1. Incorrect Import Paths
After refactoring the project structure to move node files into a nodes/ subdirectory, the import statements in __init__.py were not updated to reflect the new directory structure.
Before refactoring:

```
ComfyUI-DistorchMemoryManager/
├── __init__.py
├── memory_manager.py
├── purge_vram.py
└── sa.py
```

After refactoring:

```
ComfyUI-DistorchMemoryManager/
├── __init__.py
└── nodes/
    ├── memory_manager.py
    ├── purge_vram.py
    └── sa.py
```
The code was still trying to import from the root directory:
```python
# ❌ INCORRECT (Before Fix)
from .memory_manager import MemoryManager, SafeMemoryManager, any
from .purge_vram import DisTorchPurgeVRAMV2
from .sa import PatchSageAttentionDM
```

2. Missing `__init__.py` in `nodes/` Directory
Even after fixing the import paths, the nodes/ directory was missing an __init__.py file, which is required for Python to recognize it as a package. Without this file, relative imports (like from .nodes.memory_manager import ...) fail silently.
Solution
Fix 1: Corrected Import Paths
Updated all import statements to point to the nodes/ subdirectory:
```python
# ✅ CORRECT (After Fix)
from .nodes.memory_manager import MemoryManager, SafeMemoryManager, any
from .nodes.purge_vram import DisTorchPurgeVRAMV2
from .nodes.sa import PatchSageAttentionDM
```

Fix 2: Added `nodes/__init__.py`
Created nodes/__init__.py to make the nodes/ directory a proper Python package:
```python
# nodes/__init__.py
# Nodes package for ComfyUI-DistorchMemoryManager
# This file makes the nodes directory a Python package
```

Fix 3: Added Debug Logging
Added comprehensive debug logging to help diagnose import issues:
- Import Success Logging: Logs when imports succeed
- Import Failure Logging: Logs detailed error messages when imports fail
- Node Registration Logging: Logs which nodes are successfully registered
- Registration Summary: Logs the complete list of registered nodes
Example debug output:
```
[ComfyUI-DistorchMemoryManager] Successfully imported MemoryManager and SafeMemoryManager from .nodes.memory_manager
[ComfyUI-DistorchMemoryManager] Successfully imported DisTorchPurgeVRAMV2 from .nodes.purge_vram
[ComfyUI-DistorchMemoryManager] Successfully imported PatchSageAttentionDM from .nodes.sa
[ComfyUI-DistorchMemoryManager] Registered MemoryManager node
[ComfyUI-DistorchMemoryManager] Registered SafeMemoryManager node
[ComfyUI-DistorchMemoryManager] Registered DisTorchPurgeVRAMV2 node
[ComfyUI-DistorchMemoryManager] Registered PatchSageAttentionDM node
[ComfyUI-DistorchMemoryManager] Total registered nodes: ['ModelPatchMemoryCleaner', 'MemoryManager', 'SafeMemoryManager', 'DisTorchPurgeVRAMV2', 'PatchSageAttentionDM']
```
Technical Details
Why __init__.py is Required
In Python, a directory must contain an __init__.py file to be recognized as a package. Without it:
- Relative imports (`.nodes.memory_manager`) fail
- The directory is treated as a namespace package (Python 3.3+) or not recognized at all (Python < 3.3)
- Import errors occur silently in some cases
Import Mechanism
The code uses a two-level try-except pattern for robustness:
```python
try:
    from .nodes.memory_manager import ...  # Relative import (preferred)
except ImportError:
    try:
        from nodes.memory_manager import ...  # Absolute import (fallback)
    except ImportError:
        ...  # Set to None if both fail
```

Why two levels?
- First attempt (relative): Works when the package is properly installed and imported as a module
- Second attempt (absolute): Works in edge cases where the relative import fails but the absolute path is in `sys.path`
- Fallback: Prevents `NameError` if both imports fail, allowing the code to continue executing
Changes Made
Files Modified
1. `__init__.py`
   - Updated import paths from `.memory_manager` to `.nodes.memory_manager`
   - Updated import paths from `.purge_vram` to `.nodes.purge_vram`
   - Updated import paths from `.sa` to `.nodes.sa`
   - Added comprehensive debug logging for imports and node registration
2. `nodes/__init__.py` (NEW FILE)
   - Created to make the `nodes/` directory a proper Python package
   - Required for relative imports to work correctly
Verification
How to Verify the Fix
1. Check ComfyUI Console Output
   - Start ComfyUI and check the console for debug messages
   - Should see "Successfully imported" messages for all four nodes
   - Should see "Registered" messages for all four nodes
   - Should see the complete list of registered nodes
2. Check ComfyUI Node Palette
   - Open ComfyUI
   - Search the "Memory" category
   - Should see all 4 nodes listed:
     - Memory Manager
     - Safe Memory Manager
     - LayerUtility: Purge VRAM V2
     - Patch Sage Attention DM
   - Model Patch Memory Cleaner should also be visible
3. Test Node Functionality
   - Add each node to a workflow
   - Verify that all nodes function correctly
   - Check that no import errors appear in the console
Expected Behavior After Fix
- ✅ All 4 nodes appear in ComfyUI's node palette under the "Memory" category
- ✅ No import errors in ComfyUI console
- ✅ All nodes function correctly when used in workflows
- ✅ Debug logging provides clear information about import and registration status
Impact
Before Fix
- ❌ Only 1 node visible (Model Patch Memory Cleaner)
- ❌ 4 nodes missing (Memory Manager, Safe Memory Manager, Purge VRAM V2, Patch Sage Attention DM)
- ❌ Users unable to use missing nodes
- ❌ Silent failure (no error messages)
- ❌ Difficult to diagnose the problem
After Fix
- ✅ All 4 nodes visible
- ✅ All nodes functional
- ✅ Users can access all features
- ✅ Proper error handling maintained
- ✅ Debug logging helps diagnose future issues
Related Information
- Issue: GitHub Issue #3
- Fixed Files: `__init__.py` (import paths and debug logging), `nodes/__init__.py` (new file)
- Documentation: See `ISSUE_3_FIX_DOCUMENTATION.md` for complete technical details
Summary
This release fixes a critical bug that prevented three of four custom nodes from being visible in ComfyUI. The fix involved:
- Correcting import paths to reflect the new
nodes/directory structure - Adding
nodes/__init__.pyto make the directory a proper Python package - Adding debug logging to help diagnose future import issues
Key Takeaway: When refactoring project structure, always:
- Update import paths immediately
- Ensure all subdirectories have `__init__.py` files if they contain Python modules
- Add debug logging to help diagnose issues
- Test thoroughly after structural changes
v2.3.0 - Flash-Attention Auto-Load and Version Detection Features
Overview
Added independent Flash-Attention auto-load and version detection features. These features are completely independent from ComfyUI's model_management module, eliminating the need to manually modify model_management.py after ComfyUI updates.
Key Features
Independent Version Detection
- Flash-Attention version detection: Completely independent version detection that doesn't rely on model_management module
- FA-2/FA-3 type detection: Automatically detects Flash-Attention 2 or 3 based on version number
- SageAttention version detection: Independent SageAttention version detection with CUDA/PyTorch information
- Version information logging: Version information is logged on every generation
Flash-Attention Auto-Load
- No CLI options required: Flash-Attention is automatically loaded when SageAttention is set to `disabled`, without requiring the `--use-flash-attention` CLI option
- Direct loading: Uses `optimized_attention_override` to directly load Flash-Attention
- Package import detection: Detects Flash-Attention availability based on package import capability, not CLI options
Dynamic Patching and Logging
- ON_PRE_RUN callback: Loads Flash-Attention when disabled, applies SageAttention when enabled, and outputs version logs
- ON_CLEANUP callback: Always outputs Flash-Attention logs even when SageAttention was active (ComfyUI resets to optimal kernel on cleanup)
- Per-generation logging: Version information is logged on every generation
Technical Details
Core Mechanism
The implementation uses ON_PRE_RUN and ON_CLEANUP callbacks for dynamic patching and logging:
-
Generation start (ON_PRE_RUN callback):
- If SageAttention is enabled: get the SageAttention function, set `optimized_attention_override` to apply the patch, and output the SageAttention version log
- If SageAttention is disabled: if the Flash-Attention package is available, set `comfy_attention.attention_flash` as `optimized_attention_override` to directly load Flash-Attention, and output the Flash-Attention version log
-
Generation end (ON_CLEANUP callback):
- Delete `optimized_attention_override` so ComfyUI's kernel reset runs
- ComfyUI automatically selects the optimal kernel (Flash-Attention) as the initial state
- Output the Flash-Attention version log at this point (even when SageAttention was active)
Flash-Attention Version Detection Function
```python
def get_flash_attention_info():
    """
    Get Flash-Attention version and type information.
    Returns: (is_available, version, type)
    """
```

Process flow:

- Check package import capability using `import flash_attn`
- Attempt to get the version string from `flash_attn.__version__`
- Fall back to `importlib.metadata.version("flash-attn")` if `__version__` doesn't exist
- Split the version string and take the major version number
- Determine FA-2 or FA-3: `major_version >= 3` → "FA-3", otherwise → "FA-2"
- Return the tuple `(is_available, version, type)`
Important point: Determines based on package import capability only, regardless of args.use_flash_attention value. This allows accurate detection of actual Flash-Attention availability.
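The flow above might look roughly like this (a sketch under the stated process, not the node's exact code; the helper `fa_type_from_version` is introduced here for clarity):

```python
import importlib
import importlib.metadata

def fa_type_from_version(version):
    # Take the leading major component, e.g. "2.7.4.post1" -> 2, "3.0.0b1" -> 3.
    major = int(version.split(".")[0])
    return "FA-3" if major >= 3 else "FA-2"

def get_flash_attention_info():
    """Return (is_available, version, type) based purely on importability."""
    try:
        flash_attn = importlib.import_module("flash_attn")
    except ImportError:
        return (False, None, None)
    version = getattr(flash_attn, "__version__", None)
    if version is None:
        try:
            version = importlib.metadata.version("flash-attn")
        except importlib.metadata.PackageNotFoundError:
            return (True, None, None)
    return (True, version, fa_type_from_version(version))
```

Note that nothing here consults `args.use_flash_attention`; availability is decided by the import alone, matching the point above.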
SageAttention Version Detection Function
```python
def get_sage_attention_info():
    """
    Get SageAttention version information.
    Returns: (version, cuda_version, torch_version)
    """
```

Process flow:

- Import the `sageattention` package
- Attempt to get the version from `sageattention.__version__`
- Fall back to `importlib.metadata.version("sageattention")` if `__version__` doesn't exist
- Get the CUDA version from `torch.version.cuda`
- Get the PyTorch version from `torch.version.__version__`
- Return the tuple `(version, cuda_version, torch_version)`
Flash-Attention Auto-Load Mechanism
When SageAttention is set to `disabled`, Flash-Attention is loaded using `optimized_attention_override`:

```python
def attention_override_flash(func, *args, **kwargs):
    return comfy_attention.attention_flash(*args, **kwargs)

model.model_options["transformer_options"]["optimized_attention_override"] = attention_override_flash
```

This bypasses ComfyUI's default attention selection logic and directly uses the `attention_flash` function.
Important Points
1. Complete Independence
Completely independent from model_management module. This allows version information to always be retrieved regardless of ComfyUI updates.
2. No CLI Options Required
Flash-Attention can be loaded using only the node's disabled setting, without requiring --use-flash-attention CLI option. Determines based on package import capability only.
3. Per-Generation Logging
- ON_PRE_RUN (before generation): Outputs SageAttention log if enabled, Flash-Attention log if disabled
- ON_CLEANUP (after generation): Always checks and outputs Flash-Attention state (even when SageAttention is enabled, ComfyUI's kernel reset causes Flash-Attention to be selected as initial state)
4. Error Handling
All critical operations are wrapped in try-except blocks to ensure safe operation even when errors occur.
5. Kernel Reset Understanding
ComfyUI performs kernel reset on every generation, returning to initial state (optimal kernel). This initial state automatically selects Flash-Attention when available. Therefore, even when SageAttention is enabled, Flash-Attention logs are output during cleanup after generation ends.
Benefits
- No manual modifications needed: Eliminates need to manually modify model_management.py after ComfyUI updates
- Always available version info: Version information can always be retrieved regardless of ComfyUI updates
- Convenient Flash-Attention loading: Flash-Attention can be loaded without CLI options
- Better visibility: Version information is logged on every generation, providing better visibility into attention mechanism state
Files Modified
nodes/sa.py: Added independent version detection functions and Flash-Attention auto-load functionality
Summary
This implementation adds completely independent version detection and Flash-Attention auto-load features to the ComfyUI-DistorchMemoryManager node. By completely eliminating dependencies on model_management, version information for Flash-Attention and SageAttention can always be retrieved regardless of ComfyUI updates. Additionally, Flash-Attention (FA-2/FA-3) is automatically loaded when disabled, operating without CLI options. This eliminates the hassle of manually modifying model_management.py after each ComfyUI update.
v2.2.0 - SageAttention Patch Feature
Overview
Added Patch Sage Attention DM node for patching ComfyUI's attention mechanism to use SageAttention. This feature allows replacing ComfyUI's standard attention mechanism with SageAttention, providing improved memory efficiency and performance.
Key Features
Patch Sage Attention DM Node
- Dynamic patching: Uses ComfyUI's callback system (ON_PRE_RUN and ON_CLEANUP) for dynamic attention patching
- Multiple SageAttention modes: Supports auto, CUDA, Triton, and SageAttention 3 implementations
- Version detection: Automatically detects and logs SageAttention version with CUDA/PyTorch information
- Flash-Attention state logging: Logs Flash-Attention state when SageAttention is disabled
- ComfyUI compatibility: Compatible with ComfyUI's attention function format using wrap_attn decorator
Supported SageAttention Modes
- `disabled`: Disable SageAttention and restore the original attention mechanism
- `auto`: Automatic SageAttention implementation selection
- `sageattn_qk_int8_pv_fp16_cuda`: CUDA implementation (QK int8, PV FP16)
- `sageattn_qk_int8_pv_fp16_triton`: Triton implementation (QK int8, PV FP16)
- `sageattn_qk_int8_pv_fp8_cuda`: CUDA implementation (QK int8, PV FP8)
- `sageattn_qk_int8_pv_fp8_cuda++`: CUDA implementation (QK int8, PV FP8, optimized)
- `sageattn3`: SageAttention 3 implementation (Blackwell support)
- `sageattn3_per_block_mean`: SageAttention 3 implementation (per-block mean version)
Version Detection and Logging
- SageAttention version: Detects the version using the `__version__` attribute or `importlib.metadata`
- CUDA/PyTorch versions: Includes CUDA and PyTorch version information in logs
- Flash-Attention state: Checks and logs Flash-Attention state from model_management module
- Dynamic logging: Logs are output on every model execution via callbacks
Technical Details
Implementation Architecture
The implementation uses ComfyUI's callback system to dynamically patch attention:
- ON_PRE_RUN callback: Patches attention before each model execution
- ON_CLEANUP callback: Cleans up and logs Flash-Attention state after each execution
Attention Function Wrapping
The implementation uses wrap_attn decorator to wrap SageAttention functions in ComfyUI's attention format:
- Converts tensor shapes from ComfyUI format (q, k, v, heads) to SageAttention format
- Handles FP32 to FP16 conversion (SageAttention primarily operates on FP16)
- Adjusts mask dimensions (adds batch and heads dimensions)
- Restores output to original data type and shape
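The FP16 round-trip described above can be sketched like so (`attn_fn` is a placeholder for the wrapped SageAttention call, not a real API name):

```python
import torch

def run_in_fp16(attn_fn, q, k, v, **kwargs):
    """Cast FP32 inputs down to FP16 for the kernel, then restore the
    caller's original dtype on the output."""
    orig_dtype = q.dtype
    if orig_dtype == torch.float32:
        q, k, v = q.half(), k.half(), v.half()
    out = attn_fn(q, k, v, **kwargs)
    return out.to(orig_dtype)
```

The real wrapper additionally reshapes tensors between ComfyUI's `(q, k, v, heads)` format and SageAttention's layout, and adjusts mask dimensions.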
torch.compile Control
- allow_compile option: Optional boolean parameter to enable torch.compile
- Default disabled: torch.compile is disabled by default to avoid compilation overhead
- Requirement: torch.compile requires sageattn 2.2.0 or higher
Error Handling
All critical operations are wrapped in try-except blocks:
- Version detection failures are handled gracefully
- Flash-Attention state checks are safe
- Fallback messages are provided when information cannot be retrieved
Usage
- Add node in ComfyUI: Add "Patch Sage Attention DM" node from "Memory" category
- Connect model: Connect model from CheckpointLoader or similar
- Select SageAttention mode: Choose desired mode from dropdown
- Configure options: Enable `allow_compile` if desired (requires sageattn 2.2.0+)
- Execute: Model execution will output logs to the console
Log Output Examples
When SageAttention is Enabled

```
Patching comfy attention to use SageAttention 2.2.0+cu121torch2.3.0
```

When SageAttention is Disabled (Flash-Attention Enabled)

```
Restoring initial comfy attention
[ComfyUI] Using FA-3 (Flash-Attention 3.0.0) direct
```

When SageAttention is Disabled (Flash-Attention Disabled)

```
Restoring initial comfy attention
```
Important Notes
- Patching occurs on every execution: Due to callback usage, SageAttention is applied before each execution and cleaned up after
- Use 'disabled' to restore: To disable SageAttention, run the node again with `sage_attention` set to `disabled`
- Logs output on every execution: Console output increases as logs are output on every model execution
- allow_compile requirement: torch.compile requires sageattn 2.2.0 or higher
Files Modified
nodes/sa.py: Added PatchSageAttentionDM class with SageAttention patching functionality
Compatibility
This implementation replicates the functionality of comfyui-kjnodes' PatchSageAttentionKJ node:
- Supports the same SageAttention modes
- Uses the same log format
- Uses the same callback mechanism
v2.0.0 - Qwen3-VL and Nunchaku Model Purging Support
ComfyUI-VRAM-Manager Modification Notes
Purpose of Modifications
Added purge functionality for Qwen3-VL models loaded by qwen3-vl-comfy-ui and Nunchaku models (FLUX/Z-Image/Qwen-Image) loaded by ComfyUI-nunchaku to the purgevram2 node in ComfyUI-VRAM-Manager (formerly ComfyUI-DistorchMemoryManager). These models are not managed by ComfyUI's standard model_management, so dedicated purge processing was required. This was implemented to prevent OOM errors during upscale processing.
Modified Files
ComfyUI/custom_nodes/ComfyUI-DistorchMemoryManager/__init__.py
pyproject.toml
Overview of Modifications
- Added Qwen3-VL model purge functionality
- Added Nunchaku model purge functionality (with CPU offload support)
- Added detailed debug logging
- Enhanced processing to ensure reliable unloading from memory
- Fixed any() function name collision with AnyType
- Changed display name to ComfyUI-VRAM-Manager
1. Parameter Addition to INPUT_TYPES
Added the following two boolean parameters to the INPUT_TYPES of the DisTorchPurgeVRAMV2 class:
"purge_qwen3vl_models": ("BOOLEAN", {"default": False, "tooltip": "Clear Qwen3-VL models from GPU memory"}),
"purge_nunchaku_models": ("BOOLEAN", {"default": False, "tooltip": "Clear Nunchaku models (FLUX/Z-Image/Qwen-Image) from GPU memory"}),

This allows users to individually purge Qwen3-VL models and Nunchaku models using the purgevram2 node.
2. Qwen3-VL Model Purge Functionality
2.1 Model Type Import
Dynamically imports Qwen3VLForConditionalGeneration from the transformers library. If the import fails, the purge process is skipped.
from transformers import Qwen3VLForConditionalGeneration
qwen3vl_model_type = Qwen3VLForConditionalGeneration

2.2 Method 1: sys.modules Search
Iterates through all modules in sys.modules and searches for Qwen3-VL models stored as module attributes. When a model is found, the following processing is executed:
2.2.1 hf_device_map Processing
Qwen3-VL models may be loaded with device_map="auto", which can result in the model being distributed across multiple devices. The hf_device_map dictionary stores parameter names and their device information.
if hasattr(attr, 'hf_device_map'):
    hf_device_map = attr.hf_device_map
    for param_name, device in hf_device_map.items():
        device_str = str(device) if device is not None else ''
        if device_str.startswith('cuda') or (isinstance(device, int) and device >= 0):
            # Get module path and move to CPU
            submodule = attr
            if param_name:
                for part in param_name.split('.'):
                    submodule = getattr(submodule, part)
            if hasattr(submodule, 'to'):
                submodule.to('cpu')

The device can be either a string (e.g., 'cuda:0') or an integer (device ID), so both formats are supported.
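The string-or-integer device check can be isolated as a tiny helper; a sketch of the same logic (is_cuda_device is an illustrative name, not from the node):

```python
def is_cuda_device(device):
    """Return True if an hf_device_map entry refers to a CUDA device.

    Entries can be strings like 'cuda:0' or bare integer device IDs;
    'cpu', 'disk', and negative IDs are treated as non-CUDA.
    """
    device_str = str(device) if device is not None else ''
    return device_str.startswith('cuda') or (isinstance(device, int) and device >= 0)

print(is_cuda_device('cuda:0'))  # True
print(is_cuda_device(0))         # True
print(is_cuda_device('cpu'))     # False
print(is_cuda_device(None))      # False
```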
2.2.2 Model CPU Transfer
Moves the entire model to CPU. If .to('cpu') fails, a fallback process is implemented to move parameters individually to CPU.
if hasattr(attr, 'to'):
    attr.to('cpu')
elif hasattr(attr, 'cpu'):
    attr.cpu()

2.2.3 Internal State Clearing
To ensure reliable unloading from memory, parameters and buffers are explicitly moved to CPU, and the _modules dictionary is cleared.
if hasattr(attr, 'named_parameters'):
    for name, param in list(attr.named_parameters()):
        if param is not None and hasattr(param, 'data'):
            if param.data is not None:
                param.data = param.data.detach().cpu()
if hasattr(attr, 'named_buffers'):
    for name, buffer in list(attr.named_buffers()):
        if buffer is not None and hasattr(buffer, 'data'):
            if buffer.data is not None:
                buffer.data = buffer.data.detach().cpu()
if hasattr(attr, '_modules'):
    attr._modules.clear()

Using detach().cpu() detaches from the computation graph before moving to CPU, ensuring references are reliably released.
2.2.4 Reference Deletion
Deletes the model reference from module attributes and also deletes the object itself.
if hasattr(module, attr_name):
    delattr(module, attr_name)
del attr

2.3 Method 2: gc.get_objects() Search
Searches for models not found in sys.modules from all objects tracked by the garbage collector. The same processing as Method 1 is executed for found models.
for obj in gc.get_objects():
    if isinstance(obj, qwen3vl_model_type):
        # hf_device_map processing, CPU transfer, internal state clearing, deletion

2.4 Garbage Collection and CUDA Cache Clearing
After model purging, garbage collection is executed twice, and caches for all CUDA devices are cleared.
gc.collect()
gc.collect()  # Execute twice to ensure cleanup
for device_idx in range(torch.cuda.device_count()):
    with torch.cuda.device(device_idx):
        torch.cuda.empty_cache()
        torch.cuda.ipc_collect()
        torch.cuda.synchronize()

3. Nunchaku Model Purge Functionality
3.1 Model Type Import
Nunchaku models have multiple types, so each is imported individually.
- NunchakuFluxTransformer2dModel (for FLUX)
- NunchakuZImageTransformer2DModel (for Z-Image)
- NunchakuT5EncoderModel (for text encoder, import failure is acceptable)
- NunchakuQwenImageTransformer2DModel (for Qwen-Image, searched from multiple paths)
from nunchaku import NunchakuFluxTransformer2dModel
from nunchaku.models.transformers.transformer_zimage import NunchakuZImageTransformer2DModel
from comfyui_nunchaku.models.qwenimage import NunchakuQwenImageTransformer2DModel

3.2 CPU Offload Disabling
Nunchaku models may use CPU offload functionality. If offload is enabled, it is disabled using set_offload(False), and offload_manager is set to None to ensure offloaded memory is reliably released.
if hasattr(attr, 'set_offload'):
    if hasattr(attr, 'offload') and attr.offload:
        attr.set_offload(False)
if hasattr(attr, 'offload_manager') and attr.offload_manager is not None:
    attr.offload_manager = None

3.3 Method 1: sys.modules Search
Iterates through all modules in sys.modules and searches for Nunchaku models. Models may be found in the following structures:
3.3.1 Direct Reference
When the model is stored directly as a module attribute.
if isinstance(attr, model_type):
    # Disable CPU offload, move to CPU, clear internal state, delete

3.3.2 Dictionary (transformer key)
When the model is stored in the transformer key of a dictionary.
if isinstance(attr, dict) and 'transformer' in attr:
    transformer_obj = attr.get('transformer')
    if isinstance(transformer_obj, model_type):
        # Similar processing

3.3.3 Nested Structure (ModelPatcher structure)
For nested structures like ComfyFluxWrapper, the model is searched by following the path dict.model.diffusion_model.model.
if 'model' in attr:
    model_obj = attr.get('model')
    if hasattr(model_obj, 'diffusion_model'):
        diffusion_model = model_obj.diffusion_model
        if hasattr(diffusion_model, 'model'):
            transformer_obj = diffusion_model.model
            if isinstance(transformer_obj, model_type):
                # Similar processing

3.4 Method 2: ComfyUI model_management Search
Iterates through ComfyUI's current_loaded_models and searches for Nunchaku models within the ModelPatcher structure.
for loaded_model in current_loaded_models:
    if hasattr(loaded_model, "model"):
        model = loaded_model.model
        if hasattr(model, "diffusion_model"):
            diffusion_model = model.diffusion_model
            if hasattr(diffusion_model, "model"):
                transformer = diffusion_model.model
                if isinstance(transformer, model_type):
                    # Disable CPU offload, move to CPU, clear internal state
                    loaded_model.currently_used = False
                    loaded_model.model_unload()

Calling model_unload() ensures the model is reliably unloaded from ComfyUI's management.
3.5 Method 3: gc.get_objects() Search
Searches for models not found in Method 1 and Method 2 from all objects tracked by the garbage collector.
for obj in gc.get_objects():
    for model_type in nunchaku_model_types:
        if isinstance(obj, model_type):
            # Disable CPU offload, move to CPU, clear internal state

4. Debug Logging Addition
Added detailed debug logging to the Qwen3-VL and Nunchaku purge processes. The following types of logs are output at each processing step:
4.1 Start Message
Explicitly indicates the start of the purge process.
print("Qwen3-VL: Starting purge process...")
print("Nunchaku: Starting purge process...")

4.2 Import Results
Records import success/failure for each model type.
print("Qwen3-VL: Successfully imported Qwen3VLForConditionalGeneration")
print("Nunchaku: Successfully imported NunchakuFluxTransformer2dModel")
print(f"Nunchaku: Failed to import NunchakuT5EncoderModel: {e}")

4.3 Method Start/Completion Messages
Explicitly indicates the start and completion of each Method, and records the number of objects checked and models detected.
print("Qwen3-VL: Method 1 - Searching sys.modules for models...")
print(f"Qwen3-VL: Method 1 complete - checked {modules_checked} modules")
print(f"Qwen3-VL: Method 2 complete - checked {objects_checked} objects, found {models_found_in_gc} models")

4.4 Detailed Information When Models Are Detected
When a model is found, records the type name, ID, and location.
pr...

v1.3.1 - Improved SeedVR2 Cache Detection and Messaging
Release Notes v1.3.1
Improved SeedVR2 Cache Detection and Messaging
This release focuses on improving user experience when working with SeedVR2 models, providing clearer feedback about cache state and better debugging information.
Key Improvements
1. Enhanced Debug Information
- Import Method Tracking: Now displays which method was used to access SeedVR2's GlobalModelCache (Method 1: direct import, Method 2: via seedvr2_videoupscaler, Method 3: via sys.modules)
- Detailed Cache State: Shows exact counts of DiT models, VAE models, and runner templates in the cache
- Attribute Verification: Indicates whether cache dictionaries exist but are empty, helping users understand the cache state
2. Clearer User Messages
- Explicit Explanation: Messages now clearly explain that cache_model=False (default) means models are never cached in GlobalModelCache
- Actionable Guidance: Provides instructions on how to enable caching (cache_model=True in SeedVR2 nodes)
- Technical Accuracy: Correctly describes SeedVR2's automatic cleanup behavior after processing completes
3. Removed Duplicate Messages
- Eliminated redundant console output that was confusing users
- Streamlined message flow for better readability
Technical Details
SeedVR2 Caching Behavior
cache_model=False (Default):
- Models are never added to GlobalModelCache
- After processing completes, models are automatically deleted from memory
- This is expected behavior, not a bug
cache_model=True:
- Models are stored in GlobalModelCache._dit_models and _vae_models
- Models remain in memory after processing
- Can be cleared via Purge VRAM V2's purge_seedvr2_models option
Example Output
SeedVR2: Cache accessed via Method 1 (direct import)
SeedVR2: Checking cache (DiT: 0, VAE: 0, Runners: 0)
SeedVR2: DiT models dictionary exists but is empty
SeedVR2: VAE models dictionary exists but is empty
SeedVR2: Cache is empty - cache_model option is disabled (False by default). Enable cache_model=True in SeedVR2 nodes to cache models in GlobalModelCache.
Impact
- Better User Understanding: Users now know why the cache is empty and what they can do about it
- Easier Debugging: Import method and cache state information help troubleshoot integration issues
- Reduced Confusion: Eliminates misleading messages that suggested models were "already cleared"
Files Modified
__init__.py: Enhanced DisTorchPurgeVRAMV2.purge_vram() method's SeedVR2 handling section
v1.3.0
Release Notes v1.3.0
Overview
Version 1.3.0 introduces SeedVR2 model purging support to the DisTorchPurgeVRAMV2 node, along with critical bug fixes for cleanup_models() errors and CPU device handling. This release significantly improves memory management compatibility with SeedVR2 workflows and resolves stability issues.
New Features
SeedVR2 Model Purging Support
Added comprehensive support for purging SeedVR2's DiT (base) and VAE models from the GlobalModelCache.
Purpose
The SeedVR2 custom node uses an independent model caching system (GlobalModelCache) that stores DiT and VAE models separately from ComfyUI's standard model management. This release adds the ability to clear these cached models through the DisTorchPurgeVRAMV2 node.
Implementation Details
New Option Added:
- purge_seedvr2_models: Boolean option (default: False) in DisTorchPurgeVRAMV2 node
- When enabled, clears all cached SeedVR2 DiT and VAE models from GlobalModelCache
Path Detection:
The implementation uses multiple methods to locate the SeedVR2 custom node, ensuring compatibility across different user environments:
- Method 1: Import from already loaded modules (most reliable)
- Method 2: Relative path from current file (same custom_nodes directory)
- Method 3: Search in sys.path for seedvr2_videoupscaler
- Method 4: Parse path structure to find the custom_nodes directory
This multi-method approach ensures the feature works regardless of where ComfyUI is installed on the user's system.
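Method 4 (parsing the path structure) can be sketched as a walk up the directory tree looking for the custom_nodes marker; find_custom_nodes_dir is a hypothetical helper for illustration, not the node's actual code:

```python
import os

def find_custom_nodes_dir(start_path):
    """Walk up from start_path until a directory named 'custom_nodes' is found.

    Returns the custom_nodes path, or None if the marker directory never
    appears before the filesystem root.
    """
    path = os.path.abspath(start_path)
    while True:
        if os.path.basename(path) == "custom_nodes":
            return path
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            return None
        path = parent

# On POSIX paths this resolves to '/opt/ComfyUI/custom_nodes'.
print(find_custom_nodes_dir("/opt/ComfyUI/custom_nodes/my_node/__init__.py"))
```

Because only the directory name is matched, the same walk works wherever ComfyUI is installed.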
Model Clearing Process:
- Accesses SeedVR2's GlobalModelCache via get_global_cache()
- Iterates through the _dit_models dictionary and removes each DiT model using remove_dit()
- Iterates through the _vae_models dictionary and removes each VAE model using remove_vae()
- Clears the _runner_templates dictionary
- Properly releases model memory using SeedVR2's release_model_memory() function
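The clearing steps can be sketched as a loop over the cache dictionaries. The attribute and method names (_dit_models, remove_dit(), and so on) are taken from these notes and assumed to match SeedVR2's API; DummyCache below is a stand-in for illustration only:

```python
def clear_seedvr2_cache(cache):
    """Remove every cached DiT and VAE model, then drop runner templates.

    Returns the number of models removed. Keys are snapshotted with
    list() because remove_* mutates the dictionaries while we iterate.
    """
    removed = 0
    for key in list(cache._dit_models.keys()):
        cache.remove_dit(key)
        removed += 1
    for key in list(cache._vae_models.keys()):
        cache.remove_vae(key)
        removed += 1
    cache._runner_templates.clear()
    return removed

class DummyCache:
    """Minimal stand-in mimicking the cache surface used above."""
    def __init__(self):
        self._dit_models = {"dit_a": object()}
        self._vae_models = {"vae_a": object()}
        self._runner_templates = {"runner_a": object()}
    def remove_dit(self, key):
        del self._dit_models[key]
    def remove_vae(self, key):
        del self._vae_models[key]

print(clear_seedvr2_cache(DummyCache()))  # 2
```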
Error Handling:
- Gracefully handles cases where SeedVR2 is not installed
- Does not display errors if SeedVR2 custom node is not found (normal behavior)
- Catches and reports errors for individual model removal failures
Usage
Enable purge_seedvr2_models: True in the DisTorchPurgeVRAMV2 node to clear all cached SeedVR2 models. This is particularly useful when:
- Switching between different SeedVR2 models
- Experiencing memory issues with SeedVR2 workflows
- Needing to force-reload SeedVR2 models
Bug Fixes
Fixed 'NoneType' object is not callable Error in cleanup_models()
Problem
The cleanup_models() function in ComfyUI's model_management module would fail with 'NoneType' object is not callable errors when attempting to call real_model() on models where real_model was None or not callable.
Solution
Implemented pre-cleanup logic that removes problematic models before calling cleanup_models():
Pre-cleanup Checks:
- Checks if real_model is None - removes the model immediately
- Checks if real_model is not callable - removes the model immediately
- Attempts to call real_model() - if it fails or returns None, removes the model
Implementation Details:
- Pre-cleanup is performed before both cleanup_models() calls:
- Before the initial cleanup (after marking models as unused)
- Before the second cleanup (after model unloading)
- Iterates from end to start to prevent index shifting during removal
- Logs the number of pre-cleaned models for debugging
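The pre-cleanup checks can be sketched as a reverse-order filter over stand-in objects (a sketch of the described logic, not ComfyUI's actual list handling):

```python
from types import SimpleNamespace

def pre_cleanup(loaded_models):
    """Drop entries whose real_model is None, not callable, or unhealthy.

    Iterates from the end so pop(i) never shifts an index that has not
    been visited yet. Returns the number of removed entries.
    """
    removed = 0
    for i in range(len(loaded_models) - 1, -1, -1):
        real_model = getattr(loaded_models[i], "real_model", None)
        if real_model is None or not callable(real_model):
            loaded_models.pop(i)
            removed += 1
            continue
        try:
            if real_model() is None:  # the call may also raise
                loaded_models.pop(i)
                removed += 1
        except Exception:
            loaded_models.pop(i)
            removed += 1
    return removed

models = [
    SimpleNamespace(real_model=None),              # removed: None
    SimpleNamespace(real_model="not-callable"),    # removed: not callable
    SimpleNamespace(real_model=lambda: object()),  # kept: healthy
    SimpleNamespace(real_model=lambda: None),      # removed: returns None
]
print(pre_cleanup(models), len(models))  # 3 1
```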
Code Location:
- Lines 227-262: First pre-cleanup before the initial cleanup_models() call
- Lines 312-337: Second pre-cleanup before the second cleanup_models() call
This fix prevents the error from occurring and ensures stable model cleanup operations.
Fixed CPU Device Error in Virtual Memory Reset
Problem
Calling comfy.model_management.free_memory(0, 'cpu') would cause an error: Expected a cuda device, but got: cpu. This affected MemoryManager, SafeMemoryManager, and MemoryCleaner nodes.
Solution
Removed CPU device calls from virtual memory reset operations. The fix:
- Only calls free_memory(0, 'cuda:0') when CUDA is available
- Skips CPU device calls entirely
- Wraps CUDA calls in try-except for additional safety
Modified Nodes:
- MemoryManager: Removed the CPU free_memory() call
- SafeMemoryManager: Removed the CPU free_memory() call
- MemoryCleaner: Removed the CPU free_memory() call
This ensures virtual memory reset operations complete without errors.
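The guard can be sketched as follows; free_memory is injected as a parameter so the logic is shown without ComfyUI (in the nodes it is comfy.model_management.free_memory), and cuda_available stands in for torch.cuda.is_available():

```python
def reset_virtual_memory(free_memory, cuda_available):
    """Call free_memory for the CUDA device only - never for 'cpu'.

    Returns True when the CUDA call succeeded, False otherwise.
    """
    if not cuda_available:
        return False  # skip entirely; no CPU call is ever issued
    try:
        free_memory(0, 'cuda:0')
        return True
    except Exception as e:
        print(f"free_memory failed: {e}")
        return False

calls = []
reset_virtual_memory(lambda size, device: calls.append(device), cuda_available=True)
reset_virtual_memory(lambda size, device: calls.append(device), cuda_available=False)
print(calls)  # ['cuda:0'] - 'cpu' never appears
```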
Technical Improvements
Improved Path Detection for SeedVR2
The SeedVR2 path detection system was redesigned to be user-environment independent:
Key Improvements:
- No hardcoded absolute paths
- Uses relative path detection from current file location
- Leverages Python's module import system
- Parses path structure dynamically
Benefits:
- Works regardless of where ComfyUI is installed
- Compatible with different operating systems
- Handles various directory structures
- More robust and maintainable
Summary
Version 1.3.0 enhances the DisTorchPurgeVRAMV2 node with SeedVR2 support and fixes critical stability issues. The release improves memory management for SeedVR2 workflows and resolves errors that could occur during model cleanup operations.
Key Benefits:
- Clear SeedVR2 cached models through DisTorchPurgeVRAMV2 node
- Eliminated 'NoneType' object is not callable errors
- Fixed CPU device errors in virtual memory reset
- Improved compatibility across different user environments
- More stable and reliable memory management
v1.2.0
Release Notes v1.2.0
Overview
Version 1.2.0 introduces the Model Patch Memory Cleaner node, a dedicated memory management solution for ModelPatchLoader model patches. This release also includes significant enhancements to the DisTorchPurgeVRAMV2 node with more aggressive model unloading capabilities and improved error handling.
New Features
Model Patch Memory Cleaner Node
A new dedicated node for clearing model patches loaded via ModelPatchLoader to prevent OOM (Out of Memory) errors during upscaling operations.
Purpose
The ModelPatchMemoryCleaner node is designed to explicitly clear model patches (such as Z-Image ControlNet, QwenImage BlockWise ControlNet, SigLIP MultiFeat Proj) loaded via ModelPatchLoader from VRAM. This prevents OOM errors during upscaling operations by freeing memory occupied by unused model patches.
Problem Background
Model patches loaded via ModelPatchLoader are managed differently from standard models in ComfyUI's memory system. These patches (stored in ModelPatcher's additional_models or attachments) can remain in VRAM even after use, causing OOM errors during subsequent operations like upscaling. Existing memory cleaning nodes cannot properly detect and clear these model patches, necessitating a dedicated solution.
Implementation Details
File Created/Modified:
ComfyUI/custom_nodes/ComfyUI-DistorchMemoryManager/__init__.py
Complete Code:
class ModelPatchMemoryCleaner:
    """
    Memory cleaner specifically for ModelPatcher loaded model patches.
    Clears model patches loaded via ModelPatchLoader to prevent OOM during upscaling.
    """
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "anything": (any, {}),
                "clear_model_patches": ("BOOLEAN", {"default": True, "tooltip": "Clear model patches loaded via ModelPatchLoader"}),
                "clean_gpu": ("BOOLEAN", {"default": True}),
                "force_gc": ("BOOLEAN", {"default": True}),
            }
        }

    RETURN_TYPES = (any,)
    RETURN_NAMES = ("any",)
    FUNCTION = "clear_model_patches"
    CATEGORY = "Memory"

    def clear_model_patches(self, anything, clear_model_patches, clean_gpu, force_gc):
        try:
            if clear_model_patches:
                import comfy.model_management
                import comfy.model_patcher
                # Get current loaded models
                if hasattr(comfy.model_management, "current_loaded_models"):
                    current_loaded_models = comfy.model_management.current_loaded_models
                    # Find and unload model patches
                    unloaded_count = 0
                    for i in range(len(current_loaded_models) - 1, -1, -1):
                        loaded_model = current_loaded_models[i]
                        if loaded_model is not None and hasattr(loaded_model, "model"):
                            model = loaded_model.model
                            # Check if this is a ModelPatcher with additional_models (model patches stored here)
                            if isinstance(model, comfy.model_patcher.ModelPatcher):
                                # Check for additional_models (model patches stored here)
                                if hasattr(model, "additional_models") and model.additional_models:
                                    # Mark as not currently used
                                    loaded_model.currently_used = False
                                    # Unload the model
                                    if hasattr(loaded_model, "model_unload"):
                                        loaded_model.model_unload()
                                    # Remove from current_loaded_models
                                    current_loaded_models.pop(i)
                                    unloaded_count += 1
                                    print(f"Unloaded model patch: {type(model.model).__name__ if hasattr(model, 'model') else 'ModelPatcher'}")
                                # Also check attachments for model patches
                                elif hasattr(model, "attachments") and model.attachments:
                                    # Mark as not currently used
                                    loaded_model.currently_used = False
                                    # Unload the model
                                    if hasattr(loaded_model, "model_unload"):
                                        loaded_model.model_unload()
                                    # Remove from current_loaded_models
                                    current_loaded_models.pop(i)
                                    unloaded_count += 1
                                    print(f"Unloaded model patch from attachments: {type(model.model).__name__ if hasattr(model, 'model') else 'ModelPatcher'}")
                    if unloaded_count > 0:
                        print(f"Cleared {unloaded_count} model patch(es)")
                # Cleanup models GC
                if hasattr(comfy.model_management, "cleanup_models_gc"):
                    comfy.model_management.cleanup_models_gc()
            if clean_gpu and torch.cuda.is_available():
                torch.cuda.empty_cache()
                torch.cuda.synchronize()
                print("GPU memory cleared")
            if force_gc:
                gc.collect()
                print("Garbage collection completed")
            print("Model patch memory cleanup completed")
        except Exception as e:
            print(f"Model patch memory cleanup error: {e}")
        return (anything,)

Code Explanation
1. Class Definition and Documentation
The ModelPatchMemoryCleaner class is a dedicated memory cleaning node for model patches loaded via ModelPatcher. It was created to prevent OOM errors during upscaling operations.
2. INPUT_TYPES Method (Node Input Definition)
The INPUT_TYPES method defines the node's input parameters in ComfyUI:
- anything: AnyType input that accepts any data type and passes it through to the output. This is a passthrough input for data flow in ComfyUI workflows.
- clear_model_patches: Boolean value (default: True). Controls whether to clear model patches loaded via ModelPatchLoader. When True, detects and unloads model patches.
- clean_gpu: Boolean value (default: True). Controls whether to clear GPU memory. When True, executes torch.cuda.empty_cache() and torch.cuda.synchronize().
- force_gc: Boolean value (default: True). Controls whether to force garbage collection. When True, executes gc.collect().
3. RETURN_TYPES and RETURN_NAMES (Node Output Definition)
- RETURN_TYPES: Defines the node's output type. Returns one any type.
- RETURN_NAMES: Defines the output name. Output is named "any".
- FUNCTION: Specifies the method name to execute. The clear_model_patches method is called.
- CATEGORY: Node category. The node appears in the "Memory" category in ComfyUI's node menu.
4. clear_model_patches Method (Main Processing)
The main processing method that accepts four parameters.
4.1. Model Patch Clearing Process
When clear_model_patches is True, the model patch clearing process is executed:
- Imports comfy.model_management and comfy.model_patcher, which are ComfyUI core modules providing model management and ModelPatcher functionality.
- current_loaded_models is a list of models currently loaded in memory, managed by ComfyUI's model_management module.
4.2. Model Patch Detection and Unloading
The code iterates through the list from back to front to prevent index shifting when removing elements:
- Checks if each loaded_model is not None and has a model attribute.
- Verifies if the model is a ModelPatcher instance. Model patches loaded via ModelPatchLoader are wrapped in ModelPatcher.
- Checks the additional_models attribute. ModelPatcher stores additional models (model patches) in the additional_models dictionary. If this dictionary is not empty, it means model patches are loaded.
- For model patches found:
  - Sets currently_used to False, marking the model as "not in use" in ComfyUI's memory management system.
  - Calls model_unload() to unload the model, moving it from VRAM to CPU memory or disk.
  - Removes the model from the current_loaded_models list using pop(i).
  - Increments unloaded_count to track the number of unloaded model patches.
  - Prints the type name of the unloaded model patch for debugging.
- Also checks the attachments attribute, as ModelPatcher may store model patches in attachments as well.
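The back-to-front traversal used here is a general pattern for pruning a list in place; a minimal sketch:

```python
def drop_matching(items, predicate):
    """Remove every item matching predicate, iterating back to front.

    Walking from the end means pop(i) only shifts indices that were
    already visited, so no element is skipped or double-checked.
    """
    removed = 0
    for i in range(len(items) - 1, -1, -1):
        if predicate(items[i]):
            items.pop(i)
            removed += 1
    return removed

nums = [1, 2, 3, 4, 5, 6]
print(drop_matching(nums, lambda n: n % 2 == 0), nums)  # 3 [1, 3, 5]
```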
4.3. Cleanup Process
- Prints the number of unloaded model patches.
- Calls cleanup_models_gc() to perform garbage collection, cleaning up references to deleted models and preventing memory leaks.
4.4. GPU Memory Clearing
When clean_gpu is True and CUDA is available:
- torch.cuda.empty_cache(): Clears PyTorch's CUDA cache, freeing unused GPU memory.
- torch.cuda.synchronize(): Waits for CUDA operations to complete, ensuring memory clearing is fully completed.
4.5. Garbage Collection
When force_gc is True:
- gc.collect(): Executes Python's garbage collector, reclaiming unused objects including circular references.
4.6. Error Handling
All processing is wrapped in a try-except block to prevent node crashes even if errors occur:
- If an error occurs, an error message is printed.
- Finally, returns the anything input as-is to the output, allowing data to continue flowing through the workflow.
**5. Node Registrat...