Description
Hello,
I have just successfully compiled and installed SageAttention3 using the following project:
SageAttention-for-windows
It works correctly in SeedVR2 workflows without any issues.
However, when it is used with WanVideoSampler, ComfyUI crashes as soon as sampling starts: the whole process exits with code 3 (see the log below).
Is this expected because WanVideoSampler does not yet support SageAttention3, or is it more likely caused by an issue with my local environment?
Thank you very much for your time and for maintaining this project.
OS: Windows (10.0.19045) | GPU: NVIDIA GeForce RTX 5090 (32GB)
Python: 3.12.10 | PyTorch: 2.8.0+cu129 | FlashAttn: v2√ | SageAttn: v3,2√ | Triton: √
CUDA: 12.9 | CuDNN: 91002 | ComfyUI: 0.7.0
I ran a simple comparison test between different attention backends under the following conditions:
Model: Wan2.2-Animate 14B FP8 (1 LoRA)
Hardware: RTX 5090
Settings: block swap 30, 720p, 120 frames, 4 steps
Note: sampler time only
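To give a sense of the workload size, the sequence length that the sampler prints in the log below (111600 for 121 frames at 720x1280) can be reproduced with a few lines of arithmetic. This is only a sketch: the 4x temporal and 8x spatial compression and the 1x2x2 patch size are my assumptions about the usual Wan layout, not values taken from the log, and `wan_sequence_length` is a hypothetical helper name.

```python
def wan_sequence_length(frames, height, width,
                        t_stride=4, s_stride=8, patch=(1, 2, 2)):
    """Estimate the number of transformer tokens for a video.

    The strides and patch size are assumed defaults, not values read
    from WanVideoWrapper itself.
    """
    # The first frame is kept, the rest are compressed t_stride-to-1.
    latent_frames = (frames - 1) // t_stride + 1
    latent_h = height // s_stride
    latent_w = width // s_stride
    pt, ph, pw = patch
    return (latent_frames // pt) * (latent_h // ph) * (latent_w // pw)


# 121 frames at 720x1280, as reported in the log
print(wan_sequence_length(121, 720, 1280))  # 111600
```

Under these assumptions that is 31 latent frames of 45 x 80 patches each, so every attention call in this test works on roughly 111k tokens.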
Results:
SDPA: ~21GB VRAM, 603 s
FlashAttention2: ~28GB VRAM, 317 s
SageAttention2: ~22GB VRAM, 161 s
SageAttention3: ComfyUI crashes immediately when used with WanVideoSampler, so the test could not be completed
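For easier comparison, the sampler times above can be turned into speedups relative to SDPA. The snippet below is just a small sketch over the three numbers I measured; the `results` dictionary and the `speedups` helper are names made up for this illustration, and SageAttention3 is left out because it never finished.

```python
# (approx. VRAM in GB, sampler time in seconds) from the results above
results = {
    "SDPA": (21, 603),
    "FlashAttention2": (28, 317),
    "SageAttention2": (22, 161),
}


def speedups(results, baseline="SDPA"):
    """Return each backend's speedup over the baseline, rounded to 2 places."""
    base_seconds = results[baseline][1]
    return {name: round(base_seconds / seconds, 2)
            for name, (_, seconds) in results.items()}


for name, factor in speedups(results).items():
    print(f"{name}: {factor}x")
```

With these numbers FlashAttention2 comes out at about 1.9x and SageAttention2 at about 3.75x over SDPA.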
#71 [CLIPVisionLoader]: 0.32s - vram 0b
Using tiled image encoding
Requested to load CLIPVisionModelProjection
loaded completely; 28000.99 MB usable, 1208.10 MB loaded, full load: True
Clip embeds shape: torch.Size([2, 257, 1280]), dtype: torch.float32
Combined clip embeds shape: torch.Size([1, 514, 1280])
end_vram - start_vram: 1498845184 - 138158080 = 1360687104
#70 [WanVideoClipVisionEncode]: 0.56s - vram 1360687104b
end_vram - start_vram: 138158080 - 138158080 = 0
#38 [WanVideoVAELoader]: 0.14s - vram 0b
end_vram - start_vram: 138158080 - 138158080 = 0
#324 [easy int]: 0.00s - vram 0b
end_vram - start_vram: 4217011164 - 138158080 = 4078853084
#62 [WanVideoAnimateEmbeds]: 37.39s - vram 4078853084b
end_vram - start_vram: 138158080 - 138158080 = 0
#51 [WanVideoBlockSwap]: 0.00s - vram 0b
end_vram - start_vram: 138158080 - 138158080 = 0
#171 [WanVideoLoraSelectMulti]: 0.00s - vram 0b
CUDA Compute Capability: 12.0
Detected model in_channels: 36
Model cross attention type: i2v, num_heads: 40, num_layers: 40
Model variant detected: 14B
model_type FLOW
end_vram - start_vram: 138159106 - 138158080 = 1026
#22 [WanVideoModelLoader]: 0.93s - vram 1026b
Loading LoRA: Wan\i2v\lightx2v_I2V_14B_480p_cfg_step_distill_rank256_bf16 with strength: 1.0
end_vram - start_vram: 138159106 - 138159106 = 0
#48 [WanVideoSetLoRAs]: 0.05s - vram 0b
end_vram - start_vram: 138159106 - 138159106 = 0
#50 [WanVideoSetBlockSwap]: 0.00s - vram 0b
Loading and assigning model weights to device...
-------------------------
Transformer weights loaded:
Device: cuda:0 | Memory: 1,918,987.68 MB
Device: cpu | Memory: 10,412,686.02 MB
Using 1261 LoRA weight patches for WanVideo model
------- Scheduler info -------
Total timesteps: tensor([999, 937, 833, 624], device='cuda:0')
Using timesteps: tensor([999, 937, 833, 624], device='cuda:0')
Using sigmas: tensor([1.0000, 0.9375, 0.8333, 0.6249, 0.0000])
------------------------------
Rope function: comfy
---------- Sampling start ----------
121 frames at 720x1280 (Input sequence length: 111600) with 4 steps
Generated new RoPE frequencies
[Exited, code 3 (0x00000003)]
------------------------
Fault Traceback:
Fatal Python error: Aborted
Stack (most recent call first):
File "D:\ComfyUI-aki-v2\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\wanvideo\modules\attention.py", line 117 in attention
File "D:\ComfyUI-aki-v2\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\wanvideo\modules\model.py", line 748 in forward
File "D:\ComfyUI-aki-v2\python\Lib\site-packages\torch\nn\modules\module.py", line 1784 in _call_impl
File "D:\ComfyUI-aki-v2\python\Lib\site-packages\torch\nn\modules\module.py", line 1773 in _wrapped_call_impl
File "D:\ComfyUI-aki-v2\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\wanvideo\modules\model.py", line 1313 in forward
File "D:\ComfyUI-aki-v2\python\Lib\site-packages\torch\nn\modules\module.py", line 1784 in _call_impl
File "D:\ComfyUI-aki-v2\python\Lib\site-packages\torch\nn\modules\module.py", line 1773 in _wrapped_call_impl
File "D:\ComfyUI-aki-v2\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\wanvideo\modules\model.py", line 3191 in forward
File "D:\ComfyUI-aki-v2\python\Lib\site-packages\torch\nn\modules\module.py", line 1784 in _call_impl
File "D:\ComfyUI-aki-v2\python\Lib\site-packages\torch\nn\modules\module.py", line 1773 in _wrapped_call_impl
File "D:\ComfyUI-aki-v2\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\nodes_sampler.py", line 1482 in predict_with_cfg
File "D:\ComfyUI-aki-v2\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\nodes_sampler.py", line 2455 in process
File "D:\ComfyUI-aki-v2\ComfyUI\execution.py", line 292 in process_inputs
File "D:\ComfyUI-aki-v2\ComfyUI\execution.py", line 304 in _async_map_node_over_list
File "D:\ComfyUI-aki-v2\ComfyUI\execution.py", line 330 in get_output_data
File "D:\ComfyUI-aki-v2\ComfyUI\execution.py", line 516 in execute
File "D:\ComfyUI-aki-v2\ComfyUI\custom_nodes\ComfyUI-Dev-Utils\nodes\execution_time.py", line 83 in dev_utils_execute
File "D:\ComfyUI-aki-v2\ComfyUI\execution.py", line 719 in execute_async
File "D:\ComfyUI-aki-v2\python\Lib\asyncio\events.py", line 88 in _run
File "D:\ComfyUI-aki-v2\python\Lib\asyncio\base_events.py", line 1999 in _run_once
File "D:\ComfyUI-aki-v2\python\Lib\asyncio\base_events.py", line 645 in run_forever
File "D:\ComfyUI-aki-v2\python\Lib\asyncio\windows_events.py", line 322 in run_forever
File "D:\ComfyUI-aki-v2\python\Lib\asyncio\base_events.py", line 678 in run_until_complete
File "D:\ComfyUI-aki-v2\python\Lib\asyncio\runners.py", line 118 in run
File "D:\ComfyUI-aki-v2\python\Lib\asyncio\runners.py", line 195 in run
File "D:\ComfyUI-aki-v2\ComfyUI\execution.py", line 670 in execute
File "D:\ComfyUI-aki-v2\ComfyUI\main.py", line 230 in prompt_worker
File "D:\ComfyUI-aki-v2\python\Lib\threading.py", line 1012 in run
File "<enhanced_experience vendors.sentry_sdk.integrations.threading>", line 92 in _run_old_run_func
File "<enhanced_experience vendors.sentry_sdk.integrations.threading>", line 99 in run
File "D:\ComfyUI-aki-v2\python\Lib\threading.py", line 1075 in _bootstrap_inner
File "D:\ComfyUI-aki-v2\python\Lib\threading.py", line 1032 in _bootstrap
Extension modules: greenlet._greenlet, multidict._multidict, yarl._quoting_c, propcache._helpers_c, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket.mask, aiohttp._websocket.reader_c, frozenlist._frozenlist, markupsafe._speedups, yaml._yaml, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, PIL._imaging, charset_normalizer.md, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, psutil._psutil_windows, PIL._imagingft, av._core, av.logging, av.bytesource, av.buffer, av.audio.format, av.error, av.dictionary, av.container.pyio, av.utils, av.option, av.descriptor, av.format, av.stream, av.container.streams, av.sidedata.motionvectors, av.sidedata.sidedata, av.opaque, av.packet, av.container.input, av.container.output, av.container.core, av.codec.context, av.video.format, av.video.reformatter, av.plane, av.video.plane, av.video.frame, av.video.stream, av.codec.hwaccel, av.codec.codec, av.frame, av.audio.layout, av.audio.plane, av.audio.frame, av.audio.stream, av.filter.link, av.filter.context, av.filter.graph, av.filter.filter, av.filter.loudnorm, av.audio.resampler, av.audio.codeccontext, av.audio.fifo, av.bitstream, av.video.codeccontext, _cffi_backend, regex._regex, sentencepiece._sentencepiece, cython.cimports.libc.math, scipy._lib._ccallback_c, scipy.special._ufuncs_cxx, scipy.special._ellip_harm_2, scipy.special._special_ufuncs, scipy.special._gufuncs, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.linalg._fblas, scipy.linalg._flapack, _cyutility, scipy._cyutility, 
scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_schur_sqrtm, scipy.linalg._matfuncs_expm, scipy.linalg._linalg_pythran, scipy.linalg.cython_blas, scipy.linalg._decomp_update, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.optimize._group_columns, scipy._lib.messagestream, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._slsqplib, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy._lib._uarray._uarray, scipy.linalg._decomp_interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.spatial._ckdtree, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._hausdorff, scipy.spatial._distance_wrap, scipy.spatial.transform._rotation, scipy.spatial.transform._rigid_transform, scipy.optimize._direct, scipy.interpolate._fitpack, scipy.interpolate._dfitpack, scipy.interpolate._dierckx, scipy.interpolate._ppoly, scipy.interpolate._interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.special.cython_special, scipy.stats._stats, scipy.stats._biasedurn, scipy.stats._stats_pythran, scipy.stats._levy_stable.levyst, scipy.stats._ansari_swilk_statistics, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._rcont.rcont, 
scipy.stats._qmvnt_cy, scipy.ndimage._nd_image, scipy.ndimage._rank_filter_1d, _ni_label, scipy.ndimage._ni_label, av.subtitles.stream, pywt._extensions._dwt, pywt._extensions._cwt, pywt._extensions._pywt, pywt._extensions._swt, kiwisolver._cext, lxml._elementpath, lxml.etree, skimage._shared.geometry, xxhash._xxhash, sklearn.__check_build._check_build, pyarrow.lib, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, numexpr.interpreter, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, sklearn.utils._isfinite, sklearn.utils.sparsefuncs_fast, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.utils._random, sklearn.utils._seq_dataset, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, 
sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.utils._vector_sentinel, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode, sklearn.metrics._pairwise_fast, sklearn.linear_model._cd_fast, _loss, sklearn._loss._loss, sklearn.utils.arrayfuncs, sklearn.svm._liblinear, sklearn.svm._libsvm, sklearn.svm._libsvm_sparse, sklearn.linear_model._sag_fast, sklearn.utils._weight_vector, sklearn.linear_model._sgd_fast, sklearn.decomposition._online_lda_fast, sklearn.decomposition._cdnmf_fast, skimage.measure._ccomp, google._upb._message, pycocotools._mask, cupy_backends.cuda._softlink, cupy_backends.cuda.api._runtime_enum, cupy_backends.cuda.api.runtime, cupy._util, cupy.cuda.device, fastrlock.rlock, cupy.cuda.memory_hook, cupy_backends.cuda.stream, cupy.cuda.graph, cupy.cuda.stream, cupy_backends.cuda.api._driver_enum, cupy_backends.cuda.api.driver, cupy.cuda.memory, cupy._core.internal, cupy._core._carray, cupy.cuda.texture, cupy.cuda.function, cupy_backends.cuda.libs.nvrtc, cupy.cuda.pinned_memory, cupy.cuda.common, cupy.cuda.cub, cupy_backends.cuda.libs.nvtx, cupy.cuda.thrust, cupy._core._dtype, cupy._core._scalar, cupy._core._accelerator, cupy._core._memory_range, cupy._core._fusion_thread_local, cupy._core._kernel, cupy._core._routines_manipulation, cupy._core._routines_binary, cupy._core._optimize_config, cupy._core._cub_reduction, cupy._core._reduction, cupy._core._routines_math, cupy._core._routines_indexing, cupy._core._routines_linalg, cupy._core._routines_logic, cupy._core._routines_sorting, cupy._core._routines_statistics, cupy._core.dlpack, cupy._core.flags, cupy._core.core, cupy._core._fusion_variable, cupy._core._fusion_trace, cupy._core._fusion_kernel, cupy._core.new_fusion, cupy._core.fusion, cupy._core.raw, cupy.fft._cache, 
cupy.random._bit_generator, cupy.lib._polynomial, skimage.measure._moments_cy, skimage.measure._find_contours_cy, skimage.measure._marching_cubes_lewiner_cy, scipy.signal._sigtools, scipy.signal._max_len_seq_inner, scipy.signal._upfirdn_apply, scipy.signal._spline, scipy.signal._sosfilt, scipy.signal._peak_finding_utils, PIL._imagingcms, numba.core.typeconv._typeconv, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.runtime._nrt_python, _win32sysloader, win32api, numba.np.ufunc._internal, numba.experimental.jitclass._box, sqlalchemy.cyextension.collections, sqlalchemy.cyextension.immutabledict, sqlalchemy.cyextension.processors, sqlalchemy.cyextension.resultproxy, sqlalchemy.cyextension.util, PIL._imagingmath (total: 341)
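The traceback ends inside the attention call with "Fatal Python error: Aborted" and exit code 3, which on Windows is the code the C runtime's `abort()` exits with, so this looks like a native abort rather than a Python exception. One way I could narrow this down is to run the attention call on its own in a child interpreter, so that an abort does not take ComfyUI down with it. The following is only a sketch using the standard library: `run_isolated` and `describe` are hypothetical helper names, and the snippet passed in is a placeholder that deliberately calls `os.abort()` instead of a real SageAttention3 call.

```python
import subprocess
import sys


def run_isolated(code, timeout=600.0):
    """Run a Python snippet in a child interpreter with faulthandler on.

    A native abort then only kills the child, and its stderr (including
    any faulthandler dump) is captured for inspection.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def describe(proc):
    """Summarise how the child process ended."""
    if proc.returncode == 0:
        return "ok"
    return f"crashed with exit code {proc.returncode}"


# Placeholder snippet: it only simulates a hard abort. A real check would
# replace it with a small script that calls the attention backend.
crash = run_isolated("import os; os.abort()")
print(describe(crash))
print(crash.stderr)
```

If a standalone call with the same tensor shapes also aborts, the problem is more likely in my SageAttention3 build or environment; if it runs cleanly, the interaction with WanVideoSampler becomes the more likely cause.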