SageAttention3 works in SeedVR2 but crashes ComfyUI in WanVideoSampler #1860

@Dracl

Hello,
I have just successfully compiled and installed SageAttention3 using the following project:
SageAttention-for-windows

It works correctly in SeedVR2 workflows without any issues.
However, when it is used with WanVideoSampler, ComfyUI crashes immediately: the process exits and the workflow cannot run.

Is this behavior expected because WanVideoSampler does not yet support SageAttention3,
or is it more likely caused by an issue with my local environment?
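
One way to narrow this down might be to run a minimal call into the attention backend in a child process, so that a native abort ends only the child and its exit code and fault traceback can be inspected. The following is a minimal sketch using only the Python standard library. `run_isolated` is a hypothetical helper name, and the snippet passed to it is a placeholder for a small call into the backend under test, not actual SageAttention3 code.

```python
import subprocess
import sys


def run_isolated(snippet: str, timeout: float = 600.0) -> int:
    """Run a Python snippet in a child interpreter and return its exit code.

    Hypothetical helper for illustration: a hard abort inside native code
    terminates only the child, so the parent can report what happened.
    """
    # -X faulthandler makes the child dump a Python-level stack on a fatal error.
    result = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        # The fault traceback, if any, is written to stderr; show its tail.
        print(result.stderr[-2000:])
    return result.returncode
```

Calling `run_isolated("import os; os.abort()")` simulates a hard abort and returns a non-zero code. On Windows a C runtime `abort()` typically surfaces as exit code 3, which is the code shown in the `[Exited, code 3 (0x00000003)]` line of the log further down. If a minimal SageAttention3 call with similar tensor shapes also aborts in isolation, the local build would be the more likely suspect. If it succeeds, the integration path would be.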

Thank you very much for your time and for maintaining this project.

OS:Windows (10.0.19045) | GPU:NVIDIA GeForce RTX 5090 (32GB)
Python:3.12.10| PyTorch:2.8.0+cu129 | FlashAttn:v2√ | SageAttn:v3,2√ | Triton:√
CUDA:12.9 | CuDNN:91002 | ComfyUI:0.7.0


I ran a simple comparison test between different attention backends under the following conditions:

- Model: Wan2.2-Animate 14B FP8 (1 LoRA)
- Hardware: RTX 5090
- Settings: Swap 30, 720p, 120 frames, 4 steps
- Note: sampler time only

Results:

- SDPA: ~21GB VRAM, 603 s
- FlashAttention2: ~28GB VRAM, 317 s
- SageAttention2: ~22GB VRAM, 161 s
- SageAttention3: ComfyUI crashes immediately when used with WanVideoSampler, so the test could not be completed
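
For reference, the relative speedups implied by the sampler times above can be worked out with a few lines of Python. This is only arithmetic on the three completed runs; the dictionary name `timings_s` is chosen here for illustration.

```python
# Sampler times in seconds, copied from the results above.
timings_s = {"SDPA": 603, "FlashAttention2": 317, "SageAttention2": 161}

# Speedup of each backend relative to the SDPA baseline.
baseline = timings_s["SDPA"]
speedups = {name: baseline / seconds for name, seconds in timings_s.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
# Prints:
# SDPA: 1.00x
# FlashAttention2: 1.90x
# SageAttention2: 3.75x
```

So in this single test SageAttention2 was roughly 3.75 times faster than SDPA and about twice as fast as FlashAttention2, with VRAM use close to SDPA.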

#71 [CLIPVisionLoader]: 0.32s - vram 0b
Using tiled image encoding
Requested to load CLIPVisionModelProjection
loaded completely; 28000.99 MB usable, 1208.10 MB loaded, full load: True
Clip embeds shape: torch.Size([2, 257, 1280]), dtype: torch.float32
Combined clip embeds shape: torch.Size([1, 514, 1280])
end_vram - start_vram: 1498845184 - 138158080 = 1360687104

#70 [WanVideoClipVisionEncode]: 0.56s - vram 1360687104b
end_vram - start_vram: 138158080 - 138158080 = 0
#38 [WanVideoVAELoader]: 0.14s - vram 0b

end_vram - start_vram: 138158080 - 138158080 = 0
#324 [easy int]: 0.00s - vram 0b

end_vram - start_vram: 4217011164 - 138158080 = 4078853084
#62 [WanVideoAnimateEmbeds]: 37.39s - vram 4078853084b

end_vram - start_vram: 138158080 - 138158080 = 0
#51 [WanVideoBlockSwap]: 0.00s - vram 0b
end_vram - start_vram: 138158080 - 138158080 = 0

#171 [WanVideoLoraSelectMulti]: 0.00s - vram 0b
CUDA Compute Capability: 12.0
Detected model in_channels: 36
Model cross attention type: i2v, num_heads: 40, num_layers: 40
Model variant detected: 14B
model_type FLOW
end_vram - start_vram: 138159106 - 138158080 = 1026

#22 [WanVideoModelLoader]: 0.93s - vram 1026b
Loading LoRA: Wan\i2v\lightx2v_I2V_14B_480p_cfg_step_distill_rank256_bf16 with strength: 1.0
end_vram - start_vram: 138159106 - 138159106 = 0
#48 [WanVideoSetLoRAs]: 0.05s - vram 0b
end_vram - start_vram: 138159106 - 138159106 = 0

#50 [WanVideoSetBlockSwap]: 0.00s - vram 0b
Loading and assigning model weights to device...

-------------------------
Transformer weights loaded:
Device: cuda:0   | Memory: 1,918,987.68 MB
Device: cpu      | Memory: 10,412,686.02 MB
Using 1261 LoRA weight patches for WanVideo model
------- Scheduler info -------
Total timesteps: tensor([999, 937, 833, 624], device='cuda:0')
Using timesteps: tensor([999, 937, 833, 624], device='cuda:0')
Using sigmas: tensor([1.0000, 0.9375, 0.8333, 0.6249, 0.0000])
------------------------------
Rope function: comfy
---------- Sampling start ----------
121 frames at 720x1280 (Input sequence length: 111600) with 4 steps
Generated new RoPE frequencies
[Exited, code 3 (0x00000003)]
------------------------
Fault Traceback: 
Fatal Python error: Aborted

Stack (most recent call first):
  File "D:\ComfyUI-aki-v2\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\wanvideo\modules\attention.py", line 117 in attention
  File "D:\ComfyUI-aki-v2\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\wanvideo\modules\model.py", line 748 in forward
  File "D:\ComfyUI-aki-v2\python\Lib\site-packages\torch\nn\modules\module.py", line 1784 in _call_impl
  File "D:\ComfyUI-aki-v2\python\Lib\site-packages\torch\nn\modules\module.py", line 1773 in _wrapped_call_impl
  File "D:\ComfyUI-aki-v2\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\wanvideo\modules\model.py", line 1313 in forward
  File "D:\ComfyUI-aki-v2\python\Lib\site-packages\torch\nn\modules\module.py", line 1784 in _call_impl
  File "D:\ComfyUI-aki-v2\python\Lib\site-packages\torch\nn\modules\module.py", line 1773 in _wrapped_call_impl
  File "D:\ComfyUI-aki-v2\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\wanvideo\modules\model.py", line 3191 in forward
  File "D:\ComfyUI-aki-v2\python\Lib\site-packages\torch\nn\modules\module.py", line 1784 in _call_impl
  File "D:\ComfyUI-aki-v2\python\Lib\site-packages\torch\nn\modules\module.py", line 1773 in _wrapped_call_impl
  File "D:\ComfyUI-aki-v2\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\nodes_sampler.py", line 1482 in predict_with_cfg
  File "D:\ComfyUI-aki-v2\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\nodes_sampler.py", line 2455 in process
  File "D:\ComfyUI-aki-v2\ComfyUI\execution.py", line 292 in process_inputs
  File "D:\ComfyUI-aki-v2\ComfyUI\execution.py", line 304 in _async_map_node_over_list
  File "D:\ComfyUI-aki-v2\ComfyUI\execution.py", line 330 in get_output_data
  File "D:\ComfyUI-aki-v2\ComfyUI\execution.py", line 516 in execute
  File "D:\ComfyUI-aki-v2\ComfyUI\custom_nodes\ComfyUI-Dev-Utils\nodes\execution_time.py", line 83 in dev_utils_execute
  File "D:\ComfyUI-aki-v2\ComfyUI\execution.py", line 719 in execute_async
  File "D:\ComfyUI-aki-v2\python\Lib\asyncio\events.py", line 88 in _run
  File "D:\ComfyUI-aki-v2\python\Lib\asyncio\base_events.py", line 1999 in _run_once
  File "D:\ComfyUI-aki-v2\python\Lib\asyncio\base_events.py", line 645 in run_forever
  File "D:\ComfyUI-aki-v2\python\Lib\asyncio\windows_events.py", line 322 in run_forever
  File "D:\ComfyUI-aki-v2\python\Lib\asyncio\base_events.py", line 678 in run_until_complete
  File "D:\ComfyUI-aki-v2\python\Lib\asyncio\runners.py", line 118 in run
  File "D:\ComfyUI-aki-v2\python\Lib\asyncio\runners.py", line 195 in run
  File "D:\ComfyUI-aki-v2\ComfyUI\execution.py", line 670 in execute
  File "D:\ComfyUI-aki-v2\ComfyUI\main.py", line 230 in prompt_worker
  File "D:\ComfyUI-aki-v2\python\Lib\threading.py", line 1012 in run
  File "<enhanced_experience vendors.sentry_sdk.integrations.threading>", line 92 in _run_old_run_func
  File "<enhanced_experience vendors.sentry_sdk.integrations.threading>", line 99 in run
  File "D:\ComfyUI-aki-v2\python\Lib\threading.py", line 1075 in _bootstrap_inner
  File "D:\ComfyUI-aki-v2\python\Lib\threading.py", line 1032 in _bootstrap

Extension modules: greenlet._greenlet, multidict._multidict, yarl._quoting_c, propcache._helpers_c, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket.mask, aiohttp._websocket.reader_c, frozenlist._frozenlist, markupsafe._speedups, yaml._yaml, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, PIL._imaging, charset_normalizer.md, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, psutil._psutil_windows, PIL._imagingft, av._core, av.logging, av.bytesource, av.buffer, av.audio.format, av.error, av.dictionary, av.container.pyio, av.utils, av.option, av.descriptor, av.format, av.stream, av.container.streams, av.sidedata.motionvectors, av.sidedata.sidedata, av.opaque, av.packet, av.container.input, av.container.output, av.container.core, av.codec.context, av.video.format, av.video.reformatter, av.plane, av.video.plane, av.video.frame, av.video.stream, av.codec.hwaccel, av.codec.codec, av.frame, av.audio.layout, av.audio.plane, av.audio.frame, av.audio.stream, av.filter.link, av.filter.context, av.filter.graph, av.filter.filter, av.filter.loudnorm, av.audio.resampler, av.audio.codeccontext, av.audio.fifo, av.bitstream, av.video.codeccontext, _cffi_backend, regex._regex, sentencepiece._sentencepiece, cython.cimports.libc.math, scipy._lib._ccallback_c, scipy.special._ufuncs_cxx, scipy.special._ellip_harm_2, scipy.special._special_ufuncs, scipy.special._gufuncs, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.linalg._fblas, scipy.linalg._flapack, _cyutility, scipy._cyutility, 
scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_schur_sqrtm, scipy.linalg._matfuncs_expm, scipy.linalg._linalg_pythran, scipy.linalg.cython_blas, scipy.linalg._decomp_update, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.optimize._group_columns, scipy._lib.messagestream, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._slsqplib, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy._lib._uarray._uarray, scipy.linalg._decomp_interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.spatial._ckdtree, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._hausdorff, scipy.spatial._distance_wrap, scipy.spatial.transform._rotation, scipy.spatial.transform._rigid_transform, scipy.optimize._direct, scipy.interpolate._fitpack, scipy.interpolate._dfitpack, scipy.interpolate._dierckx, scipy.interpolate._ppoly, scipy.interpolate._interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.special.cython_special, scipy.stats._stats, scipy.stats._biasedurn, scipy.stats._stats_pythran, scipy.stats._levy_stable.levyst, scipy.stats._ansari_swilk_statistics, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._rcont.rcont, 
scipy.stats._qmvnt_cy, scipy.ndimage._nd_image, scipy.ndimage._rank_filter_1d, _ni_label, scipy.ndimage._ni_label, av.subtitles.stream, pywt._extensions._dwt, pywt._extensions._cwt, pywt._extensions._pywt, pywt._extensions._swt, kiwisolver._cext, lxml._elementpath, lxml.etree, skimage._shared.geometry, xxhash._xxhash, sklearn.__check_build._check_build, pyarrow.lib, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, numexpr.interpreter, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, sklearn.utils._isfinite, sklearn.utils.sparsefuncs_fast, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.utils._random, sklearn.utils._seq_dataset, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, 
sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.utils._vector_sentinel, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode, sklearn.metrics._pairwise_fast, sklearn.linear_model._cd_fast, _loss, sklearn._loss._loss, sklearn.utils.arrayfuncs, sklearn.svm._liblinear, sklearn.svm._libsvm, sklearn.svm._libsvm_sparse, sklearn.linear_model._sag_fast, sklearn.utils._weight_vector, sklearn.linear_model._sgd_fast, sklearn.decomposition._online_lda_fast, sklearn.decomposition._cdnmf_fast, skimage.measure._ccomp, google._upb._message, pycocotools._mask, cupy_backends.cuda._softlink, cupy_backends.cuda.api._runtime_enum, cupy_backends.cuda.api.runtime, cupy._util, cupy.cuda.device, fastrlock.rlock, cupy.cuda.memory_hook, cupy_backends.cuda.stream, cupy.cuda.graph, cupy.cuda.stream, cupy_backends.cuda.api._driver_enum, cupy_backends.cuda.api.driver, cupy.cuda.memory, cupy._core.internal, cupy._core._carray, cupy.cuda.texture, cupy.cuda.function, cupy_backends.cuda.libs.nvrtc, cupy.cuda.pinned_memory, cupy.cuda.common, cupy.cuda.cub, cupy_backends.cuda.libs.nvtx, cupy.cuda.thrust, cupy._core._dtype, cupy._core._scalar, cupy._core._accelerator, cupy._core._memory_range, cupy._core._fusion_thread_local, cupy._core._kernel, cupy._core._routines_manipulation, cupy._core._routines_binary, cupy._core._optimize_config, cupy._core._cub_reduction, cupy._core._reduction, cupy._core._routines_math, cupy._core._routines_indexing, cupy._core._routines_linalg, cupy._core._routines_logic, cupy._core._routines_sorting, cupy._core._routines_statistics, cupy._core.dlpack, cupy._core.flags, cupy._core.core, cupy._core._fusion_variable, cupy._core._fusion_trace, cupy._core._fusion_kernel, cupy._core.new_fusion, cupy._core.fusion, cupy._core.raw, cupy.fft._cache, 
cupy.random._bit_generator, cupy.lib._polynomial, skimage.measure._moments_cy, skimage.measure._find_contours_cy, skimage.measure._marching_cubes_lewiner_cy, scipy.signal._sigtools, scipy.signal._max_len_seq_inner, scipy.signal._upfirdn_apply, scipy.signal._spline, scipy.signal._sosfilt, scipy.signal._peak_finding_utils, PIL._imagingcms, numba.core.typeconv._typeconv, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.runtime._nrt_python, _win32sysloader, win32api, numba.np.ufunc._internal, numba.experimental.jitclass._box, sqlalchemy.cyextension.collections, sqlalchemy.cyextension.immutabledict, sqlalchemy.cyextension.processors, sqlalchemy.cyextension.resultproxy, sqlalchemy.cyextension.util, PIL._imagingmath (total: 341)```
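
The `Input sequence length: 111600` value in the log can be reproduced from the frame count and resolution, which may help when sizing a standalone attention test with the same shapes. The sketch below assumes 4x temporal and 8x spatial compression in the VAE and a 2x2 spatial patch size. These factors are not stated in the log and are assumptions that happen to reproduce the logged number. `wan_sequence_length` is a hypothetical helper name.

```python
def wan_sequence_length(
    frames: int,
    height: int,
    width: int,
    temporal_stride: int = 4,  # assumed VAE temporal compression
    spatial_stride: int = 8,   # assumed VAE spatial compression
    patch: int = 2,            # assumed spatial patch size of the transformer
) -> int:
    """Estimate the number of attention tokens for a video of the given size."""
    # The first frame is kept; the remaining frames are compressed in time.
    latent_frames = (frames - 1) // temporal_stride + 1
    tokens_h = height // spatial_stride // patch
    tokens_w = width // spatial_stride // patch
    return latent_frames * tokens_h * tokens_w


# 121 frames at 720x1280, as reported at "Sampling start" in the log:
# 31 latent frames * 45 * 80 tokens per frame.
print(wan_sequence_length(121, 720, 1280))  # 111600
```

The crash happens on the first attention call at this sequence length, right after `Generated new RoPE frequencies`, so a reproduction attempt would need query, key and value tensors of about this many tokens across the 40 heads reported by the model loader.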
