Added nvfp4 official support. #438

Open

BuffMcBigHuge wants to merge 6 commits into main from marco/feat/nvfp4
Conversation

@BuffMcBigHuge BuffMcBigHuge commented Feb 11, 2026

Summary

Adds official support for NVFP4 quantization, enabling ~4x weight memory reduction on Blackwell GPUs (SM >= 10.0). NVFP4 uses NVIDIA's E2M1 format and hardware-accelerated Tensor Core kernels via comfy-kitchen.

What's new

  • NVFP4 quantization option – Users with Blackwell GPUs (RTX 50xx, B100, etc.) can select nvfp4 (Blackwell) in the quantization dropdown when their hardware supports it.
  • Shared quantization pipeline – FP8 and NVFP4 logic is centralized in quantization_utils.py, replacing duplicated FP8 logic across pipelines.
  • Hardware detection – The server exposes supports_nvfp4 in the hardware info API based on CUDA device capability (SM >= 10.0). The UI only shows the NVFP4 option when supported.
  • VACE compatibility – VACE components now support both FP8 and NVFP4 quantization when enabled.
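The capability gate described above can be sketched as a pure-Python check (hypothetical helper names; the actual server would read the CUDA device capability, e.g. via torch.cuda.get_device_capability()):

```python
def supports_nvfp4(capability: tuple[int, int]) -> bool:
    """NVFP4 needs Blackwell-class Tensor Cores (SM >= 10.0)."""
    return capability >= (10, 0)


def supports_fp8_e4m3fn(capability: tuple[int, int]) -> bool:
    """FP8 via torchao needs Ada or newer (SM >= 8.9)."""
    return capability >= (8, 9)


# An Ada GPU (SM 8.9) qualifies for FP8 but not NVFP4;
# a Blackwell GPU (SM >= 10.0) qualifies for both.
```

Tuple comparison makes the SM threshold check read the same way NVIDIA writes compute capabilities (major, minor).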

Technical details

  • NVFP4 uses comfy-kitchen's QuantizedTensor with TensorCoreNVFP4Layout for hardware-accelerated matmul on Blackwell.
  • FP8 (fp8_e4m3fn) continues to use torchao for Ada+ GPUs (SM >= 8.9), with ~2x weight memory reduction.
  • Dependencies: Adds comfy-kitchen[cublas]>=0.1.0 (Linux/Windows) and torchaudio==2.9.1 for future audio support.
  • Fallback: If a user selects NVFP4 and later switches to a non-Blackwell GPU (e.g. from persisted state), the UI resets to fp8_e4m3fn.
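The fallback rule above is a small pure function. A sketch with assumed names (the real logic lives in the TypeScript frontend against persisted state):

```python
def resolve_quantization(persisted: str, supports_nvfp4: bool) -> str:
    """Reset a persisted NVFP4 selection when the current GPU can't run it."""
    if persisted == "nvfp4" and not supports_nvfp4:
        return "fp8_e4m3fn"
    return persisted
```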

Pipelines updated

All pipelines that support quantization now use the shared apply_quantization():

  • krea_realtime_video
  • longlive
  • memflow
  • reward_forcing
  • streamdiffusionv2
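The log output later in this thread ("Skipped 600 LoRA adapter layers", "Quantizing 301 Linear layers to NVFP4") suggests a selection step inside the shared helper. A hypothetical sketch of that step (the real code would walk model.named_modules(); the dict here is a stand-in):

```python
def select_layers_to_quantize(layers: dict[str, str]) -> tuple[list[str], list[str]]:
    """Split module names into (to_quantize, skipped).

    `layers` maps a module name to its type name. LoRA adapter
    layers are left in high precision; remaining Linear layers
    are candidates for NVFP4/FP8 quantization.
    """
    to_quantize, skipped = [], []
    for name, kind in layers.items():
        if "lora" in name:  # leave LoRA adapters unquantized
            skipped.append(name)
        elif kind == "Linear":
            to_quantize.append(name)
    return to_quantize, skipped


layers = {
    "blocks.0.attn.q": "Linear",
    "blocks.0.attn.q.lora_A": "Linear",
    "blocks.0.norm": "LayerNorm",
}
quant, skipped = select_layers_to_quantize(layers)
```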

Files changed

| Area | Changes |
| --- | --- |
| Backend | New quantization_utils.py, enum update, 6 pipelines refactored, VACE mixin, hardware info API |
| Frontend | TypeScript types, supportsNvfp4 wiring, quantization dropdown with conditional NVFP4 option, persisted state reset |
| Deps | comfy-kitchen[cublas], torchaudio in pyproject.toml |

Signed-off-by: BuffMcBigHuge <marco@bymar.co>
@BuffMcBigHuge BuffMcBigHuge marked this pull request as ready for review February 12, 2026 02:19
yondonfu commented Feb 13, 2026

I'll look into this more later, but on the first run with this branch with default LongLive settings:

```
2026-02-13 16:32:10,997 - scope.server.pipeline_manager - INFO - Initial load params: {'height': 320, 'width': 576, 'quantization': 'nvfp4', 'vace_enabled': False}
2026-02-13 16:32:10,998 - scope.server.pipeline_manager - INFO - VACE disabled by load_params, skipping VACE configuration
2026-02-13 16:32:11,804 - scope.core.pipelines.wan2_1.vace.mixin - INFO - _init_vace: No vace_path provided, VACE disabled
Loaded diffusion LoRA in 2.525s
2026-02-13 16:32:14,329 - scope.core.pipelines.wan2_1.lora.mixin - INFO - _init_loras: Found 0 LoRA configs to load
2026-02-13 16:32:15,419 - scope.core.pipelines.quantization_utils - INFO - Skipped 600 LoRA adapter layers
2026-02-13 16:32:15,419 - scope.core.pipelines.quantization_utils - INFO - Quantizing 301 Linear layers to NVFP4
2026-02-13 16:32:15,555 - scope.core.pipelines.quantization_utils - INFO - GPU memory before NVFP4 quantization: 3.30 GB
2026-02-13 16:32:15,562 - scope.server.pipeline_manager - ERROR - Failed to load pipeline longlive: CUDA kernel launch failed: CUDA driver version is insufficient for CUDA runtime version. If this error persists, consider removing the models directory 'C:\Users\yondo\.daydream-scope\models' and re-downloading models.
2026-02-13 16:32:15,568 - scope.server.pipeline_manager - ERROR - Failed to load pipeline: longlive
2026-02-13 16:32:15,568 - scope.server.pipeline_manager - ERROR - Some pipelines failed to load
```

Would be helpful to note the CUDA driver version required.

EDIT: I updated to the latest driver version on my PC (see details below) and it runs now. Will share test results separately.

```
NVIDIA-SMI 591.74                 Driver Version: 591.74         CUDA Version: 13.1
```
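A pre-flight check could turn that opaque kernel-launch failure into an actionable message. A hypothetical sketch (the actual minimum driver version would need to be confirmed against comfy-kitchen's CUDA build, so it is taken as a parameter rather than hard-coded):

```python
def check_driver(current: tuple[int, int], minimum: tuple[int, int]) -> None:
    """Fail fast with a clear message instead of
    'CUDA driver version is insufficient for CUDA runtime version'."""
    if current < minimum:
        raise RuntimeError(
            f"NVFP4 kernels require NVIDIA driver >= {minimum[0]}.{minimum[1]}, "
            f"found {current[0]}.{current[1]}; please update your driver."
        )
```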

```python
from .enums import Quantization as Quantization  # noqa: PLC0414
from .enums import VaeType as VaeType  # noqa: PLC0414

# Re-export quantization utilities
```
why re-export?
