Run Qwen3.5-122B and Nemotron-3-Super-120B on a single NVIDIA GB10 using NVFP4 weights and a 3.5-bit TurboQuant KV cache. Built on vLLM v0.16 with custom Triton kernels for SM121.
triton moe quantization blackwell kv-cache vllm local-llm llm-inference qwen3 nvfp4 sm121 turboquant nvidia-gb10
Updated Apr 2, 2026 · Python