The fastest SSIM for PyTorch. Pre-built wheels, zero compilation required.
```shell
pip install fussim
```

Requirements: Python 3.10+, PyTorch 2.6+, NVIDIA GPU (Turing or newer)
| | fussim | pytorch-msssim | fused-ssim |
|---|---|---|---|
| pip install | Yes | Yes | Needs compiler |
| CUDA kernels | Yes | No | Yes |
| Native FP16 | Yes | No | No |
| Speed (vs msssim) | 6.4x | 1x | ~5x |
Used in 3D Gaussian Splatting training. Based on Taming3DGS.
```python
import torch
from fussim import fused_ssim

# Images must be on CUDA, shape (B, C, H, W), range [0, 1]
img1 = torch.rand(1, 3, 256, 256, device="cuda", requires_grad=True)
img2 = torch.rand(1, 3, 256, 256, device="cuda")

# Compute SSIM (returns scalar mean)
ssim_value = fused_ssim(img1, img2)

# Use as loss (only img1 receives gradients)
loss = 1.0 - ssim_value
loss.backward()
```

FP16 / Mixed Precision:

```python
with torch.autocast(device_type="cuda"):
    ssim_value = fused_ssim(img1, img2)  # Native FP16 CUDA kernel
```

Drop-in replacement for pytorch-msssim:
```python
# Before
from pytorch_msssim import ssim, SSIM

# After (no other code changes needed)
from fussim import ssim, SSIM
```

```shell
pip install fussim
```

This installs a single wheel containing all CUDA variants (~10 MB). At runtime, fussim automatically detects your PyTorch's CUDA version and loads the correct extension.
| Platform | Python | PyTorch | CUDA (auto-detected) |
|---|---|---|---|
| Linux | 3.10-3.13 | 2.6 - 2.10 | 11.8, 12.4, 12.6, 12.8, 13.0 |
| Windows | 3.10-3.13 | 2.6 - 2.10 | 11.8, 12.4, 12.6, 12.8, 13.0 |
PyTorch 2.5 or older? The fat wheel requires PyTorch 2.6+. For older versions, use version-specific wheels or build from source.
No manual version selection needed. Just install and use.
Fat wheel compatibility matrix
The fat wheel contains extensions built with these PyTorch versions:
| CUDA | Built with PyTorch | Compatible with |
|---|---|---|
| 11.8 | 2.7.1 | 2.7+ |
| 12.4 | 2.6.0 | 2.6+ |
| 12.6 | 2.8.0 (Win) / 2.10.0 (Linux) | 2.8+ (or 2.6+ via cu124 fallback) |
| 12.8 | 2.8.0 (Win) / 2.10.0 (Linux) | 2.8+ |
| 13.0 | 2.10.0 (Linux only) | 2.10+ |
PyTorch maintains forward ABI compatibility, so extensions built with older versions work with newer PyTorch.
CUDA version matching: fussim uses the highest compatible variant for your runtime CUDA version:
- Exact match (e.g., CUDA 12.8 → `cu128`): used directly.
- Minor-version forward compatibility (e.g., CUDA 12.9 → `cu128`): CUDA binaries are forward-compatible within the same major version. If your exact version isn't in the list, fussim picks the nearest lower variant automatically.
- Cross-major version (e.g., CUDA 14.0 with no `cu14x` variant): not supported. CUDA does not guarantee ABI compatibility across major versions. You'll need to build from source or wait for a new release.
This means conda-forge users with intermediate CUDA versions (e.g., 12.5, 12.7, 12.9) are fully supported out of the box.
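The selection rule above (exact match first, otherwise the nearest lower variant within the same CUDA major version) can be sketched in plain Python. `select_variant` and the variant list here are illustrative only, not fussim's actual internals:

```python
def select_variant(runtime_cuda, available):
    """Pick the best pre-built variant for a runtime CUDA version.

    Illustrative sketch of the documented rule: exact match, else the
    nearest lower variant within the same major version, else None.
    Assumes single-digit minor versions in tags like "cu128".
    """
    major, minor = (int(x) for x in runtime_cuda.split(".")[:2])
    best = None
    for tag in available:  # "cu128" -> CUDA major 12, minor 8
        v_major, v_minor = int(tag[2:-1]), int(tag[-1])
        if v_major != major or (v_major, v_minor) > (major, minor):
            continue  # cross-major or newer than runtime: skip
        if best is None or (v_major, v_minor) > best[0]:
            best = ((v_major, v_minor), tag)
    return best[1] if best else None

variants = ["cu118", "cu124", "cu126", "cu128", "cu130"]
print(select_variant("12.9", variants))  # cu128 (minor forward compat)
print(select_variant("12.4", variants))  # cu124 (exact match)
print(select_variant("14.0", variants))  # None (cross-major: unsupported)
```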
For exact PyTorch ABI matching or smaller downloads (~2MB each), you can install wheels built for specific PyTorch versions.
Important: You must specify the exact variant. Pip cannot auto-select the PyTorch/CUDA combination.
Step 1: Find your PyTorch and CUDA versions:
```python
import torch
print(f"PyTorch: {torch.__version__}, CUDA: {torch.version.cuda}")
# Example output: PyTorch: 2.10.0, CUDA: 12.8
```

Step 2: Install the matching wheel:

```shell
# Format: fussim==VERSION+ptXXcuYYY
# pt210 = PyTorch 2.10, cu128 = CUDA 12.8
pip install "fussim==0.3.15+pt210cu128" --extra-index-url https://opsiclear.github.io/fussim/whl/
```

Available combinations
| PyTorch | Version Tag | CUDA 11.8 | CUDA 12.1 | CUDA 12.4 | CUDA 12.6 | CUDA 12.8 | CUDA 13.0 |
|---|---|---|---|---|---|---|---|
| 2.5.1 | pt25 | cu118 | cu121 | cu124 | - | - | - |
| 2.6.0 | pt26 | cu118 | - | cu124 | cu126 | - | - |
| 2.7.1 | pt27 | cu118 | - | - | cu126 | cu128 | - |
| 2.8.0 | pt28 | - | - | - | cu126 | cu128 | - |
| 2.9.1 | pt29 | - | - | - | cu126* | cu128* | cu130* |
| 2.10.0 | pt210 | - | - | - | cu126 | cu128 | cu130 |

*Linux only. Windows has a known PyTorch bug.
Examples:

```shell
pip install "fussim==0.3.15+pt27cu118" --extra-index-url https://opsiclear.github.io/fussim/whl/
pip install "fussim==0.3.15+pt210cu128" --extra-index-url https://opsiclear.github.io/fussim/whl/
pip install "fussim==0.3.15+pt210cu130" --extra-index-url https://opsiclear.github.io/fussim/whl/
```

Interactive Configurator - generates the exact command for your setup.
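For scripting installs, the `+ptXXcuYYY` local version tag can be decomposed mechanically. This `parse_tag` helper is a hypothetical convenience written against the documented tag format (single-digit minor versions), not part of fussim:

```python
import re

def parse_tag(requirement):
    """Split a pinned fussim requirement into (pytorch, cuda) versions.

    Hypothetical helper: "pt210cu128" means PyTorch 2.10 and CUDA 12.8,
    assuming single-digit minor versions as in the table above.
    """
    m = re.search(r"\+pt(\d)(\d+)cu(\d+)(\d)$", requirement)
    if m is None:
        raise ValueError(f"no +ptXXcuYYY tag in {requirement!r}")
    pytorch = f"{m.group(1)}.{m.group(2)}"
    cuda = f"{m.group(3)}.{m.group(4)}"
    return pytorch, cuda

print(parse_tag("fussim==0.3.15+pt210cu128"))  # ('2.10', '12.8')
print(parse_tag("fussim==0.3.15+pt27cu118"))   # ('2.7', '11.8')
```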
Requires CUDA Toolkit and C++ compiler
```shell
git clone https://github.com/OpsiClear/fussim.git && cd fussim
pip install torch==... --index-url https://download.pytorch.org/whl/<cuda>
pip install --no-build-isolation .

# For specific GPU architecture:
TORCH_CUDA_ARCH_LIST="8.9" pip install --no-build-isolation .  # RTX 4090

# Build directly from PyPI source instead of a local checkout:
pip install torch==... --index-url https://download.pytorch.org/whl/<cuda>
pip install --no-build-isolation --no-binary fussim fussim
```

pip build isolation is intentionally rejected for source builds. PyTorch CUDA extensions must compile against the target environment's Torch, and pip's temporary build env can silently pull in a different Torch/CUDA build. If you really did prepare a matching isolated build env, set `FUSSIM_ALLOW_BUILD_ISOLATION=1` to override the safeguard.
| Architecture | GPUs | Compute Capability |
|---|---|---|
| Turing | RTX 20xx, GTX 16xx | 7.5 |
| Ampere | RTX 30xx, A100 | 8.0, 8.6 |
| Ada Lovelace | RTX 40xx | 8.9 |
| Hopper | H100, H200 | 9.0 |
| Blackwell | RTX 50xx, B100/B200 | 10.0, 12.0 |
`fused_ssim(img1, img2, padding="same", train=True, window_size=11) -> Tensor`

| Parameter | Type | Default | Description |
|---|---|---|---|
| img1 | Tensor | - | First image (B, C, H, W). Receives gradients. |
| img2 | Tensor | - | Second image (B, C, H, W) |
| padding | str | "same" | "same" (output = input size) or "valid" (cropped) |
| train | bool | True | Enable gradient computation |
| window_size | int | 11 | Gaussian window: 7, 9, or 11 |

Returns: Scalar mean SSIM value (range: -1 to 1, typically 0 to 1).

Note: Only `img1` receives gradients. For training, pass your prediction as `img1`:

```python
loss = 1 - fused_ssim(prediction, target)  # Correct
```
`ssim(X, Y, data_range=255, size_average=True, win_size=11, K=(0.01, 0.03), nonnegative_ssim=False) -> Tensor`

| Parameter | Type | Default | Description |
|---|---|---|---|
| X, Y | Tensor | - | Images (B, C, H, W). Gradients computed for X. |
| data_range | float | 255 | Value range (255 for uint8, 1.0 for normalized) |
| size_average | bool | True | Return scalar mean or per-batch (B,) values |
| win_size | int | 11 | Gaussian window: 7, 9, or 11 |
| K | tuple | (0.01, 0.03) | SSIM constants (K1, K2) |
| nonnegative_ssim | bool | False | Clamp negative values to 0 |
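The `K` constants feed the standard SSIM stabilizers, C1 = (K1·L)² and C2 = (K2·L)² with L = `data_range`. A minimal, windowless sketch of the formula (fussim's kernel applies it per Gaussian window, not globally like this) shows why identical images score exactly 1:

```python
def global_ssim(x, y, data_range=255.0, K=(0.01, 0.03)):
    """SSIM over two flat images as a single window.

    Illustration only: fussim computes this per Gaussian window and
    averages; here the whole image is treated as one window.
    """
    c1 = (K[0] * data_range) ** 2
    c2 = (K[1] * data_range) ** 2
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / n
    var_y = sum((v - mu_y) ** 2 for v in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )

img = [0.0, 64.0, 128.0, 255.0]
print(global_ssim(img, img))  # 1.0: numerator equals denominator
print(global_ssim(img, [v * 0.5 for v in img]))  # < 1.0 for a dimmed copy
```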
```python
from fussim import SSIM

module = SSIM(data_range=1.0)
ssim_val = module(pred, target)
loss = 1 - ssim_val
loss.backward()
```

```python
from fussim import get_build_info, check_compatibility

# Check installation details
info = get_build_info()
print(info)  # {'version': '0.3.15', 'runtime_torch_version': '2.10.0', ...}

# Verify compatibility
compatible, issues = check_compatibility()
```

RTX 4090, batch 5x5x1080x1920, 100 iterations:
| Implementation | Forward | Backward | Total | Speedup |
|---|---|---|---|---|
| pytorch-msssim | 28.7 ms | 28.9 ms | 57.5 ms | 1.0x |
| fussim | 4.38 ms | 4.66 ms | 9.04 ms | 6.4x |
Memory: Fused kernels avoid intermediate allocations, reducing VRAM usage compared to unfused implementations.
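The speedup column follows directly from the totals in the table above:

```python
# Totals from the benchmark table (forward + backward, in ms)
pytorch_msssim_total = 57.5
fussim_total = 9.04

print(round(pytorch_msssim_total / fussim_total, 1))  # 6.4
```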
ImportError: No compatible fussim CUDA extension found
This usually means your PyTorch version is too old for the fat wheel.
Check your versions:
```python
import torch
print(f"PyTorch: {torch.__version__}, CUDA: {torch.version.cuda}")
```

Solutions:
| PyTorch Version | Solution |
|---|---|
| 2.6 - 2.10 | Should work. Run pip install --upgrade fussim |
| 2.5 or older | Use version-specific wheel or upgrade PyTorch |
DLL load failed / undefined symbol
This is a PyTorch ABI mismatch. The extension was built with a different PyTorch version.
Fix: Install a version-specific wheel that matches your exact PyTorch version:
```shell
# Check your version first
python -c "import torch; print(f'PyTorch: {torch.__version__}, CUDA: {torch.version.cuda}')"

# Install matching wheel (example for PyTorch 2.7.1 + CUDA 11.8)
pip install "fussim==0.3.15+pt27cu118" --extra-index-url https://opsiclear.github.io/fussim/whl/
```

CUDA extension not loading
Check your installation:

```shell
python -c "import fussim; print(fussim.get_build_info())"
```

Or use the compatibility check:

```python
from fussim import check_compatibility

compatible, issues = check_compatibility()
print(f"Compatible: {compatible}")
print(f"Issues: {issues}")
```

Non-standard CUDA version (conda-forge, custom builds)
If your CUDA version doesn't exactly match a pre-built variant (e.g., CUDA 12.5, 12.7, 12.9 from conda-forge), fussim automatically picks the nearest compatible lower variant within the same major version:
- CUDA 12.9 → uses `cu128`
- CUDA 12.5 → uses `cu124`
- CUDA 13.1 → uses `cu130`
If no compatible variant exists for your CUDA major version, build from source:
```shell
pip install fussim --no-binary fussim
```

Wrong CUDA version detected
The fat wheel auto-detects from `torch.version.cuda`. If a fallback warning appears:

```python
import torch
print(torch.version.cuda)  # Check PyTorch's CUDA version
```

Install a version-specific wheel for exact matching.
Windows build errors with PyTorch 2.9.x
PyTorch 2.9.x has a Windows compilation bug that prevents building extensions from source.
Note: Pre-built wheels work fine on Windows with PyTorch 2.9.x. This only affects building from source.
| Constraint | Reason |
|---|---|
| PyTorch 2.6+ (fat wheel) | ABI compatibility; use version-specific wheels for 2.5 |
| NVIDIA GPU required | No CPU fallback |
| `window_size`: 7, 9, or 11 only | CUDA kernel templates |
| `win_sigma`: 1.5 (fixed) | Hardcoded in optimized kernel |
| Custom `win` not supported | Uses built-in Gaussian |
| No MS-SSIM | Single-scale SSIM only |
- optimized-fused-ssim by Janusch Patas (Taming3DGS)
- fused-ssim by Rahul Goel
```bibtex
@software{optimized-fused-ssim,
  author = {Janusch Patas},
  title  = {Optimized Fused-SSIM},
  year   = {2025},
  url    = {https://github.com/MrNeRF/optimized-fused-ssim},
}
```

MIT License - see LICENSE for details.