
fussim


The fastest SSIM for PyTorch. Pre-built wheels, zero compilation required.

pip install fussim

Requirements: Python 3.10+, PyTorch 2.6+, NVIDIA GPU (Turing or newer)

|                   | fussim | pytorch-msssim | fused-ssim     |
|-------------------|--------|----------------|----------------|
| pip install       | Yes    | Yes            | Needs compiler |
| CUDA kernels      | Yes    | No             | Yes            |
| Native FP16       | Yes    | No             | No             |
| Speed (vs msssim) | 6.4x   | 1x             | ~5x            |

Used in 3D Gaussian Splatting training. Based on Taming3DGS.


Quick Start

import torch
from fussim import fused_ssim

# Images must be on CUDA, shape (B, C, H, W), range [0, 1]
img1 = torch.rand(1, 3, 256, 256, device="cuda", requires_grad=True)
img2 = torch.rand(1, 3, 256, 256, device="cuda")

# Compute SSIM (returns scalar mean)
ssim_value = fused_ssim(img1, img2)

# Use as loss (only img1 receives gradients)
loss = 1.0 - ssim_value
loss.backward()

FP16 / Mixed Precision:

with torch.autocast(device_type="cuda"):
    ssim_value = fused_ssim(img1, img2)  # Native FP16 CUDA kernel

Drop-in replacement for pytorch-msssim:

# Before
from pytorch_msssim import ssim, SSIM
# After (no other code changes needed)
from fussim import ssim, SSIM

Installation

Recommended: Fat Wheel (auto-detection)

pip install fussim

This installs a single wheel containing all CUDA variants (~10MB). At runtime, fussim automatically detects your PyTorch's CUDA version and loads the correct extension.

| Platform | Python    | PyTorch    | CUDA (auto-detected)         |
|----------|-----------|------------|------------------------------|
| Linux    | 3.10-3.13 | 2.6 - 2.10 | 11.8, 12.4, 12.6, 12.8, 13.0 |
| Windows  | 3.10-3.13 | 2.6 - 2.10 | 11.8, 12.4, 12.6, 12.8, 13.0 |

PyTorch 2.5 or older? The fat wheel requires PyTorch 2.6+. For older versions, use version-specific wheels or build from source.

No manual version selection needed. Just install and use.

Fat wheel compatibility matrix

The fat wheel contains extensions built with these PyTorch versions:

| CUDA | Built with PyTorch           | Compatible with                   |
|------|------------------------------|-----------------------------------|
| 11.8 | 2.7.1                        | 2.7+                              |
| 12.4 | 2.6.0                        | 2.6+                              |
| 12.6 | 2.8.0 (Win) / 2.10.0 (Linux) | 2.8+ (or 2.6+ via cu124 fallback) |
| 12.8 | 2.8.0 (Win) / 2.10.0 (Linux) | 2.8+                              |
| 13.0 | 2.10.0 (Linux only)          | 2.10+                             |

PyTorch maintains forward ABI compatibility, so extensions built with older versions work with newer PyTorch.

CUDA version matching: fussim uses the highest compatible variant for your runtime CUDA version:

  • Exact match (e.g., CUDA 12.8 → cu128): used directly.
  • Minor version forward compat (e.g., CUDA 12.9 → cu128): CUDA binaries are forward-compatible within the same major version. If your exact version isn't in the list, fussim picks the nearest lower variant automatically.
  • Cross-major version (e.g., CUDA 14.0 with no cu14x variant): not supported. CUDA does not guarantee ABI compatibility across major versions. You'll need to build from source or wait for a new release.

This means conda-forge users with intermediate CUDA versions (e.g., 12.5, 12.7, 12.9) are fully supported out of the box.


Alternative: Version-Specific Wheels

For exact PyTorch ABI matching or smaller downloads (~2MB each), you can install wheels built for specific PyTorch versions.

Important: You must specify the exact variant. Pip cannot auto-select the PyTorch/CUDA combination.

Step 1: Find your PyTorch and CUDA versions:

import torch
print(f"PyTorch: {torch.__version__}, CUDA: {torch.version.cuda}")
# Example output: PyTorch: 2.10.0, CUDA: 12.8

Step 2: Install the matching wheel:

# Format: fussim==VERSION+ptXXcuYYY
# pt210 = PyTorch 2.10, cu128 = CUDA 12.8

pip install "fussim==0.3.15+pt210cu128" --extra-index-url https://opsiclear.github.io/fussim/whl/

Available combinations

| PyTorch | Version Tag | CUDA 11.8 | CUDA 12.1 | CUDA 12.4 | CUDA 12.6 | CUDA 12.8 | CUDA 13.0 |
|---------|-------------|-----------|-----------|-----------|-----------|-----------|-----------|
| 2.5.1   | pt25        | cu118     | cu121     | cu124     | -         | -         | -         |
| 2.6.0   | pt26        | cu118     | -         | cu124     | cu126     | -         | -         |
| 2.7.1   | pt27        | cu118     | -         | -         | cu126     | cu128     | -         |
| 2.8.0   | pt28        | -         | -         | -         | cu126     | cu128     | -         |
| 2.9.1   | pt29        | -         | -         | -         | cu126*    | cu128*    | cu130*    |
| 2.10.0  | pt210       | -         | -         | -         | cu126     | cu128     | cu130     |

*Linux only: these wheels could not be built for Windows due to a known PyTorch 2.9.x compilation bug (see Troubleshooting).

Examples:

pip install "fussim==0.3.15+pt27cu118" --extra-index-url https://opsiclear.github.io/fussim/whl/
pip install "fussim==0.3.15+pt210cu128" --extra-index-url https://opsiclear.github.io/fussim/whl/
pip install "fussim==0.3.15+pt210cu130" --extra-index-url https://opsiclear.github.io/fussim/whl/
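As a sanity check, the `+ptXXcuYYY` local-version tag can be derived mechanically from your versions. This helper is hypothetical, written only to illustrate the tag format documented above; fussim does not ship it:

```python
def wheel_spec(fussim_version: str, torch_version: str, cuda_version: str) -> str:
    """Build the pip requirement string for a version-specific wheel.

    pt tag: PyTorch major+minor digits (2.10 -> pt210);
    cu tag: CUDA version with the dot removed (12.8 -> cu128).
    """
    t_major, t_minor = torch_version.split(".")[:2]
    cu = cuda_version.replace(".", "")
    return f"fussim=={fussim_version}+pt{t_major}{t_minor}cu{cu}"

print(wheel_spec("0.3.15", "2.10.0", "12.8"))
# fussim==0.3.15+pt210cu128
```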

Interactive Configurator - generates the exact command for your setup.


Build from Source

Requires CUDA Toolkit and C++ compiler
git clone https://github.com/OpsiClear/fussim.git && cd fussim
pip install torch==... --index-url https://download.pytorch.org/whl/<cuda>
pip install --no-build-isolation .

# For specific GPU architecture:
TORCH_CUDA_ARCH_LIST="8.9" pip install --no-build-isolation .  # RTX 4090

# Build directly from PyPI source instead of a local checkout:
pip install torch==... --index-url https://download.pytorch.org/whl/<cuda>
pip install --no-build-isolation --no-binary fussim fussim

pip's build isolation is intentionally rejected for source builds. PyTorch CUDA extensions must compile against the target environment's Torch, and pip's temporary build environment can silently pull in a different Torch/CUDA build. If you really have prepared a matching isolated build environment, set FUSSIM_ALLOW_BUILD_ISOLATION=1 to override the safeguard.


GPU Support

| Architecture | GPUs                | Compute Capability |
|--------------|---------------------|--------------------|
| Turing       | RTX 20xx, GTX 16xx  | 7.5                |
| Ampere       | RTX 30xx, A100      | 8.0, 8.6           |
| Ada Lovelace | RTX 40xx            | 8.9                |
| Hopper       | H100, H200          | 9.0                |
| Blackwell    | RTX 50xx, B100/B200 | 10.0, 12.0         |

API Reference

fused_ssim

fused_ssim(img1, img2, padding="same", train=True, window_size=11) -> Tensor

| Parameter   | Type   | Default | Description                                       |
|-------------|--------|---------|---------------------------------------------------|
| img1        | Tensor | -       | First image (B, C, H, W). Receives gradients.     |
| img2        | Tensor | -       | Second image (B, C, H, W)                         |
| padding     | str    | "same"  | "same" (output = input size) or "valid" (cropped) |
| train       | bool   | True    | Enable gradient computation                       |
| window_size | int    | 11      | Gaussian window size: 7, 9, or 11                 |

Returns: Scalar mean SSIM value (range: -1 to 1, typically 0 to 1).
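For reasoning about "valid" padding, one would expect the SSIM map that gets averaged to shrink like a valid-mode convolution (no padding, stride 1). The shrink rule below is an assumption for illustration, not taken from fussim's internals:

```python
def valid_map_size(h: int, w: int, window_size: int = 11) -> tuple[int, int]:
    """Spatial size of the per-pixel SSIM map under 'valid' padding,
    assuming the usual valid-convolution rule: each dim loses
    window_size - 1 pixels."""
    return h - window_size + 1, w - window_size + 1

print(valid_map_size(256, 256))       # (246, 246)
print(valid_map_size(1080, 1920, 7))  # (1074, 1914)
```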

Note: Only img1 receives gradients. For training, pass your prediction as img1:

loss = 1 - fused_ssim(prediction, target)  # Correct

ssim (pytorch-msssim compatible)

ssim(X, Y, data_range=255, size_average=True, win_size=11, K=(0.01, 0.03), nonnegative_ssim=False) -> Tensor

| Parameter        | Type   | Default      | Description                                     |
|------------------|--------|--------------|-------------------------------------------------|
| X, Y             | Tensor | -            | Images (B, C, H, W). Gradients computed for X.  |
| data_range       | float  | 255          | Value range (255 for uint8, 1.0 for normalized) |
| size_average     | bool   | True         | Return scalar mean or per-batch (B,) values     |
| win_size         | int    | 11           | Gaussian window size: 7, 9, or 11               |
| K                | tuple  | (0.01, 0.03) | SSIM constants (K1, K2)                         |
| nonnegative_ssim | bool   | False        | Clamp negative values to 0                      |
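For reference, `data_range` and `K` only enter the standard SSIM formula through the two stabilizing constants C1 = (K1·L)² and C2 = (K2·L)², with L = `data_range`. A quick sketch of that relationship:

```python
def ssim_constants(data_range: float = 255,
                   K: tuple[float, float] = (0.01, 0.03)) -> tuple[float, float]:
    """Stabilizing constants from the standard SSIM definition:
    C1 = (K1 * L)**2, C2 = (K2 * L)**2 with L = data_range."""
    k1, k2 = K
    return (k1 * data_range) ** 2, (k2 * data_range) ** 2

print(ssim_constants(255))  # ~ (6.5025, 58.5225)
print(ssim_constants(1.0))  # ~ (1e-4, 9e-4)
```

This is why `data_range` must match your input scale: passing normalized [0, 1] images with the default `data_range=255` makes C1 and C2 several orders of magnitude too large and inflates the reported SSIM.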

SSIM Module

from fussim import SSIM

module = SSIM(data_range=1.0)
ssim_val = module(pred, target)
loss = 1 - ssim_val
loss.backward()

Utility Functions

from fussim import get_build_info, check_compatibility

# Check installation details
info = get_build_info()
print(info)  # {'version': '0.3.15', 'runtime_torch_version': '2.10.0', ...}

# Verify compatibility
compatible, issues = check_compatibility()

Performance

RTX 4090, input tensors of shape (5, 5, 1080, 1920), 100 iterations:

| Implementation | Forward | Backward | Total   | Speedup |
|----------------|---------|----------|---------|---------|
| pytorch-msssim | 28.7 ms | 28.9 ms  | 57.5 ms | 1.0x    |
| fussim         | 4.38 ms | 4.66 ms  | 9.04 ms | 6.4x    |

Memory: Fused kernels avoid intermediate allocations, reducing VRAM usage compared to unfused implementations.


Troubleshooting

ImportError: No compatible fussim CUDA extension found

This usually means your PyTorch version is too old for the fat wheel.

Check your versions:

import torch
print(f"PyTorch: {torch.__version__}, CUDA: {torch.version.cuda}")

Solutions:

| PyTorch Version | Solution                                        |
|-----------------|-------------------------------------------------|
| 2.6 - 2.10      | Should work. Run pip install --upgrade fussim   |
| 2.5 or older    | Use a version-specific wheel or upgrade PyTorch |
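If you're scripting this check, a small helper (hypothetical, not part of fussim) can decide fat-wheel eligibility from `torch.__version__`, which may carry a local suffix such as `+cu128`:

```python
def fat_wheel_supported(torch_version: str) -> bool:
    """True if the installed PyTorch is new enough for the fat wheel
    (2.6+, per the table above). Handles suffixes like '2.10.0+cu128'."""
    base = torch_version.split("+")[0]
    major, minor = (int(p) for p in base.split(".")[:2])
    return (major, minor) >= (2, 6)

print(fat_wheel_supported("2.10.0+cu128"))  # True
print(fat_wheel_supported("2.5.1"))         # False
```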

DLL load failed / undefined symbol

This is a PyTorch ABI mismatch. The extension was built with a different PyTorch version.

Fix: Install a version-specific wheel that matches your exact PyTorch version:

# Check your version first
python -c "import torch; print(f'PyTorch: {torch.__version__}, CUDA: {torch.version.cuda}')"

# Install matching wheel (example for PyTorch 2.7.1 + CUDA 11.8)
pip install "fussim==0.3.15+pt27cu118" --extra-index-url https://opsiclear.github.io/fussim/whl/

CUDA extension not loading

Check your installation:

python -c "import fussim; print(fussim.get_build_info())"

Or use the compatibility check:

from fussim import check_compatibility
compatible, issues = check_compatibility()
print(f"Compatible: {compatible}")
print(f"Issues: {issues}")

Non-standard CUDA version (conda-forge, custom builds)

If your CUDA version doesn't exactly match a pre-built variant (e.g., CUDA 12.5, 12.7, 12.9 from conda-forge), fussim automatically picks the nearest compatible lower variant within the same major version:

  • CUDA 12.9 → uses cu128
  • CUDA 12.5 → uses cu124
  • CUDA 13.1 → uses cu130

If no compatible variant exists for your CUDA major version, build from source:

pip install fussim --no-binary fussim

Wrong CUDA version detected

The fat wheel auto-detects from torch.version.cuda. If a fallback warning appears:

import torch
print(torch.version.cuda)  # Check PyTorch's CUDA version

Install a version-specific wheel for exact matching.

Windows build errors with PyTorch 2.9.x

PyTorch 2.9.x has a Windows compilation bug that prevents building extensions from source.

Note: Pre-built wheels work fine on Windows with PyTorch 2.9.x. This only affects building from source.


Limitations

| Constraint                    | Reason                                                 |
|-------------------------------|--------------------------------------------------------|
| PyTorch 2.6+ (fat wheel)      | ABI compatibility; use version-specific wheels for 2.5 |
| NVIDIA GPU required           | No CPU fallback                                        |
| window_size: 7, 9, or 11 only | CUDA kernel templates                                  |
| win_sigma: 1.5 (fixed)        | Hardcoded in the optimized kernel                      |
| Custom win not supported      | Uses the built-in Gaussian                             |
| No MS-SSIM                    | Single-scale SSIM only                                 |

Attribution

Citation

@software{optimized-fused-ssim,
    author = {Janusch Patas},
    title = {Optimized Fused-SSIM},
    year = {2025},
    url = {https://github.com/MrNeRF/optimized-fused-ssim},
}

License

MIT License - see LICENSE for details.
