The fastest SSIM for PyTorch. Pre-built wheels, zero compilation required.
```shell
pip install fussim
```

Requirements: Python 3.10+, PyTorch 2.6+, NVIDIA GPU (Turing or newer)
| | fussim | pytorch-msssim | fused-ssim |
|---|---|---|---|
| pip install | Yes | Yes | Needs compiler |
| CUDA kernels | Yes | No | Yes |
| Native FP16 | Yes | No | No |
| Speed (vs msssim) | 6.4x | 1x | ~5x |
Used in 3D Gaussian Splatting training. Based on Taming3DGS.
```python
import torch
from fussim import fused_ssim

# Images must be on CUDA, shape (B, C, H, W), range [0, 1]
img1 = torch.rand(1, 3, 256, 256, device="cuda", requires_grad=True)
img2 = torch.rand(1, 3, 256, 256, device="cuda")

# Compute SSIM (returns scalar mean)
ssim_value = fused_ssim(img1, img2)

# Use as loss (only img1 receives gradients)
loss = 1.0 - ssim_value
loss.backward()
```

FP16 / Mixed Precision:

```python
with torch.autocast(device_type="cuda"):
    ssim_value = fused_ssim(img1, img2)  # Native FP16 CUDA kernel
```

Drop-in replacement for pytorch-msssim:
```python
# Before
from pytorch_msssim import ssim, SSIM

# After (no other code changes needed)
from fussim import ssim, SSIM
```

```shell
pip install fussim
```

This installs a single wheel containing all CUDA variants (~10 MB). At runtime, fussim automatically detects your PyTorch's CUDA version and loads the correct extension.
| Platform | Python | PyTorch | CUDA (auto-detected) |
|---|---|---|---|
| Linux | 3.10-3.13 | 2.6 - 2.10 | 11.8, 12.4, 12.6, 12.8, 13.0 |
| Windows | 3.10-3.13 | 2.6 - 2.10 | 11.8, 12.4, 12.6, 12.8, 13.0 |
PyTorch 2.5 or older? The fat wheel requires PyTorch 2.6+. For older versions, use version-specific wheels or build from source.
No manual version selection needed. Just install and use.
Fat wheel compatibility matrix
The fat wheel contains extensions built with these PyTorch versions:
| CUDA | Built with PyTorch | Compatible with |
|---|---|---|
| 11.8 | 2.7.1 | 2.7+ |
| 12.4 | 2.6.0 | 2.6+ |
| 12.6 | 2.8.0 (Win) / 2.10.0 (Linux) | 2.8+ (or 2.6+ via cu124 fallback) |
| 12.8 | 2.8.0 (Win) / 2.10.0 (Linux) | 2.8+ |
| 13.0 | 2.10.0 (Linux only) | 2.10+ |
PyTorch maintains forward ABI compatibility, so extensions built with older versions work with newer PyTorch.
CUDA version matching: fussim uses the highest compatible variant for your runtime CUDA version:
- Exact match (e.g., CUDA 12.8 → `cu128`): used directly.
- Minor-version forward compatibility (e.g., CUDA 12.9 → `cu128`): CUDA binaries are forward-compatible within the same major version. If your exact version isn't in the list, fussim picks the nearest lower variant automatically.
- Cross-major version (e.g., CUDA 14.0 with no `cu14x` variant): not supported. CUDA does not guarantee ABI compatibility across major versions. You'll need to build from source or wait for a new release.
This means conda-forge users with intermediate CUDA versions (e.g., 12.5, 12.7, 12.9) are fully supported out of the box.
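The selection rule above (exact match first, otherwise the nearest lower variant within the same CUDA major version) can be sketched in plain Python. `select_variant` and the variant list here are illustrative only, not fussim's actual internals:

```python
def select_variant(runtime_cuda, available):
    """Pick the best pre-built variant for a runtime CUDA version.

    Illustrative sketch of the documented rule: exact match, else the
    nearest lower variant within the same major version, else None.
    Assumes single-digit minor versions in tags like "cu128".
    """
    major, minor = (int(x) for x in runtime_cuda.split(".")[:2])
    best = None
    for tag in available:  # "cu128" -> CUDA major 12, minor 8
        v_major, v_minor = int(tag[2:-1]), int(tag[-1])
        if v_major != major or (v_major, v_minor) > (major, minor):
            continue  # cross-major or newer than runtime: skip
        if best is None or (v_major, v_minor) > best[0]:
            best = ((v_major, v_minor), tag)
    return best[1] if best else None

variants = ["cu118", "cu124", "cu126", "cu128", "cu130"]
print(select_variant("12.9", variants))  # cu128 (minor forward compat)
print(select_variant("12.4", variants))  # cu124 (exact match)
print(select_variant("14.0", variants))  # None (cross-major: unsupported)
```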
For exact PyTorch ABI matching or smaller downloads (~2MB each), you can install wheels built for specific PyTorch versions.
Important: You must specify the exact variant. Pip cannot auto-select the PyTorch/CUDA combination.
Step 1: Find your PyTorch and CUDA versions:
```python
import torch
print(f"PyTorch: {torch.__version__}, CUDA: {torch.version.cuda}")
# Example output: PyTorch: 2.10.0, CUDA: 12.8
```

Step 2: Install the matching wheel:

```shell
# Format: fussim==VERSION+ptXXcuYYY
# pt210 = PyTorch 2.10, cu128 = CUDA 12.8
pip install "fussim==0.3.15+pt210cu128" --extra-index-url https://opsiclear.github.io/fussim/whl/
```

Available combinations
| PyTorch | Version Tag | CUDA 11.8 | CUDA 12.1 | CUDA 12.4 | CUDA 12.6 | CUDA 12.8 | CUDA 13.0 |
|---|---|---|---|---|---|---|---|
| 2.5.1 | pt25 | cu118 | cu121 | cu124 | - | - | - |
| 2.6.0 | pt26 | cu118 | - | cu124 | cu126 | - | - |
| 2.7.1 | pt27 | cu118 | - | - | cu126 | cu128 | - |
| 2.8.0 | pt28 | - | - | - | cu126 | cu128 | - |
| 2.9.1 | pt29 | - | - | - | cu126* | cu128* | cu130* |
| 2.10.0 | pt210 | - | - | - | cu126 | cu128 | cu130 |

*Linux only. Windows has a known PyTorch bug.
Examples:

```shell
pip install "fussim==0.3.15+pt27cu118" --extra-index-url https://opsiclear.github.io/fussim/whl/
pip install "fussim==0.3.15+pt210cu128" --extra-index-url https://opsiclear.github.io/fussim/whl/
pip install "fussim==0.3.15+pt210cu130" --extra-index-url https://opsiclear.github.io/fussim/whl/
```

Interactive Configurator - generates the exact command for your setup.
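For scripting installs, the `+ptXXcuYYY` local version tag can be decomposed mechanically. This `parse_tag` helper is a hypothetical convenience written against the documented tag format (single-digit minor versions), not part of fussim:

```python
import re

def parse_tag(requirement):
    """Split a pinned fussim requirement into (pytorch, cuda) versions.

    Hypothetical helper: "pt210cu128" means PyTorch 2.10 and CUDA 12.8,
    assuming single-digit minor versions as in the table above.
    """
    m = re.search(r"\+pt(\d)(\d+)cu(\d+)(\d)$", requirement)
    if m is None:
        raise ValueError(f"no +ptXXcuYYY tag in {requirement!r}")
    pytorch = f"{m.group(1)}.{m.group(2)}"
    cuda = f"{m.group(3)}.{m.group(4)}"
    return pytorch, cuda

print(parse_tag("fussim==0.3.15+pt210cu128"))  # ('2.10', '12.8')
print(parse_tag("fussim==0.3.15+pt27cu118"))   # ('2.7', '11.8')
```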
Requires CUDA Toolkit and C++ compiler
```shell
git clone https://github.com/OpsiClear/fussim.git && cd fussim
pip install torch==... --index-url https://download.pytorch.org/whl/<cuda>
pip install --no-build-isolation .

# For specific GPU architecture:
TORCH_CUDA_ARCH_LIST="8.9" pip install --no-build-isolation .  # RTX 4090

# Build directly from PyPI source instead of a local checkout:
pip install torch==... --index-url https://download.pytorch.org/whl/<cuda>
pip install --no-build-isolation --no-binary fussim fussim
```

pip build isolation is intentionally rejected for source builds. PyTorch CUDA extensions must compile against the target environment's Torch, and pip's temporary build env can silently pull in a different Torch/CUDA build. If you really did prepare a matching isolated build env, set `FUSSIM_ALLOW_BUILD_ISOLATION=1` to override the safeguard.
| Architecture | GPUs | Compute Capability |
|---|---|---|
| Turing | RTX 20xx, GTX 16xx | 7.5 |
| Ampere | RTX 30xx, A100 | 8.0, 8.6 |
| Ada Lovelace | RTX 40xx | 8.9 |
| Hopper | H100, H200 | 9.0 |
| Blackwell | RTX 50xx, B100/B200 | 10.0, 12.0 |
`fused_ssim(img1, img2, padding="same", train=True, window_size=11) -> Tensor`

| Parameter | Type | Default | Description |
|---|---|---|---|
| img1 | Tensor | - | First image (B, C, H, W). Receives gradients. |
| img2 | Tensor | - | Second image (B, C, H, W) |
| padding | str | "same" | "same" (output = input size) or "valid" (cropped) |
| train | bool | True | Enable gradient computation |
| window_size | int | 11 | Gaussian window: 7, 9, or 11 |

Returns: Scalar mean SSIM value (range: -1 to 1, typically 0 to 1).

Note: Only `img1` receives gradients. For training, pass your prediction as `img1`:

```python
loss = 1 - fused_ssim(prediction, target)  # Correct
```
`ssim(X, Y, data_range=255, size_average=True, win_size=11, K=(0.01, 0.03), nonnegative_ssim=False) -> Tensor`

| Parameter | Type | Default | Description |
|---|---|---|---|
| X, Y | Tensor | - | Images (B, C, H, W). Gradients computed for X. |
| data_range | float | 255 | Value range (255 for uint8, 1.0 for normalized) |
| size_average | bool | True | Return scalar mean or per-batch (B,) values |
| win_size | int | 11 | Gaussian window: 7, 9, or 11 |
| K | tuple | (0.01, 0.03) | SSIM constants (K1, K2) |
| nonnegative_ssim | bool | False | Clamp negative values to 0 |
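The `K` constants feed the standard SSIM stabilizers, C1 = (K1·L)² and C2 = (K2·L)² with L = `data_range`. A minimal, windowless sketch of the formula (fussim's kernel applies it per Gaussian window, not globally like this) shows why identical images score exactly 1:

```python
def global_ssim(x, y, data_range=255.0, K=(0.01, 0.03)):
    """SSIM over two flat images as a single window.

    Illustration only: fussim computes this per Gaussian window and
    averages; here the whole image is treated as one window.
    """
    c1 = (K[0] * data_range) ** 2
    c2 = (K[1] * data_range) ** 2
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / n
    var_y = sum((v - mu_y) ** 2 for v in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )

img = [0.0, 64.0, 128.0, 255.0]
print(global_ssim(img, img))  # 1.0: numerator equals denominator
print(global_ssim(img, [v * 0.5 for v in img]))  # < 1.0 for a dimmed copy
```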
```python
from fussim import SSIM

module = SSIM(data_range=1.0)
ssim_val = module(pred, target)
loss = 1 - ssim_val
loss.backward()
```

```python
from fussim import get_build_info, check_compatibility

# Check installation details
info = get_build_info()
print(info)  # {'version': '0.3.15', 'runtime_torch_version': '2.10.0', ...}

# Verify compatibility
compatible, issues = check_compatibility()
```

RTX 4090, batch 5x5x1080x1920, 100 iterations:
| Implementation | Forward | Backward | Total | Speedup |
|---|---|---|---|---|
| pytorch-msssim | 28.7 ms | 28.9 ms | 57.5 ms | 1.0x |
| fussim | 4.38 ms | 4.66 ms | 9.04 ms | 6.4x |
Memory: Fused kernels avoid intermediate allocations, reducing VRAM usage compared to unfused implementations.
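The speedup column follows directly from the totals in the table above:

```python
# Totals from the benchmark table (forward + backward, in ms)
pytorch_msssim_total = 57.5
fussim_total = 9.04

print(round(pytorch_msssim_total / fussim_total, 1))  # 6.4
```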
ImportError: No compatible fussim CUDA extension found
This usually means your PyTorch version is too old for the fat wheel.
Check your versions:
```python
import torch
print(f"PyTorch: {torch.__version__}, CUDA: {torch.version.cuda}")
```

Solutions:
| PyTorch Version | Solution |
|---|---|
| 2.6 - 2.10 | Should work. Run pip install --upgrade fussim |
| 2.5 or older | Use version-specific wheel or upgrade PyTorch |
DLL load failed / undefined symbol
This is a PyTorch ABI mismatch. The extension was built with a different PyTorch version.
Fix: Install a version-specific wheel that matches your exact PyTorch version:
```shell
# Check your version first
python -c "import torch; print(f'PyTorch: {torch.__version__}, CUDA: {torch.version.cuda}')"

# Install matching wheel (example for PyTorch 2.7.1 + CUDA 11.8)
pip install "fussim==0.3.15+pt27cu118" --extra-index-url https://opsiclear.github.io/fussim/whl/
```

CUDA extension not loading
Check your installation:

```shell
python -c "import fussim; print(fussim.get_build_info())"
```

Or use the compatibility check:

```python
from fussim import check_compatibility

compatible, issues = check_compatibility()
print(f"Compatible: {compatible}")
print(f"Issues: {issues}")
```

Non-standard CUDA version (conda-forge, custom builds)
If your CUDA version doesn't exactly match a pre-built variant (e.g., CUDA 12.5, 12.7, 12.9 from conda-forge), fussim automatically picks the nearest compatible lower variant within the same major version:
- CUDA 12.9 → uses `cu128`
- CUDA 12.5 → uses `cu124`
- CUDA 13.1 → uses `cu130`
If no compatible variant exists for your CUDA major version, build from source:
```shell
pip install fussim --no-binary fussim
```

Wrong CUDA version detected
The fat wheel auto-detects from `torch.version.cuda`. If a fallback warning appears:

```python
import torch
print(torch.version.cuda)  # Check PyTorch's CUDA version
```

Install a version-specific wheel for exact matching.
Windows build errors with PyTorch 2.9.x
PyTorch 2.9.x has a Windows compilation bug that prevents building extensions from source.
Note: Pre-built wheels work fine on Windows with PyTorch 2.9.x. This only affects building from source.
| Constraint | Reason |
|---|---|
| PyTorch 2.6+ (fat wheel) | ABI compatibility; use version-specific wheels for 2.5 |
| NVIDIA GPU required | No CPU fallback |
| `window_size`: 7, 9, or 11 only | CUDA kernel templates |
| `win_sigma`: 1.5 (fixed) | Hardcoded in optimized kernel |
| Custom `win` not supported | Uses built-in Gaussian |
| No MS-SSIM | Single-scale SSIM only |
- optimized-fused-ssim by Janusch Patas (Taming3DGS)
- fused-ssim by Rahul Goel
```bibtex
@software{optimized-fused-ssim,
  author = {Janusch Patas},
  title  = {Optimized Fused-SSIM},
  year   = {2025},
  url    = {https://github.com/MrNeRF/optimized-fused-ssim},
}
```

MIT License - see LICENSE for details.