# Stream-DiffVSR

Diffusion-based video super-resolution that runs on your Mac.
Stream-DiffVSR upscales low-resolution video frames using a diffusion model optimized for Apple Silicon. It runs on MPS (Metal Performance Shaders) with attention slicing and VAE chunking to fit within unified memory. Feed it 360p frames, get back 720p or higher. No NVIDIA GPU required.
[Why This Exists](#why-this-exists) | [Before vs After](#before-vs-after) | [Install](#install) | [Quick Start](#quick-start) | [How It Works](#how-it-works) | [Features](#features) | [Performance](#performance) | [Contributing](#contributing) | [License](#license)
## Why This Exists

The original Stream-DiffVSR paper introduced a strong approach to video super-resolution using auto-regressive diffusion. But the code only ran on NVIDIA GPUs with CUDA. Every hardcoded device reference, every xFormers dependency, every CUDA-specific memory call made it impossible to run on a Mac.
This fork strips all of that out. It replaces CUDA-only code with device-agnostic PyTorch, adds MPS support through attention slicing and VAE slicing, and cuts 50+ NVIDIA-specific dependencies. The result: a video super-resolution pipeline that runs natively on any Apple Silicon Mac.
## Before vs After

| Input (Low Resolution) | Output (Super-Resolved) |
|---|---|
| 360p / 540p source frame | 720p / 1080p upscaled frame |
| Blurry, compression artifacts visible | Sharp edges, recovered detail |
| Blocky textures in motion | Temporally coherent across frames |
The model upscales video frames by 4x while maintaining temporal consistency between frames. Unlike single-image upscalers, Stream-DiffVSR uses information from previous frames to produce stable, flicker-free results.
## Install

```bash
git clone https://github.com/199-biotechnologies/stream-diffvsr.git
cd stream-diffvsr
python3 -m venv venv
source venv/bin/activate
pip install -r requirements-mac.txt
```

Requirements: macOS 12.3+, Python 3.9+, Apple Silicon (M1/M2/M3/M4), 16GB+ RAM recommended.
Alternatively, with conda:

```bash
git clone https://github.com/199-biotechnologies/stream-diffvsr.git
cd stream-diffvsr
conda env create -f requirements.yml
conda activate stream-diffvsr
```

## Quick Start

```bash
python inference.py \
    --model_id 'Jamichsu/Stream-DiffVSR' \
    --in_path './input/' \
    --out_path './output/' \
    --num_inference_steps 4
```

The script auto-detects MPS on Mac, CUDA on NVIDIA, or falls back to CPU. Pretrained weights download automatically from Hugging Face.
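Roughly, that device selection looks like the sketch below (illustrative only; `pick_device` is a hypothetical helper name, not necessarily how inference.py is written):

```python
import torch

def pick_device(requested=None):
    """Resolve the compute device: honor an explicit --device flag, else auto-detect."""
    if requested:                            # e.g. "mps", "cuda", "cpu"
        return torch.device(requested)
    if torch.backends.mps.is_available():   # Apple Silicon via Metal
        return torch.device("mps")
    if torch.cuda.is_available():           # NVIDIA GPU
        return torch.device("cuda")
    return torch.device("cpu")              # last-resort fallback
```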
To force a specific device:

```bash
python inference.py --device mps --in_path ./input/ --out_path ./output/
python inference.py --device cpu --in_path ./input/ --out_path ./output/
python inference.py --device cuda --in_path ./input/ --out_path ./output/
```

Organize frames as sequential PNGs inside subdirectories:
```
input/
  video1/
    frame_0001.png
    frame_0002.png
    ...
  video2/
    frame_0001.png
    ...
```
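If you are scripting around this layout, walking it is straightforward; a minimal sketch (the `list_frame_sequences` helper is hypothetical, not part of the repo):

```python
from pathlib import Path

def list_frame_sequences(in_path):
    """Yield (clip name, sorted PNG frame paths) for each video subdirectory."""
    for clip_dir in sorted(Path(in_path).iterdir()):
        if clip_dir.is_dir():
            frames = sorted(clip_dir.glob("*.png"))  # frame_0001.png, frame_0002.png, ...
            if frames:
                yield clip_dir.name, frames
```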
## How It Works

Stream-DiffVSR operates as a causal diffusion model: it only looks at past frames, never future ones. That makes it suitable for streaming and real-time scenarios where you cannot buffer ahead.
The pipeline has three core components:
- 4-step distilled denoiser -- A diffusion model distilled down to 4 denoising steps (from the typical 20-50), cutting inference time dramatically.
- Auto-regressive Temporal Guidance (ARTG) -- Injects motion-aligned cues from previous frames during latent denoising. This is what keeps the output temporally stable.
- Temporal Processor Module (TPM) -- A lightweight decoder add-on that enhances fine detail while preserving coherence across frames.
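Put together, per-frame processing follows the shape of this sketch (all module names are hypothetical stand-ins, stubbed here only to make the causal control flow concrete):

```python
import torch

# Hypothetical stubs standing in for the real modules; shapes are arbitrary.
encode_lr  = lambda frame: torch.randn(1, 4, 64, 64)   # VAE-encode the LR frame to a latent
denoiser   = lambda z, step, guide: z                  # 4-step distilled denoiser (stub)
artg       = lambda prev, cur: cur if prev is None else 0.5 * (prev + cur)  # ARTG (stub)
tpm_decode = lambda z: z                               # TPM-enhanced decoder (stub)

frames = [torch.zeros(3, 360, 640) for _ in range(3)]  # dummy 360p input frames

prev_latent = None
for lr_frame in frames:                        # causal: past frames only, never future
    latent = encode_lr(lr_frame)
    for step in range(4):                      # the 4 distilled denoising steps
        guidance = artg(prev_latent, latent)   # motion-aligned cues from the previous frame
        latent = denoiser(latent, step, guidance)
    sr_frame = tpm_decode(latent)              # enhanced detail, coherent across frames
    prev_latent = latent                       # carry state forward to the next frame
```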
On Mac, the pipeline uses attention slicing (reduces memory ~40%) and VAE slicing (processes the decoder in chunks) to fit within unified memory constraints.
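In Hugging Face Diffusers, both optimizations are one-line calls. A minimal sketch, assuming the published weights load through the generic `DiffusionPipeline` interface and that the pipeline class exposes the standard slicing methods (the repo's actual loading code may differ):

```python
import torch
from diffusers import DiffusionPipeline

# Assumption: "Jamichsu/Stream-DiffVSR" loads via the generic pipeline loader.
pipe = DiffusionPipeline.from_pretrained(
    "Jamichsu/Stream-DiffVSR", torch_dtype=torch.float16
)
pipe = pipe.to("mps")

pipe.enable_attention_slicing()  # compute attention in slices rather than all at once
pipe.enable_vae_slicing()        # decode the VAE output in chunks to cap peak memory
```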
## Features

| Feature | Detail |
|---|---|
| Device auto-detection | MPS on Mac, CUDA on NVIDIA, CPU fallback |
| Attention slicing | 40% memory reduction on unified memory |
| VAE slicing | Processes decoder in chunks for large frames |
| 4-step inference | Distilled from 50 steps with minimal quality loss |
| Temporal coherence | No flicker between consecutive frames |
| Causal architecture | No future-frame dependency, works for streaming |
| TensorRT support | Optional acceleration on NVIDIA GPUs |
| Auto model download | Weights fetched from Hugging Face on first run |
## Performance

| Device | Chip | RAM | 720p Frame Time | Notes |
|---|---|---|---|---|
| Mac | M1 | 8GB | ~4-6s | May need lower resolution |
| Mac | M1 Pro/Max | 16-32GB | ~2-4s | Good for 720p |
| Mac | M2/M3 Pro/Max | 32-64GB | ~1-3s | Comfortable |
| Mac | M3 Ultra | 64-192GB | ~0.8-1.5s | Best Mac performance |
| NVIDIA | RTX 4090 | 24GB | 0.328s | Original benchmark |
| NVIDIA | RTX 3080 | 10GB | ~0.5-0.8s | With xFormers |
## Troubleshooting

Out of memory: set `PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0` before running inference, close other apps, or process at 540p instead of 720p.

Unsupported operations: set `PYTORCH_ENABLE_MPS_FALLBACK=1` to let PyTorch fall back to CPU for any MPS-unsupported ops.
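Both variables can also be set from Python, as long as that happens before torch is first imported (the MPS backend reads them at initialization):

```python
import os

# Must run before the first `import torch`; the MPS backend reads these at init.
os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.0"  # 0.0 disables the allocation cap
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"         # CPU fallback for unsupported ops

import torch  # imported after setting the env vars, on purpose
```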
## Citation

If you use this work, cite the original paper:

```bibtex
@article{shiu2025stream,
  title={Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion},
  author={Shiu, Hau-Shiang and Lin, Chin-Yang and Wang, Zhixiang and Hsiao, Chi-Wei and Yu, Po-Fan and Chen, Yu-Chih and Liu, Yu-Lun},
  journal={arXiv preprint arXiv:2512.23709},
  year={2025}
}
```

## Contributing

Contributions are welcome. See CONTRIBUTING.md for guidelines.
## License

Apache 2.0. See LICENSE for the full text.
## Credits

Original implementation by jamichss. Built on Hugging Face Diffusers, StreamDiffusion, StableVSR, and TAESD.

Built by Boris Djordjevic at 199 Biotechnologies | Paperfoot AI