A Daydream Scope plugin that adds the Causal Forcing pipeline for real-time streaming video generation.
Real-time streaming generation with Causal Forcing on Wan2.1-1.3B (17 FPS on H100, ~10 FPS on RTX 4090). Video from the Causal Forcing project page.
Causal Forcing (Tsinghua / Shengshu / UT Austin, Feb 2026) is the successor to Self-Forcing -- the training method behind Scope's built-in LongLive pipeline. It fixes a theoretical flaw in Self-Forcing's ODE initialization by using an autoregressive teacher instead of a bidirectional one, producing strictly better video quality at identical inference speed.
Improvements over Self-Forcing (same GPU, same FPS):
| Metric | Improvement |
|---|---|
| Dynamic Degree | +19.3% |
| VisionReward | +8.7% |
| Instruction Following | +16.7% |
Both models run on Wan2.1-T2V-1.3B with 4-step denoising and KV-cached autoregressive generation.
- NVIDIA GPU with 20+ GB VRAM (RTX 3090 / 4090 / 5090 or cloud equivalent)
- Daydream Scope installed
```shell
uv pip install git+https://github.com/daydreamlive/scope-causal-forcing.git
```

Or for local development:

```shell
git clone https://github.com/daydreamlive/scope-causal-forcing.git
cd scope-causal-forcing
uv pip install -e .
```

The plugin registers automatically via entry points — restart Scope and `causal-forcing` will appear in the pipeline selector.
On first load, Scope will download the required model weights:
| Model | Source | Size |
|---|---|---|
| Wan2.1-T2V-1.3B | Wan-AI/Wan2.1-T2V-1.3B | ~3 GB |
| UMT5-XXL encoder | google/umt5-xxl | ~10 GB |
| Causal Forcing checkpoint | zhuhz22/Causal-Forcing | ~3 GB |
| Wan2.1 VAE | (bundled with Wan2.1-T2V-1.3B) | ~300 MB |
If you already use LongLive in Scope, the base Wan2.1 and UMT5 weights are shared — only the Causal Forcing checkpoint is an additional download.
| Parameter | Default | Description |
|---|---|---|
| `height` | 480 | Output height in pixels |
| `width` | 832 | Output width in pixels |
| `denoising_steps` | [1000, 750, 500, 250] | 4-step warped denoising schedule |
| `vae_type` | `wan` | Full VAE (`wan`) or 75% pruned (`lightvae`) |
| `base_seed` | 42 | Random seed for reproducibility |
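As a sketch, the defaults above could be collected into a parameter dictionary. The key names here simply mirror the table; the exact schema Scope expects may differ, so treat this as illustrative rather than the plugin's real config format:

```python
# Illustrative defaults for the causal-forcing pipeline.
# Key names mirror the parameter table; the exact schema Scope
# expects may differ — this is a sketch, not the plugin API.
causal_forcing_defaults = {
    "height": 480,                             # output height in pixels
    "width": 832,                              # output width in pixels
    "denoising_steps": [1000, 750, 500, 250],  # 4-step warped schedule
    "vae_type": "wan",                         # "wan" (full) or "lightvae" (75% pruned)
    "base_seed": 42,                           # random seed for reproducibility
}

# Sanity checks on the schedule: exactly 4 steps, strictly decreasing
# from high noise (t=1000) toward clean (t=250).
assert len(causal_forcing_defaults["denoising_steps"]) == 4
assert causal_forcing_defaults["denoising_steps"] == sorted(
    causal_forcing_defaults["denoising_steps"], reverse=True
)
```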
Each frame is generated autoregressively with KV caching:
- Denoise — 4-step spatial denoising loop (flow matching with warped timesteps)
- Cache — Re-run at timestep=0 to write clean context into KV cache
- Decode — Wan VAE decodes latent to pixels with temporal caching
- Advance — Move to next frame
This is the same inference architecture as LongLive/Self-Forcing — only the weights differ.
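The four steps above can be sketched as a toy loop. Every name here (`denoise_step`, `write_kv_cache`, `vae_decode`) is an illustrative stand-in, not the plugin's real API, and the tensor math is replaced by trivial list arithmetic — the point is only the control flow: denoise with 4 warped timesteps, re-run at timestep 0 to populate the KV cache, decode, then move on:

```python
# Schematic per-frame loop for KV-cached autoregressive generation.
# All function names and the arithmetic are illustrative stand-ins,
# not the plugin's actual implementation.

DENOISING_STEPS = [1000, 750, 500, 250]  # 4-step warped schedule


def denoise_step(latent, t, kv_cache):
    """Stand-in for one flow-matching denoising step at timestep t,
    attending to the KV cache of previously generated frames."""
    return [x * (t / (t + 1000)) for x in latent]  # toy update


def write_kv_cache(latent, kv_cache):
    """Stand-in for the timestep=0 re-run that writes the clean
    frame's keys/values into the cache as context for later frames."""
    kv_cache.append(list(latent))


def vae_decode(latent):
    """Stand-in for the Wan VAE latent-to-pixel decode."""
    return [int(255 * max(0.0, min(1.0, x))) for x in latent]


def generate(num_frames, latent_dim=4):
    kv_cache, frames = [], []
    for _ in range(num_frames):
        latent = [1.0] * latent_dim           # start from noise
        for t in DENOISING_STEPS:             # 1. Denoise (4 steps)
            latent = denoise_step(latent, t, kv_cache)
        write_kv_cache(latent, kv_cache)      # 2. Cache at timestep=0
        frames.append(vae_decode(latent))     # 3. Decode to pixels
    return frames                             # 4. Advance = next iteration


frames = generate(num_frames=3)
```

Because context flows only through the KV cache, swapping Causal Forcing weights in place of Self-Forcing ones leaves this loop unchanged, which is why the two run at identical speed.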
- Paper: Causal Forcing: Autoregressive Distillation via Causal ODE
- Code: github.com/thu-ml/Causal-Forcing
- Weights: huggingface.co/zhuhz22/Causal-Forcing
Apache-2.0 (same as the Causal Forcing model weights)
