# One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models


This repository contains the official implementation of the paper "One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models".

We present the Latent Upscaler Adapter (LUA), a lightweight module that performs super-resolution directly on the generator's latent code before the final VAE decoding step. LUA integrates as a drop-in component, requiring no modifications to the base model or additional diffusion stages. It enables high-resolution synthesis through a single feed-forward pass in latent space, achieving comparable perceptual quality to pixel-space methods while reducing decoding and upscaling time.
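The resolution arithmetic behind this can be sketched in a few lines. This is a back-of-the-envelope illustration only (it assumes the FLUX VAE's 8× spatial compression and does not invoke LUA itself):

```python
VAE_SCALE = 8  # FLUX VAE downsamples each spatial dimension by 8 (assumption)

def output_resolution(base_px: int, head: str) -> int:
    """Pixel resolution after upscaling the latent and decoding once."""
    factor = {"x2": 2, "x4": 4}[head]
    base_latent = base_px // VAE_SCALE      # e.g. 1024 px -> 128-wide latent
    upscaled_latent = base_latent * factor  # e.g. 128 -> 256 with the x2 head
    return upscaled_latent * VAE_SCALE      # single decode back to pixels

print(output_resolution(1024, "x2"))  # 2048
print(output_resolution(1024, "x4"))  # 4096
```

So a 1024×1024 generation becomes 2048×2048 or 4096×4096 with one extra feed-forward pass in latent space and one VAE decode at the target resolution.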


## Installation

```bash
git clone https://github.com/vaskers5/LUA.git
cd LUA
pip install -r requirements.txt
```

## Quick Start

LUA weights are hosted on HuggingFace and downloaded automatically on first use.

### Python API

```python
import torch
from diffusers import FluxPipeline
from lua import load_model, upscale_latent

# Load models
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.to("cuda")
pipe.vae.enable_tiling()

lua_model = load_model(device="cuda")  # auto-downloads weights from HF

# Generate base latent at 1024x1024
result = pipe("a cat astronaut", output_type="latent", width=1024, height=1024)

# Unpack to VAE space
latent = pipe._unpack_latents(result.images, 1024, 1024, pipe.vae_scale_factor)
latent = (latent / pipe.vae.config.scaling_factor) + pipe.vae.config.shift_factor

# Upscale x2 (1024 -> 2048) or x4 (1024 -> 4096)
upscaled = upscale_latent(lua_model, latent, head="x2")

# Decode to image
image = pipe.vae.decode(upscaled.to(torch.bfloat16), return_dict=False)[0]
image = pipe.image_processor.postprocess(image, output_type="pil")[0]
image.save("output_2k.png")
```

### CLI Inference

```bash
# 2K image (1024 -> 2048)
python inference.py --prompt "a mountain landscape, cinematic" --head x2

# 4K image (1024 -> 4096)
python inference.py --prompt "a mountain landscape, cinematic" --head x4 --output landscape_4k.png

# Use a local checkpoint
python inference.py --prompt "hello" --weights ./my_checkpoint.pth --head x2
```

## Gradio Demo

Interactive demo with a side-by-side comparison against direct FLUX generation:

```bash
python gradio_demo.py
```

The demo compares the LUA path (FLUX@1024 + LUA upscale) against the Direct path (FLUX@target) at the same output resolution, with interactive magnifying loupes and timing breakdowns.

You can configure the FLUX model via environment variables:

```bash
FLUX_MODEL_ID="black-forest-labs/FLUX.1-dev" python gradio_demo.py
```
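Inside the demo script, that override is presumably read with a pattern like the following. The variable name comes from this README; the default value and the exact reading logic are assumptions:

```python
import os

# FLUX_MODEL_ID is the variable named in the README; falling back to
# FLUX.1-dev when it is unset is an assumption about the script's default.
FLUX_MODEL_ID = os.environ.get("FLUX_MODEL_ID", "black-forest-labs/FLUX.1-dev")
print(FLUX_MODEL_ID)
```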

## Model Details

| Property | Value |
|---|---|
| Architecture | SwinIR-based transformer with multi-head upsampling |
| Parameters | ~250M |
| Input | 16-channel VAE latent (FLUX latent space) |
| Heads | x2 (2× upscaling), x4 (4× upscaling) |
| Output | Upscaled 16-channel VAE latent |

LUA operates entirely in latent space: it upscales the latent code before the VAE decoder, so the expensive VAE decode runs only once, at the target resolution.

## Training

Training code will be released soon.

## Citation

```bibtex
@article{razin2025lua,
  title={One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models},
  author={Razin, Aleksandr and Kazantsev, Danil and Makarov, Ilya},
  journal={arXiv preprint arXiv:2511.10629},
  year={2025}
}
```

## License

This project is licensed under the Apache License 2.0; see the LICENSE file for details.
