Text, image, audio, and video unified into a single diffusion model with one shared latent space.
Inspired by NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching (https://arxiv.org/abs/2510.13721).
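
As a rough illustration of what "one space for all modalities" could look like in a discrete setting, the sketch below maps per-modality token ids into a single shared id range and applies a mask-based corruption step of the kind used in discrete flow matching (each token is kept with probability `t`, otherwise replaced by a mask id). This is not the repository's code: the codebook sizes, the offset scheme, `MASK_ID`, and the function names `to_shared_ids` and `corrupt` are all assumptions made for the example.

```python
import random

# Hypothetical per-modality codebook sizes (assumed, not from the repository).
CODEBOOK_SIZES = {"text": 1000, "image": 512, "audio": 256, "video": 512}


def build_offsets(sizes):
    """Assign each modality a contiguous id range inside one shared vocabulary."""
    offsets, start = {}, 0
    for name, size in sizes.items():
        offsets[name] = start
        start += size
    return offsets, start  # `start` ends up as the total vocabulary size


OFFSETS, VOCAB_SIZE = build_offsets(CODEBOOK_SIZES)
MASK_ID = VOCAB_SIZE  # one extra id reserved for the fully corrupted state


def to_shared_ids(modality, local_ids):
    """Shift modality-local token ids into the shared id range."""
    size = CODEBOOK_SIZES[modality]
    if any(not 0 <= i < size for i in local_ids):
        raise ValueError(f"token id out of range for modality {modality!r}")
    return [OFFSETS[modality] + i for i in local_ids]


def corrupt(tokens, t, rng):
    """Mixture-path corruption: keep each token with probability t, else mask it.

    t = 1.0 returns the clean sequence, t = 0.0 returns all MASK_ID.
    """
    return [tok if rng.random() < t else MASK_ID for tok in tokens]


if __name__ == "__main__":
    rng = random.Random(0)
    # One interleaved sequence containing text and image tokens.
    sequence = to_shared_ids("text", [3, 14, 15]) + to_shared_ids("image", [0, 5])
    print(sequence)
    print(corrupt(sequence, 0.5, rng))
```

A model trained on such sequences would see every modality through the same vocabulary and the same corruption process, which is one possible reading of the single-model, single-space description above.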