See relevant overleaf section for toy example.
One thought:
A known challenge in image generation, specifically for models like variational autoencoders (VAEs), is "posterior collapse": the approximate posterior collapses to the prior and the decoder effectively ignores the latent code, so the model learns an average that prevents generation of images belonging to very different modes. The characteristic "blurring" of VAE-generated images is often cited as related evidence. I think we should be able to pair a hypernetwork with the VAE decoder to create a more diverse sampling space, which should help prevent posterior collapse.
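A minimal sketch of the hypernet-decoder idea (the shapes, names, and tiny linear model are my own illustrative assumptions, not anything from the note): the hypernetwork maps each sampled latent z to its own decoder weights, so different latent samples get genuinely different decoders rather than sharing one averaged mapping.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions, chosen for illustration only.
latent_dim, out_dim = 8, 32

# Hypernetwork parameters: a single linear map from z to the
# flattened weights of a per-sample decoder layer.
W_h = rng.normal(0, 0.1, size=(latent_dim, latent_dim * out_dim))
b_h = np.zeros(latent_dim * out_dim)

def decode(z):
    """Generate decoder weights from z, then decode z with them."""
    # Hypernet output, reshaped into this sample's decoder weight matrix.
    W_dec = (z @ W_h + b_h).reshape(latent_dim, out_dim)
    # Apply the sample-specific decoder to the latent itself.
    return np.tanh(z @ W_dec)

z = rng.normal(size=latent_dim)
x_hat = decode(z)
print(x_hat.shape)  # (32,)
```

The point of the sketch: because `W_dec` is a function of z, two distinct latents decode through two distinct weight matrices, which is the mechanism we'd hope discourages the decoder from collapsing onto a single averaged output.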