Hi,
I'm trying to find the part of the code that attempts to compute the Wasserstein distance between prior and posterior (as in Eq. 5 in your ICLR paper), but couldn't find it. Would you please point to the part of the code for this distance?
Moreover, I found that the latent variables are computed directly from the model (e.g., a fully connected layer) rather than predicting \mu and \sigma and then sampling from that distribution, as stated in Eq. 3 and Eq. 4. Would you please clarify this?
Thanks