What exactly is NELBO and why do we optimize it?

Can someone explain why we optimize the NELBO? The paper only says "We optimize the ELBO with respect to the variational parameters." As far as I understand, D-ETM uses three neural networks to infer the distributions of theta, eta, and alpha, and then computes a KL divergence for each of them. Are these KL values then simply added together and optimized jointly? And why is the NLL added to them? I also thought that "Solving this optimization problem is equivalent to maximizing the evidence lower bound (ELBO)" meant that we maximize the ELBO, whereas the model seems to minimize it as a loss.
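To make the question concrete, here is a minimal sketch of how I currently read the loss. It is not the repository's code: the function names (`gaussian_kl`, `bow_nll`, `nelbo`, `elbo`) are hypothetical, and I am assuming diagonal Gaussian variational distributions and a bag-of-words likelihood. If the ELBO is the expected log-likelihood minus the KL terms, then its negative would be the NLL plus the summed KL terms, and minimizing that would be the same as maximizing the ELBO.

```python
import math

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total

def bow_nll(word_probs, bow_counts):
    """Negative log-likelihood of bag-of-words counts under predicted word probabilities."""
    return -sum(c * math.log(p) for c, p in zip(bow_counts, word_probs) if c > 0)

def nelbo(nll, kl_theta, kl_eta, kl_alpha):
    """Negative ELBO (assumed form): reconstruction term plus the three KL terms."""
    return nll + kl_theta + kl_eta + kl_alpha

def elbo(nll, kl_theta, kl_eta, kl_alpha):
    """ELBO is the negation, so minimizing nelbo() maximizes elbo()."""
    return -nelbo(nll, kl_theta, kl_eta, kl_alpha)

# Toy numbers only, to show how the pieces combine.
nll = bow_nll([0.5, 0.5], [1, 1])
kl_theta = gaussian_kl([0.3], [0.0], [0.0], [0.0])
kl_eta = gaussian_kl([0.0], [0.0], [0.0], [0.0])
kl_alpha = gaussian_kl([1.0], [0.0], [0.0], [0.0])
loss = nelbo(nll, kl_theta, kl_eta, kl_alpha)
```

If this reading is right, the NLL would be the reconstruction part of the bound rather than something extra, but I would appreciate confirmation.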

Sorry, I am pretty confused (I am rather new to Bayesian statistics and variational inference).
