Hello, thank you for publishing such an interesting work.
I have a question regarding your loss functions. As I understand it, you have four loss functions: two from the DMD approach and two from adversarial training. However, it is unclear how you combine them and which modules each one updates. In the original DMD paper, the DMD loss is used to update the generator, while the fake score prediction model is updated by minimizing a standard denoising objective. You introduce two more loss functions, $L_D$ and $L_G$, and state: "Notably, the adversarial loss is used only to update the discriminator head D, while the fake score network $μ_{fake}$ is updated solely with the DMD loss".

1. Which of the two losses ($L_G$ or $L_D$) do you call "adversarial"? Earlier you called both of them "adversarial objectives". I assume it is $L_D$.
2. What do you mean by "... while the fake score network $μ_{fake}$ is updated solely with the DMD loss"? In the original DMD approach the fake scorer is updated with the standard diffusion loss, as you yourself state: "The fake score network $μ_{fake}^{\phi}$ is trained with the standard diffusion loss on student-generated videos". Is there a typo?
3. Does that mean the generator is updated with a combined loss $L_{DMD} + L_G$? If so, shouldn't there be weighting coefficients?
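For concreteness, here is a minimal sketch of the per-module update scheme I am currently assuming. The weighting coefficient `lambda_adv` and the assignment of losses to modules are my guesses, not something stated in the paper; the losses are placeholder scalars standing in for values computed on a batch:

```python
# Hypothetical update scheme (my assumption, not taken from the paper).
# Each argument is a placeholder scalar; in practice these would be
# computed from model outputs on a batch of student-generated videos.

def training_step(L_dmd, L_denoise, L_G, L_D, lambda_adv=0.1):
    """Return the loss each module would be updated with under my assumed scheme."""
    return {
        # Generator: DMD loss plus a weighted adversarial generator loss?
        "generator": L_dmd + lambda_adv * L_G,
        # Fake score network: standard denoising/diffusion loss, as in original DMD?
        "fake_score": L_denoise,
        # Discriminator head D: adversarial discriminator loss only?
        "discriminator": L_D,
    }

# Example: which scalar would each module's optimizer step minimize?
losses = training_step(L_dmd=1.0, L_denoise=0.5, L_G=0.2, L_D=0.3)
```

Is this roughly the scheme you use, and if so, what value of the weighting coefficient did you pick?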
I am very interested in your work, but this part of the training pipeline is unclear to me. I hope you can clarify it.