Thanks for sharing such amazing work :)
In the last section of the notebook Stable Diffusion Deep Dive.ipynb, you mention:
NB: We should set latents requires_grad=True before we do the forward pass of the unet (removing the with torch.no_grad()) if we want more accurate gradients. BUT this requires a lot of extra memory. You'll see both approaches used depending on whose implementation you're looking at.
Can you please clarify what the difference is between the two approaches? For example, if I had to code this, I would have used torch.no_grad(), but apparently you preferred a different approach. What does it change computationally, and in terms of results?
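In case it helps pin down what I mean, here is a rough sketch of the two variants as I understand them (the unet, sigma, and guidance_loss below are placeholder stand-ins I made up so the snippet runs on its own, not the notebook's actual objects):

```python
import torch
import torch.nn as nn

# Placeholder stand-ins for the notebook's real unet, sigma, latents and guidance loss
unet = nn.Conv2d(4, 4, 3, padding=1)      # hypothetical stand-in for the real UNet
latents = torch.randn(1, 4, 64, 64)
sigma = 1.0

def guidance_loss(x):
    # hypothetical stand-in for e.g. the "blueness" loss in the notebook
    return x.mean()

# --- Approach 1: UNet forward pass inside torch.no_grad() ---
latents_a = latents.detach().requires_grad_(True)
with torch.no_grad():
    noise_pred = unet(latents_a)
# gradient can only flow through this final arithmetic, not through the UNet itself
denoised = latents_a - sigma * noise_pred
grad_approx = torch.autograd.grad(guidance_loss(denoised), latents_a)[0]

# --- Approach 2: requires_grad=True set before the UNet forward pass ---
latents_b = latents.detach().requires_grad_(True)
noise_pred = unet(latents_b)              # all activations are kept for backprop
denoised = latents_b - sigma * noise_pred
grad_exact = torch.autograd.grad(guidance_loss(denoised), latents_b)[0]  # backprops through the UNet too
```

If I'm reading it right, the difference is whether the backward pass has to go through the whole UNet (more memory, the "more accurate" gradients you mention) or only through the final arithmetic (cheaper, but approximate). Is that the trade-off you meant?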
I think adding this as extra info to the notebook would be useful to others, too :)