Thanks for sharing such amazing work :)
In the last section of the notebook Stable Diffusion Deep Dive.ipynb, you mention:
NB: We should set latents requires_grad=True before we do the forward pass of the unet (removing the with torch.no_grad()) if we want more accurate gradients. BUT this requires a lot of extra memory. You'll see both approaches used depending on whose implementation you're looking at.
Can you please clarify what the difference is between the two approaches? For example, if I had to code this, I would have used torch.no_grad(), but apparently you preferred a different approach. What does it change computationally, and in terms of results?
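In case it helps pin down what I mean, here is a rough sketch of the two variants as I understand them (the unet, sigma, and guidance_loss below are placeholder stand-ins I made up so the snippet runs on its own, not the notebook's actual objects):

```python
import torch
import torch.nn as nn

# Placeholder stand-ins for the notebook's real unet, sigma, latents and guidance loss
unet = nn.Conv2d(4, 4, 3, padding=1)      # hypothetical stand-in for the real UNet
latents = torch.randn(1, 4, 64, 64)
sigma = 1.0

def guidance_loss(x):
    # hypothetical stand-in for e.g. the "blueness" loss in the notebook
    return x.mean()

# --- Approach 1: UNet forward pass inside torch.no_grad() ---
latents_a = latents.detach().requires_grad_(True)
with torch.no_grad():
    noise_pred = unet(latents_a)
# gradient can only flow through this final arithmetic, not through the UNet itself
denoised = latents_a - sigma * noise_pred
grad_approx = torch.autograd.grad(guidance_loss(denoised), latents_a)[0]

# --- Approach 2: requires_grad=True set before the UNet forward pass ---
latents_b = latents.detach().requires_grad_(True)
noise_pred = unet(latents_b)              # all activations are kept for backprop
denoised = latents_b - sigma * noise_pred
grad_exact = torch.autograd.grad(guidance_loss(denoised), latents_b)[0]  # backprops through the UNet too
```

If I'm reading it right, the difference is whether the backward pass has to go through the whole UNet (more memory, the "more accurate" gradients you mention) or only through the final arithmetic (cheaper, but approximate). Is that the trade-off you meant?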
I think adding this as extra info to the notebook would be useful to others, too :)