Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
src/diffusers/pipeline_utils.py
Outdated
```diff
 if isinstance(module, torch.nn.Module):
     if module.device == torch.device("meta"):
-        return torch.device("cpu")
+        return torch.device("cuda" if torch.cuda.is_available() else "cpu")
```
@patrickvonplaten @piEsposito this feels hacky, but it's required to make the pipelines work when `self.device` doesn't match e.g. the `generator` device after offloading.
See the error here: https://github.com/huggingface/diffusers/actions/runs/3410777950/jobs/5674151054#step:10:551
It seems that accelerate doesn't populate `param_original_devices` here, so the only way to know where the model was supposed to go is to guess?
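For context, a rough way to reproduce the linked failure locally (the checkpoint and prompt are just examples, not necessarily the ones used in CI):

```python
import torch
from diffusers import StableDiffusionPipeline

# Example checkpoint; the CI tests may use a different one.
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe.enable_sequential_cpu_offload()  # weights now sit on the "meta" device

# The slow tests pass a CUDA generator, but after offloading self.device no
# longer points at the GPU, so the pipeline creates latents on the wrong
# device and torch raises the mismatch seen in the linked run.
generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe("an astronaut riding a horse", generator=generator).images[0]
```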
I understand - and I actually think this is a clever solution. I've learned a few things from this PR of yours.
Also, IMO it's correct to assume that if a user has a GPU, they will use it instead of the CPU for diffusion models.
Actually, the more I think about it, wouldn't the cleanest solution be to just return `torch.device("meta")` and then fix the bugs in the pipelines directly?
I'm a bit worried about making such a fundamental function this hacky.
Also cc @pcuenca - curious to hear your thoughts!
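To make that direction concrete, here is a minimal sketch (the helper name and fallback logic are illustrative, not the actual diffusers API): `device` would report `meta` honestly, and each pipeline would resolve its own execution device instead of trusting `self.device` blindly.

```python
import torch


def resolve_execution_device(pipe) -> torch.device:
    # Hypothetical pipeline-side helper: if the components were offloaded
    # (i.e. they sit on "meta"), fall back to the GPU when one is available;
    # otherwise just trust the reported device.
    if pipe.device == torch.device("meta"):
        return torch.device("cuda" if torch.cuda.is_available() else "cpu")
    return pipe.device
```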
```python
# make sure that less than 2.2 GB is allocated
assert mem_bytes < 2.2 * 10**9
```
@piEsposito the 768x512 images require ~2.16 GB of memory, compared to ~1.5 GB for the 512x512 text2img tests.
Yeah, thank you for catching that!
piEsposito left a comment:
I agree with the approach. Thank you for teaching me those few things.
@piEsposito thank you for contributing the offloading solution too! 🤗
If we decide to return `torch.device("meta")`, we can fix the pipelines directly as suggested. But @piEsposito (cc @sgugger) maybe there's still a way to access the intended execution device after cpu offloading?
The execution device will be attached to the bottom-level module, not the top-level one.
@anton-l cc @sgugger the execution device appears as … If you try …
That seems to only work when …
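A minimal sketch of what that lookup could look like, assuming accelerate's offloading hooks are exposed as `_hf_hook` with an `execution_device` attribute (as discussed above); the idea is to walk the submodules rather than read the top-level module's device:

```python
from typing import Optional

import torch


def execution_device_from_hooks(model: torch.nn.Module) -> Optional[torch.device]:
    # The offloading hooks live on the bottom-level submodules, so iterate
    # over all of them and return the first execution device we find.
    for submodule in model.modules():
        hook = getattr(submodule, "_hf_hook", None)
        if hook is not None and getattr(hook, "execution_device", None) is not None:
            return torch.device(hook.execution_device)
    return None
```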
```diff
-# make sure that less than 1.5 GB is allocated
-assert mem_bytes < 1.5 * 10**9
+# make sure that less than 2.8 GB is allocated
+assert mem_bytes < 2.8 * 10**9
```
Not sure how this increase happened yet; if someone could check `mem_bytes` here on their machine that would be great :)
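For anyone checking locally, one way to read `mem_bytes` on a GPU machine (the measurement pattern is an assumption about how the test computes it; the pipeline setup is elided):

```python
import torch

# Reset the peak-allocation counter before running the pipeline.
torch.cuda.empty_cache()
torch.cuda.reset_peak_memory_stats()

# ... build the offloaded pipeline and run the same prompt/resolution as the test ...

mem_bytes = torch.cuda.max_memory_allocated()
print(f"peak allocated: {mem_bytes / 10**9:.2f} GB")
```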
* Fix cpu offloading
* Get offloaded devices locally for SD pipelines
Fixes the implementation and tests introduced in #1085
Looks like the two `test_stable_diffusion_pipeline_with_sequential_cpu_offloading` tests weren't checked with a GPU originally, which resulted in a device mismatch: https://github.com/huggingface/diffusers/actions/runs/3410777950/jobs/5674151054#step:10:551
@piEsposito FYI: GitHub Actions for the GPU tests aren't launched for PRs, so for future PRs please check them locally too :)
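A quick way to run just those two tests locally from the repo root (the `RUN_SLOW` gate and the `-k` filter are assumptions about the test setup):

```python
import os

import pytest

# Slow/GPU tests are gated behind RUN_SLOW; set it before pytest collects tests.
os.environ["RUN_SLOW"] = "1"
pytest.main(["tests", "-k", "sequential_cpu_offloading", "-v"])
```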