
Conversation

stevhliu (Member)
Splits off the Models section from Load schedulers and models and creates a dedicated section for models covering device placement, torch dtype, the AutoModel API, and saving as shards.
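The "saving as shards" part can be illustrated with a minimal sketch using plain `torch.save`; the per-tensor split and file names below are purely illustrative (real savers instead group tensors up to a size budget such as `max_shard_size` and write an index file).

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Linear(8, 8)
state = model.state_dict()  # {"weight": ..., "bias": ...}

# Naive sharding: one file per tensor. Real implementations group tensors
# until a size budget (e.g. max_shard_size) is reached and also write an
# index file mapping each tensor name to its shard.
outdir = tempfile.mkdtemp()
for i, (name, tensor) in enumerate(sorted(state.items())):
    torch.save({name: tensor}, os.path.join(outdir, f"shard-{i:05d}.bin"))
```

Loading then walks the shards in order and merges the per-shard dicts back into one state dict.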

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@stevhliu stevhliu requested a review from sayakpaul August 28, 2025 22:21
@sayakpaul sayakpaul left a comment


Thanks! Left some comments, LMK if they are unclear.

| `device_map` | Description |
|---|---|
| `"cuda"` | places model or pipeline on CUDA device |
| `"balanced"` | evenly distributes model or pipeline across all GPUs |
| `"auto"` | distributes model from fastest device first to slowest |
| `"cuda"` | places pipeline on CUDA device |

"cuda" is just an example. If someone wants to do it for any other supported accelerator, I believe they pass it by name 👀

Suggested change
```diff
-| `"cuda"` | places pipeline on CUDA device |
+| `"cuda"` | places pipeline on CUDA (or supported accelerator) device |
```
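As a quick sketch of the point above, assuming only PyTorch is installed: the device string is not limited to `"cuda"`, and a CPU fallback keeps the snippet runnable on any machine.

```python
import torch

# "cuda" is one accelerator string among several ("mps", "xpu", ...);
# fall back to CPU so the example runs anywhere.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.ones(2, 2).to(device)
```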

Comment on lines +40 to +43
```py
model = AutoModel.from_pretrained(
    "Qwen/Qwen-Image",
    subfolder="transformer"
)
```

Suggested change
```diff
-model = AutoModel.from_pretrained(
-    "Qwen/Qwen-Image",
-    subfolder="transformer"
-)
+model = AutoModel.from_pretrained(
+    "Qwen/Qwen-Image", subfolder="transformer"
+)
```

Comment on lines +55 to +57
```py
    "Qwen/Qwen-Image",
    subfolder="transformer"
    torch_dtype=torch.float16
```

Suggested change
```diff
-    "Qwen/Qwen-Image",
-    subfolder="transformer"
-    torch_dtype=torch.float16
+    "Qwen/Qwen-Image",
+    subfolder="transformer",
+    torch_dtype=torch.bfloat16
```


[torch.Tensor.to](https://docs.pytorch.org/docs/stable/generated/torch.Tensor.to.html) can also convert to a specific data type on the fly. However, it converts *all* weights to the requested data type, unlike `torch_dtype`, which respects `_keep_in_fp32_modules`. This argument preserves layers in `torch.float32` for numerical stability and the best generation quality (see [_keep_in_fp32_modules](https://github.com/huggingface/diffusers/blob/f864a9a352fa4a220d860bfdd1782e3e5af96382/src/diffusers/models/transformers/transformer_wan.py#L374) for an example).
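The difference can be sketched with plain `torch.nn` modules; using `LayerNorm` as the layer kept in `torch.float32` is an illustrative stand-in for whatever a given model's `_keep_in_fp32_modules` list actually names.

```python
import torch
from torch import nn

# Module.to(dtype) converts *every* parameter, norm layers included.
full = nn.Sequential(nn.Linear(4, 4), nn.LayerNorm(4)).to(torch.float16)
assert all(p.dtype == torch.float16 for p in full.parameters())

# Selective conversion, mimicking a _keep_in_fp32_modules-style list:
# cast the linear layer but leave the LayerNorm in float32.
mixed = nn.Sequential(nn.Linear(4, 4), nn.LayerNorm(4))
mixed[0].to(torch.float16)
assert mixed[1].weight.dtype == torch.float32
```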

Shouldn't it be `nn.Module.to()`?

```py
from diffusers import QwenImageTransformer2DModel

model = QwenImageTransformer2DModel.from_pretrained(
    "Qwen/Qwen-Image",,
```

Suggested change
```diff
-    "Qwen/Qwen-Image",,
+    "Qwen/Qwen-Image",
```

```py
import torch
from diffusers import QwenImageTransformer2DModel

max_memory = {0: "16GB", 1: "16GB"}
```

Umm, what would 0 and 1 denote in this case, though? I think this form of max_memory dict is reserved for the pipelines.

For models, you probably want to specify module names (regex should work, too). Cc: @SunMarc
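For reference, a sketch of the two dict shapes under discussion; the module names below are illustrative, not taken from Qwen-Image.

```python
# Accelerate-style max_memory caps are keyed by device: integer GPU
# indices plus optional "cpu"/"disk" entries.
max_memory = {0: "16GB", 1: "16GB", "cpu": "30GB"}

# A hand-written device_map is keyed by (sub)module name instead,
# mapping each named module to the device it should live on.
device_map = {"transformer_blocks": 0, "norm_out": 1, "proj_out": "cpu"}
```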
