[Modular] Qwen #12220
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```diff
@@ -539,8 +540,11 @@ class AutoPipelineBlocks(ModularPipelineBlocks):

     def __init__(self):
         sub_blocks = InsertableDict()
-        for block_name, block_cls in zip(self.block_names, self.block_classes):
-            sub_blocks[block_name] = block_cls()
+        for block_name, block in zip(self.block_names, self.block_classes):
```
I started to make dynamic blocks that can be configured during `__init__`. A simple made-up example would be:

```python
inpaint_vae_encoder = DynamicVaeEncoder(input_name="mask_image", output_name="mask_image_latents")
vae_encoder = DynamicVaeEncoder(input_name="image", output_name="image_latents")
```

The first one takes the input `mask_image` and returns the intermediate output `mask_image_latents`; the second one takes `image` and returns `image_latents`. This change is to support that use case (see the sketch below).
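To make this concrete, here is a minimal standalone sketch (not the actual diffusers code; `DynamicVaeEncoder` and `TextEncoderStep` are made-up stand-ins) of why `block_classes` can now hold either classes or pre-configured instances, and how the updated `__init__` loop handles both:

```python
import inspect

class DynamicVaeEncoder:
    """Stand-in for a configurable block: its input/output names are fixed at init time."""
    def __init__(self, input_name="image", output_name="image_latents"):
        self.input_name = input_name
        self.output_name = output_name

class TextEncoderStep:
    """Stand-in for a regular block that needs no configuration."""
    pass

block_names = ["text_encoder", "vae_encoder", "inpaint_vae_encoder"]
block_classes = [
    TextEncoderStep,  # a plain class: instantiated below with its defaults
    DynamicVaeEncoder(input_name="image", output_name="image_latents"),
    DynamicVaeEncoder(input_name="mask_image", output_name="mask_image_latents"),
]

sub_blocks = {}  # the real code uses InsertableDict
for block_name, block in zip(block_names, block_classes):
    if inspect.isclass(block):
        # classes are instantiated with their default configuration
        sub_blocks[block_name] = block()
    else:
        # pre-configured instances are kept as-is
        sub_blocks[block_name] = block
```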
Nice.
(nit): Maybe we can have a comment explaining this because otherwise, this pattern seems a bit concerning:
```python
if inspect.isclass(block):
    sub_blocks[block_name] = block()
else:
    sub_blocks[block_name] = block
```
```python
# Prepare Latents steps


class QwenImagePackLatentsDynamicStep(ModularPipelineBlocks):
```
An example of a dynamic block; this one can be used to pack different latents: `image_latents`, `control_image_latents`, `noise`, etc.
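A hedged sketch of how such a block might be instantiated for different inputs (the actual constructor argument names in this PR may differ):

```python
# Hypothetical usage; the real parameter name in the PR may differ.
pack_image_latents = QwenImagePackLatentsDynamicStep(latents_input_names=["image_latents"])
pack_control_latents = QwenImagePackLatentsDynamicStep(latents_input_names=["control_image_latents"])
```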
Thank you!
```diff
@@ -298,4 +307,5 @@ def _skip_attention___ret___hidden_states___encoder_hidden_states(self, *args, *
 _skip_proc_output_fn_Attention_WanAttnProcessor2_0 = _skip_attention___ret___hidden_states
 # not sure what this is yet.
 _skip_proc_output_fn_Attention_FluxAttnProcessor = _skip_attention___ret___hidden_states
+_skip_proc_output_fn_Attention_QwenDoubleStreamAttnProcessor2_0 = _skip_attention___ret___hidden_states
```
For my understanding, what is this one for?
```diff
@@ -838,6 +838,134 @@ def apply_overlay(
     return image


+class InpaintProcessor(ConfigMixin):
```
Really nice!

(Not for this PR, but we could attempt to have an example of the processor for an inpaint pipeline.)
```python
final_batch_size = block_state.batch_size * block_state.num_images_per_prompt

for input_name in self._latents_input_names:
```
Clean ❤️
```python
@property
def description(self) -> str:
    return "Step that patchifies latents and expands batch dimension. Works with outputs from QwenImageVaeEncoderDynamicStep."
```
(nit): would add a line to comment on the dynamic nature of this block.
```python
        return latents


class QwenImageDecodeDynamicStep(ModularPipelineBlocks):
```
Does the decoding step need to be dynamic? 👀
```python
ComponentSpec(
    "guider",
    ClassifierFreeGuidance,
    config=FrozenDict({"guidance_scale": 4.0}),
```
For the QwenImage pipeline, `guidance_scale` is akin to the one we have in Flux. However, I think we want to enable CFG with this, which is done through `true_cfg_scale`. Should this be taken into account?
Good questions. The `true_cfg_scale` in Flux/Qwen is actually just `guidance_scale` in every other pipeline; it is part of the guider and should be set in the guider.

We had to use a different name (`true_cfg_scale`) for Flux because `guidance_scale` was already taken as an input for the distilled model. I think it would have been a lot better if we had given the distilled guidance a different name, so that we could keep the definition of `guidance_scale` consistent across all pipelines.

I'd like to fix it here in modular. IMO it won't confuse users too much, because they won't be able to use `guidance_scale` or `true_cfg_scale` during runtime in modular as it is, so they will have to take some time to figure out how to use guidance properly, and we will have a chance to explain.

cc @DN6 @asomoza too, let me know if you have any thoughts around this.
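For readers following the thread, a rough illustration of the current naming split in the existing (non-modular) pipelines; the pipeline objects below are placeholders and exact defaults may differ:

```python
# Most pipelines: guidance_scale drives classifier-free guidance (CFG).
image = sd_pipeline(prompt, negative_prompt=negative_prompt, guidance_scale=7.5).images[0]

# Flux: guidance_scale is the distilled-guidance conditioning fed to the model, not CFG.
image = flux_pipeline(prompt, guidance_scale=3.5).images[0]

# QwenImage follows the Flux convention: passing a negative prompt together with
# true_cfg_scale enables "real" CFG, while guidance_scale stays reserved for
# distilled guidance.
image = qwen_pipeline(prompt, negative_prompt=negative_prompt, true_cfg_scale=4.0).images[0]
```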
> I think it would have been a lot better if we had given the distilled guidance a different name, so that we could keep the definition of `guidance_scale` consistent across all pipelines

I like this point a lot! However, we have `guidance_scale` in Flux (without the use of the `Guider` component):

```python
InputParam("guidance_scale", default=3.5),
```

Maybe we could change that to something better suited (something like `distilled_guidance_scale`). This way, we can keep the meaning of `guidance_scale` consistent across the pipelines.
I completely agree, let's keep the `guidance_scale` consistent and use a different one for the distilled models.
So the proposal is that `guidance_scale` would always imply the CFG guidance scale?

I would argue that keeping `guidance_scale` for all guidance methods makes sense, since it implies how large of a step you want to take in the guidance direction.

Alternatively, we could introduce the concept of a `DistilledGuidance` guider, which is effectively a no-op. It makes it more explicit what is actually happening with the latents, rather than having to introduce new scale parameters, internal checks for negative embeds, or checks like `self._is_cfg_enabled`.
So ideally, we should always use the same behavior for all the models, which is to just consider the `guidance_scale` for whether or not to use CFG; whether or not you provide a negative prompt shouldn't be the condition for using it. This is true for almost all the pipelines except Flux and QwenImage, but with modular we should keep the API consistent. That's why it's also ideal to use a separate guidance for the distilled models.
I think the challenge here is the use case where both true CFG and distilled guidance are used, potentially with different scales (currently allowed in our pipelines).
But this use case is not meaningful (even though we have been functionally allowing it). And yeah, then I think we will be able to keep the same parameter for both if we only need to use one or the other.
> So ideally, we should always use the same behavior for all the models, which is to just consider the `guidance_scale` for whether or not to use CFG; whether or not you provide a negative prompt shouldn't be the condition for using it. This is true for almost all the pipelines except Flux and QwenImage, but with modular we should keep the API consistent. That's why it's also ideal to use a separate guidance for the distilled models.

Hmm, so then perhaps introduce a `DistilledGuidance` guider? It's very clear then what is being applied to the model. And if you want to swap out the method, you just change the guider, and we don't have to introduce an additional parameter term. And like @yiyixuxu said, doing both distilled guidance and CFG guidance simultaneously is probably not doing anything meaningful.
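To make the proposal concrete, here is a minimal standalone sketch of what a no-op `DistilledGuidance` guider could look like; this is not the actual diffusers guider interface, just an illustration of the idea:

```python
class DistilledGuidance:
    """Sketch of a no-op guider: the scale is only embedded as a conditioning input
    to the model, and no mixing of conditional/unconditional predictions happens."""

    def __init__(self, guidance_scale: float = 3.5):
        self.guidance_scale = guidance_scale

    @property
    def num_conditions(self) -> int:
        # only a single (conditional) forward pass is required
        return 1

    def __call__(self, pred_cond, pred_uncond=None):
        # effectively a no-op: return the conditional prediction unchanged
        return pred_cond
```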
I have seen people using both at the same time though, both CFG and distilled guidance with different scales.

cc @linoytsaban to confirm & educate us a bit, since she is the OG of `true_cfg`.
Will there be any features added soon for training Qwen models with HF libraries? What are the major barriers right now to making this happen?

This is not the right PR to discuss Qwen training. You can check out https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_qwen.md as well as https://github.com/ostris/ai-toolkit for training with HF libs.
Qwen-Image
Test Script for Qwen-Image Auto Pipeline
QwenImage Edit
Test script for QwenImage-Edit in Modular
Load from standard repo