Why do 20B models need fewer steps than smaller ones? #27

@vitrun

Description

Hi authors,

Thanks for the great work on TwinFlow! I was looking at the training configs in Table 6 and noticed something interesting that I'm curious about.

According to the table:

  • Qwen-Image (20B) only needs 3,000 to 10,000 steps.
  • SANA (0.6B / 1.6B) needs 30,000 steps.
  • OpenUni needs even more (60,000 steps).

It’s a bit counter-intuitive that the 20B model converges so much faster than the smaller ones. Is this because the 20B model already has such a strong pre-trained base that "straightening the flow" is easier for it? Or is there another reason why the larger model is so much more sample-efficient in this framework?

Would love to hear your thoughts on this!

BTW, could you also share the training configuration used for TwinFlow Z-Image?
