Hi authors,
Thanks for the great work on TwinFlow! I was looking at the training configs in Table 6 and noticed something I'm curious about.
According to the table:
- Qwen-Image (20B) only needs 3,000 to 10,000 steps.
- SANA (0.6B / 1.6B) needs 30,000 steps.
- OpenUni needs even more (60,000 steps).
It’s a bit counter-intuitive that the 20B model converges so much faster than the smaller ones. Is this because the 20B model already has such a strong pre-trained base that "straightening the flow" is easier for it? Or is there another reason why the larger model is so much more "sample efficient" in this framework?
Would love to hear your thoughts on this!
BTW, could you also share the training configuration used for TwinFlow Z-Image?