Hi authors,
Thanks for the great work on TwinFlow! I was looking at the training configs in Table 6 and noticed something I'm curious about.
According to the table:
- Qwen-Image (20B) only needs 3,000 to 10,000 steps.
- SANA (0.6B / 1.6B) needs 30,000 steps.
- OpenUni needs even more (60,000 steps).
It’s a bit counter-intuitive that the 20B model converges so much faster than the smaller ones. Is this because the 20B model already has such a strong pre-trained base that "straightening the flow" is easier for it? Or is there another reason why the larger model is so much more "sample efficient" in this framework?
Would love to hear your thoughts on this!
BTW, could you also share the training configuration used for TwinFlow Z-Image?