This is excellent work. Does kimi_vl_moe_a3b support ZeRO-3 training?