Hi,
I tried to load the 60M no-triangle-attention model and ran into the errors below. There appears to be a misalignment between the config and the checkpoint: every affected parameter has a different shape in the current model than in the checkpoint.
```
RuntimeError: Error(s) in loading state_dict for Proteina:
size mismatch for nn.init_repr_factory.linear_out.weight: copying a param with shape torch.Size([512, 200]) from checkpoint, the shape in current model is torch.Size([512, 132]).
size mismatch for nn.cond_factory.feat_creators.1.embedding_C.weight: copying a param with shape torch.Size([6, 196]) from checkpoint, the shape in current model is torch.Size([6, 256]).
size mismatch for nn.cond_factory.feat_creators.1.embedding_A.weight: copying a param with shape torch.Size([44, 196]) from checkpoint, the shape in current model is torch.Size([44, 256]).
size mismatch for nn.cond_factory.feat_creators.1.embedding_T.weight: copying a param with shape torch.Size([1473, 196]) from checkpoint, the shape in current model is torch.Size([1473, 256]).
size mismatch for nn.cond_factory.linear_out.weight: copying a param with shape torch.Size([128, 784]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
size mismatch for nn.transition_c_1.swish_linear.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([2048, 512]).
size mismatch for nn.transition_c_1.linear_out.weight: copying a param with shape torch.Size([128, 256]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
size mismatch for nn.transition_c_2.swish_linear.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([2048, 512]).
size mismatch for nn.transition_c_2.linear_out.weight: copying a param with shape torch.Size([128, 256]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
size mismatch for nn.pair_repr_builder.init_repr_factory.ln_out.weight: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for nn.pair_repr_builder.init_repr_factory.ln_out.bias: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for nn.pair_repr_builder.init_repr_factory.linear_out.weight: copying a param with shape torch.Size([196, 319]) from checkpoint, the shape in current model is torch.Size([256, 319]).
size mismatch for nn.pair_repr_builder.cond_factory.ln_out.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.pair_repr_builder.cond_factory.ln_out.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.pair_repr_builder.cond_factory.linear_out.weight: copying a param with shape torch.Size([128, 196]) from checkpoint, the shape in current model is torch.Size([512, 256]).
size mismatch for nn.pair_repr_builder.adaln.norm_cond.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.pair_repr_builder.adaln.norm_cond.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.pair_repr_builder.adaln.to_gamma.0.weight: copying a param with shape torch.Size([196, 128]) from checkpoint, the shape in current model is torch.Size([256, 512]).
size mismatch for nn.pair_repr_builder.adaln.to_gamma.0.bias: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for nn.pair_repr_builder.adaln.to_beta.weight: copying a param with shape torch.Size([196, 128]) from checkpoint, the shape in current model is torch.Size([256, 512]).
size mismatch for nn.transformer_layers.0.mhba.adaln.norm_cond.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.0.mhba.adaln.norm_cond.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.0.mhba.adaln.to_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for nn.transformer_layers.0.mhba.adaln.to_beta.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for nn.transformer_layers.0.mhba.mha.to_qkv.weight: copying a param with shape torch.Size([1512, 512]) from checkpoint, the shape in current model is torch.Size([1536, 512]).
size mismatch for nn.transformer_layers.0.mhba.mha.to_qkv.bias: copying a param with shape torch.Size([1512]) from checkpoint, the shape in current model is torch.Size([1536]).
size mismatch for nn.transformer_layers.0.mhba.mha.to_g.weight: copying a param with shape torch.Size([504, 512]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for nn.transformer_layers.0.mhba.mha.to_g.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.0.mhba.mha.to_out_node.weight: copying a param with shape torch.Size([512, 504]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for nn.transformer_layers.0.mhba.mha.q_layer_norm.weight: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.0.mhba.mha.q_layer_norm.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.0.mhba.mha.k_layer_norm.weight: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.0.mhba.mha.k_layer_norm.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.0.mhba.mha.to_bias.weight: copying a param with shape torch.Size([12, 196]) from checkpoint, the shape in current model is torch.Size([8, 256]).
size mismatch for nn.transformer_layers.0.mhba.mha.pair_norm.weight: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for nn.transformer_layers.0.mhba.mha.pair_norm.bias: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for nn.transformer_layers.0.mhba.scale_output.to_adaln_zero_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for nn.transformer_layers.0.transition.adaln.norm_cond.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.0.transition.adaln.norm_cond.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.0.transition.adaln.to_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for nn.transformer_layers.0.transition.adaln.to_beta.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for nn.transformer_layers.0.transition.scale_output.to_adaln_zero_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
[... the same per-layer size mismatches are reported for nn.transformer_layers.1 through nn.transformer_layers.7 ...]
```
size mismatch for nn.transformer_layers.7.mhba.mha.to_g.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.7.mhba.mha.to_out_node.weight: copying a param with shape torch.Size([512, 504]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for nn.transformer_layers.7.mhba.mha.q_layer_norm.weight: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.7.mhba.mha.q_layer_norm.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.7.mhba.mha.k_layer_norm.weight: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.7.mhba.mha.k_layer_norm.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.7.mhba.mha.to_bias.weight: copying a param with shape torch.Size([12, 196]) from checkpoint, the shape in current model is torch.Size([8, 256]).
size mismatch for nn.transformer_layers.7.mhba.mha.pair_norm.weight: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for nn.transformer_layers.7.mhba.mha.pair_norm.bias: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for nn.transformer_layers.7.mhba.scale_output.to_adaln_zero_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for nn.transformer_layers.7.transition.adaln.norm_cond.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.7.transition.adaln.norm_cond.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.7.transition.adaln.to_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for nn.transformer_layers.7.transition.adaln.to_beta.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for nn.transformer_layers.7.transition.scale_output.to_adaln_zero_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for nn.transformer_layers.8.mhba.adaln.norm_cond.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.8.mhba.adaln.norm_cond.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.8.mhba.adaln.to_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for nn.transformer_layers.8.mhba.adaln.to_beta.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for nn.transformer_layers.8.mhba.mha.to_qkv.weight: copying a param with shape torch.Size([1512, 512]) from checkpoint, the shape in current model is torch.Size([1536, 512]).
size mismatch for nn.transformer_layers.8.mhba.mha.to_qkv.bias: copying a param with shape torch.Size([1512]) from checkpoint, the shape in current model is torch.Size([1536]).
size mismatch for nn.transformer_layers.8.mhba.mha.to_g.weight: copying a param with shape torch.Size([504, 512]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for nn.transformer_layers.8.mhba.mha.to_g.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.8.mhba.mha.to_out_node.weight: copying a param with shape torch.Size([512, 504]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for nn.transformer_layers.8.mhba.mha.q_layer_norm.weight: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.8.mhba.mha.q_layer_norm.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.8.mhba.mha.k_layer_norm.weight: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.8.mhba.mha.k_layer_norm.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.8.mhba.mha.to_bias.weight: copying a param with shape torch.Size([12, 196]) from checkpoint, the shape in current model is torch.Size([8, 256]).
size mismatch for nn.transformer_layers.8.mhba.mha.pair_norm.weight: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for nn.transformer_layers.8.mhba.mha.pair_norm.bias: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for nn.transformer_layers.8.mhba.scale_output.to_adaln_zero_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for nn.transformer_layers.8.transition.adaln.norm_cond.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.8.transition.adaln.norm_cond.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.8.transition.adaln.to_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for nn.transformer_layers.8.transition.adaln.to_beta.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for nn.transformer_layers.8.transition.scale_output.to_adaln_zero_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for nn.transformer_layers.9.mhba.adaln.norm_cond.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.9.mhba.adaln.norm_cond.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.9.mhba.adaln.to_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for nn.transformer_layers.9.mhba.adaln.to_beta.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for nn.transformer_layers.9.mhba.mha.to_qkv.weight: copying a param with shape torch.Size([1512, 512]) from checkpoint, the shape in current model is torch.Size([1536, 512]).
size mismatch for nn.transformer_layers.9.mhba.mha.to_qkv.bias: copying a param with shape torch.Size([1512]) from checkpoint, the shape in current model is torch.Size([1536]).
size mismatch for nn.transformer_layers.9.mhba.mha.to_g.weight: copying a param with shape torch.Size([504, 512]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for nn.transformer_layers.9.mhba.mha.to_g.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.9.mhba.mha.to_out_node.weight: copying a param with shape torch.Size([512, 504]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for nn.transformer_layers.9.mhba.mha.q_layer_norm.weight: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.9.mhba.mha.q_layer_norm.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.9.mhba.mha.k_layer_norm.weight: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.9.mhba.mha.k_layer_norm.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.9.mhba.mha.to_bias.weight: copying a param with shape torch.Size([12, 196]) from checkpoint, the shape in current model is torch.Size([8, 256]).
size mismatch for nn.transformer_layers.9.mhba.mha.pair_norm.weight: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for nn.transformer_layers.9.mhba.mha.pair_norm.bias: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for nn.transformer_layers.9.mhba.scale_output.to_adaln_zero_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for nn.transformer_layers.9.transition.adaln.norm_cond.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.9.transition.adaln.norm_cond.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for nn.transformer_layers.9.transition.adaln.to_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for nn.transformer_layers.9.transition.adaln.to_beta.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for nn.transformer_layers.9.transition.scale_output.to_adaln_zero_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
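For anyone hitting the same wall: the pattern in the log (128 vs 512 conditioning dims, 196 vs 256 pair dims, 504 vs 512 attention dims) suggests the config instantiates a larger model than the one the checkpoint was trained with. A quick way to see every misaligned key at once, before attempting a load, is to diff the parameter shapes of the two state dicts. Below is a minimal sketch, not Proteina-specific; `find_shape_mismatches` is a hypothetical helper, and the shape dicts stand in for what you would get from `{k: tuple(v.shape) for k, v in state_dict.items()}` on the checkpoint and on `model.state_dict()`.

```python
# Minimal sketch: compare parameter shapes from a checkpoint against the
# shapes the current model expects, listing every mismatched key at once.
# Shapes are plain tuples here; with PyTorch, build each dict via
#   {k: tuple(v.shape) for k, v in state_dict.items()}
# on the checkpoint and on model.state_dict() respectively.

def find_shape_mismatches(ckpt_shapes, model_shapes):
    """Return {key: (checkpoint_shape, model_shape)} for keys whose
    shapes differ between the checkpoint and the current model."""
    mismatches = {}
    for key, ckpt_shape in ckpt_shapes.items():
        model_shape = model_shapes.get(key)
        if model_shape is not None and model_shape != ckpt_shape:
            mismatches[key] = (ckpt_shape, model_shape)
    return mismatches

# Example values taken from one line of the error above: the checkpoint
# carries a 128-dim conditioning channel, the config builds a 512-dim one.
ckpt = {"nn.transformer_layers.9.mhba.adaln.norm_cond.weight": (128,)}
model = {"nn.transformer_layers.9.mhba.adaln.norm_cond.weight": (512,)}
print(find_shape_mismatches(ckpt, model))
```

Running this over the full state dicts gives the complete mismatch list in one pass, which makes it easy to spot whether every diverging key traces back to a handful of config hyperparameters (here, apparently the conditioning and pair-representation widths) rather than a corrupted checkpoint.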