Mismatch error when trying to load trained model

Hi @zkkli, many thanks for your work; it is a really nice contribution to the state of the art.

After training a model by running `python quant_train.py --model deit_tiny --data <YOUR_DATA_DIR> --epochs 30 --lr 5e-7`, the checkpoint is saved in `results/checkpoint.pth.tar`. When I try to load these weights into the model, I get the following error:

`RuntimeError: Error(s) in loading state_dict for VisionTransformer:
        size mismatch for qact_input.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for patch_embed.qact.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for qact_pos.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.0.norm1.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.0.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.0.attn.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.0.attn.qact_attn1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.0.attn.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.0.attn.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.0.attn.matmul_1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.0.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.0.norm2.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.0.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.0.mlp.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.0.mlp.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.0.mlp.qact_gelu.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.0.qact4.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.1.norm1.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.1.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.1.attn.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.1.attn.qact_attn1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.1.attn.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.1.attn.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.1.attn.matmul_1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.1.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.1.norm2.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.1.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.1.mlp.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.1.mlp.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.1.mlp.qact_gelu.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.1.qact4.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.2.norm1.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.2.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.2.attn.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.2.attn.qact_attn1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.2.attn.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.2.attn.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.2.attn.matmul_1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.2.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.2.norm2.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.2.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.2.mlp.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.2.mlp.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.2.mlp.qact_gelu.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.2.qact4.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.3.norm1.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.3.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.3.attn.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.3.attn.qact_attn1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.3.attn.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.3.attn.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.3.attn.matmul_1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.3.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.3.norm2.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.3.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.3.mlp.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.3.mlp.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.3.mlp.qact_gelu.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.3.qact4.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.4.norm1.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.4.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.4.attn.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.4.attn.qact_attn1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.4.attn.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.4.attn.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.4.attn.matmul_1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.4.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.4.norm2.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.4.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.4.mlp.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.4.mlp.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.4.mlp.qact_gelu.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.4.qact4.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.5.norm1.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.5.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.5.attn.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.5.attn.qact_attn1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.5.attn.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.5.attn.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.5.attn.matmul_1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.5.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.5.norm2.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.5.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.5.mlp.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.5.mlp.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.5.mlp.qact_gelu.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.5.qact4.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.6.norm1.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.6.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.6.attn.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.6.attn.qact_attn1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.6.attn.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.6.attn.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.6.attn.matmul_1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.6.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.6.norm2.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.6.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.6.mlp.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.6.mlp.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.6.mlp.qact_gelu.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.6.qact4.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.7.norm1.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.7.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.7.attn.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.7.attn.qact_attn1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.7.attn.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.7.attn.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.7.attn.matmul_1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.7.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.7.norm2.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.7.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.7.mlp.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.7.mlp.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.7.mlp.qact_gelu.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.7.qact4.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.8.norm1.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.8.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.8.attn.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.8.attn.qact_attn1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.8.attn.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.8.attn.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.8.attn.matmul_1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.8.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.8.norm2.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.8.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.8.mlp.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.8.mlp.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.8.mlp.qact_gelu.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.8.qact4.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for blocks.9.norm1.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, th                                                                                e shape in current model is torch.Size([1]).
        size mismatch for blocks.9.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the sh                                                                                ape in current model is torch.Size([1]).
        size mismatch for blocks.9.attn.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, t                                                                                he shape in current model is torch.Size([1]).
        size mismatch for blocks.9.attn.qact_attn1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoi                                                                                nt, the shape in current model is torch.Size([1]).
        size mismatch for blocks.9.attn.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, t                                                                                he shape in current model is torch.Size([1]).
        size mismatch for blocks.9.attn.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, t                                                                                he shape in current model is torch.Size([1]).
        size mismatch for blocks.9.attn.matmul_1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint                                                                                , the shape in current model is torch.Size([1]).
        size mismatch for blocks.9.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the sh                                                                                ape in current model is torch.Size([1]).
        size mismatch for blocks.9.norm2.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, th                                                                                e shape in current model is torch.Size([1]).
        size mismatch for blocks.9.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the sh                                                                                ape in current model is torch.Size([1]).
        size mismatch for blocks.9.mlp.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, th                                                                                e shape in current model is torch.Size([1]).
        size mismatch for blocks.9.mlp.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, th                                                                                e shape in current model is torch.Size([1]).
        size mismatch for blocks.9.mlp.qact_gelu.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint                                                                                , the shape in current model is torch.Size([1]).
        size mismatch for blocks.9.qact4.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the sh                                                                                ape in current model is torch.Size([1]).
        size mismatch for blocks.10.norm1.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, t                                                                                he shape in current model is torch.Size([1]).
        size mismatch for blocks.10.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the s                                                                                hape in current model is torch.Size([1]).
        size mismatch for blocks.10.attn.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint,                                                                                 the shape in current model is torch.Size([1]).
        size mismatch for blocks.10.attn.qact_attn1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpo                                                                                int, the shape in current model is torch.Size([1]).
        size mismatch for blocks.10.attn.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint,                                                                                 the shape in current model is torch.Size([1]).
        size mismatch for blocks.10.attn.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint,                                                                                 the shape in current model is torch.Size([1]).
        size mismatch for blocks.10.attn.matmul_1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoin                                                                                t, the shape in current model is torch.Size([1]).
        size mismatch for blocks.10.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the s                                                                                hape in current model is torch.Size([1]).
        size mismatch for blocks.10.norm2.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, t                                                                                he shape in current model is torch.Size([1]).
        size mismatch for blocks.10.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the s                                                                                hape in current model is torch.Size([1]).
        size mismatch for blocks.10.mlp.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, t                                                                                he shape in current model is torch.Size([1]).
        size mismatch for blocks.10.mlp.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, t                                                                                he shape in current model is torch.Size([1]).
        size mismatch for blocks.10.mlp.qact_gelu.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoin                                                                                t, the shape in current model is torch.Size([1]).
        size mismatch for blocks.10.qact4.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the s                                                                                hape in current model is torch.Size([1]).
        size mismatch for blocks.11.norm1.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, t                                                                                he shape in current model is torch.Size([1]).
        size mismatch for blocks.11.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the s                                                                                hape in current model is torch.Size([1]).
        size mismatch for blocks.11.attn.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint,                                                                                 the shape in current model is torch.Size([1]).
        size mismatch for blocks.11.attn.qact_attn1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpo                                                                                int, the shape in current model is torch.Size([1]).
        size mismatch for blocks.11.attn.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint,                                                                                 the shape in current model is torch.Size([1]).
        size mismatch for blocks.11.attn.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint,                                                                                 the shape in current model is torch.Size([1]).
        size mismatch for blocks.11.attn.matmul_1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoin                                                                                t, the shape in current model is torch.Size([1]).
        size mismatch for blocks.11.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the s                                                                                hape in current model is torch.Size([1]).
        size mismatch for blocks.11.norm2.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, t                                                                                he shape in current model is torch.Size([1]).
        size mismatch for blocks.11.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the s                                                                                hape in current model is torch.Size([1]).
        size mismatch for blocks.11.mlp.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, t                                                                                he shape in current model is torch.Size([1]).
        size mismatch for blocks.11.mlp.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, t                                                                                he shape in current model is torch.Size([1]).
        size mismatch for blocks.11.mlp.qact_gelu.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoin                                                                                t, the shape in current model is torch.Size([1]).
        size mismatch for blocks.11.qact4.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the s                                                                                hape in current model is torch.Size([1]).
        size mismatch for norm.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in                                                                                 current model is torch.Size([1]).
        size mismatch for qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in cu                                                                                rrent model is torch.Size([1]).`

Any thoughts on why this is happening?
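
To narrow this down, the mismatched entries can be listed before `load_state_dict` is called. The snippet below is only a minimal sketch and not code from the repository: `find_shape_mismatches` is a hypothetical helper, and it assumes nothing more than that each value in both mappings exposes a `.shape` attribute (as `torch.Tensor` does). The `SimpleNamespace` stand-ins copy two of the shapes reported in the error; the third key is purely illustrative.

```python
from types import SimpleNamespace


def find_shape_mismatches(checkpoint_state, model_state):
    """Return {name: (checkpoint_shape, model_shape)} for keys whose shapes differ.

    Hypothetical helper: both arguments map parameter/buffer names to
    objects with a `.shape` attribute (for example torch tensors).
    """
    mismatches = {}
    for name, saved in checkpoint_state.items():
        if name not in model_state:
            continue  # missing or unexpected keys are a separate problem
        saved_shape = tuple(saved.shape)
        model_shape = tuple(model_state[name].shape)
        if saved_shape != model_shape:
            mismatches[name] = (saved_shape, model_shape)
    return mismatches


# Stand-ins mimicking shapes from the error message (no real tensors here).
checkpoint_state = {
    "qact2.act_scaling_factor": SimpleNamespace(shape=()),
    "norm.norm_scaling_factor": SimpleNamespace(shape=(192,)),
    "illustrative.matching_key": SimpleNamespace(shape=(3,)),  # made-up entry
}
model_state = {
    "qact2.act_scaling_factor": SimpleNamespace(shape=(1,)),
    "norm.norm_scaling_factor": SimpleNamespace(shape=(1,)),
    "illustrative.matching_key": SimpleNamespace(shape=(3,)),  # made-up entry
}

mismatches = find_shape_mismatches(checkpoint_state, model_state)
for name, (saved_shape, model_shape) in mismatches.items():
    print(f"{name}: checkpoint {saved_shape} vs model {model_shape}")
```

With a real checkpoint, the same function could be called on the loaded state dict and on `model.state_dict()`. In the error above the model side is always `torch.Size([1])`, while the checkpoint holds either a scalar or a 192-element tensor, so the freshly built model may simply not have given those scaling-factor entries their final shape yet; that is a guess from the message, not something I have confirmed.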

Mismatch error when trying to load trained model #7
