When I try to load the sliced model Llama-3.1-70B-Instruct, I get the following error:
lib/python3.10/site-packages/torch/nn/modules/module.py", line 2581, in load_state_dict
raise RuntimeError(
RuntimeError: Error(s) in loading state_dict for UninitializedLlamaForCausalLM:
Unexpected key(s) in state_dict: "model.layers.32.mlp_shortcut_Q", "model.layers.32.attn_shortcut_Q", "model.layers.32.self_attn.q_proj.weight", "model.layers.32.self_attn.k_proj.weight", "model.layers.32.self_attn.v_proj.weight", "model.layers.32.self_attn.o_proj.weight", "model.layers.32.mlp.gate_proj.weight", "model.layers.32.mlp.up_proj.weight", "model.layers.32.mlp.down_proj.weight", "model.layers.33.mlp_shortcut_Q", "model.layers.33.attn_shortcut_Q", "model.layers.33.self_attn.q_proj.weight", "model.layers.33.self_attn.k_proj.weight", "model.layers.33.self_attn.v_proj.weight", "model.layers.33.self_attn.o_proj.weight", "model.layers.33.mlp.gate_proj.weight", "model.layers.33.mlp.up_proj.weight", "model.layers.33.mlp.down_proj.weight", "model.layers.34.mlp_shortcut_Q", "model.layers.34.attn_shortcut_Q", "model.layers.34.self_attn.q_proj.weight", "model.layers.34.self_attn.k_proj.weight", "model.layers.34.self_attn.v_proj.weight",
…and so on, and later in the error output:
size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([128256, 6144]) from checkpoint, the shape in current model is torch.Size([32000, 6144]).
size mismatch for model.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 6144]) from checkpoint, the shape in current model is torch.Size([8192, 6144]).
size mismatch for model.layers.0.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 6144]) from checkpoint, the shape in current model is torch.Size([8192, 6144]).
size mismatch for model.layers.0.mlp.gate_proj.weight: copying a param with shape torch.Size([28672, 6144]) from checkpoint, the shape in current model is torch.Size([11008, 6144]).
size mismatch for model.layers.0.mlp.up_proj.weight: copying a param with shape torch.Size([28672, 6144]) from checkpoint, the shape in current model is torch.Size([11008, 6144]).
size mismatch for model.layers.0.mlp.down_proj.weight: copying a param with shape torch.Size([6144, 28672]) from checkpoint, the shape in current model is torch.Size([6144, 11008]).
...
size mismatch for model.layers.31.mlp.gate_proj.weight: copying a param with shape torch.Size([28672, 6144]) from checkpoint, the shape in current model is torch.Size([11008, 6144]).
size mismatch for model.layers.31.mlp.up_proj.weight: copying a param with shape torch.Size([28672, 6144]) from checkpoint, the shape in current model is torch.Size([11008, 6144]).
size mismatch for model.layers.31.mlp.down_proj.weight: copying a param with shape torch.Size([6144, 28672]) from checkpoint, the shape in current model is torch.Size([6144, 11008]).
size mismatch for lm_head.weight: copying a param with shape torch.Size([128256, 8192]) from checkpoint, the shape in current model is torch.Size([32000, 8192]).
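For context (this is generic PyTorch behavior, not specific to SliceGPT): `load_state_dict` raises exactly this kind of `RuntimeError` whenever the module in memory was constructed with different tensor shapes or parameter names than the checkpoint, so the mismatches above suggest the receiving `UninitializedLlamaForCausalLM` was built from a config that does not match the sliced checkpoint. A minimal sketch with a toy `nn.Linear` (sizes are illustrative only):

```python
import torch
import torch.nn as nn

# "Checkpoint" saved from a module with out_features=6 (stands in for the sliced model).
saved = nn.Linear(8, 6, bias=False)
state_dict = saved.state_dict()

# Receiving module built with a different size (stands in for a default/mismatched config).
receiver = nn.Linear(8, 11, bias=False)

try:
    receiver.load_state_dict(state_dict)
except RuntimeError as e:
    # The message lists each "size mismatch for <param>", exactly as in the log above.
    print("size mismatch" in str(e))  # True
```

In other words, the checkpoint itself can be perfectly fine and this error still fires if the model object it is loaded into was instantiated with the wrong dimensions.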
I sliced the model this way:
python run_slicegpt.py \
--model meta-llama/Llama-3.1-70B-Instruct \
--save-dir results \
--sparsity 0.25 \
--device cuda \
--eval-baseline \
--distribute-model \
--no-wandb
Now I am trying to load the result with load_sliced_model.
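As a sanity check, the saved checkpoint's shapes can be inspected directly with plain PyTorch before attempting any load, to confirm the sliced file really contains the 6144-wide Llama-3.1 tensors reported above. This is a generic diagnostic sketch, not part of the SliceGPT API; the real checkpoint path would be a `.pt` file somewhere under the `results` save dir, and the tiny stand-in file here is only for illustration:

```python
import torch
import torch.nn as nn

def summarize_checkpoint(path):
    """Return {param_name: shape} for every tensor in a saved state dict."""
    state_dict = torch.load(path, map_location="cpu")
    return {name: tuple(t.shape) for name, t in state_dict.items()}

# Demo on a tiny stand-in checkpoint (replace the path with the real sliced .pt file).
tiny = nn.Linear(4, 3, bias=False)
torch.save(tiny.state_dict(), "/tmp/tiny_ckpt.pt")
print(summarize_checkpoint("/tmp/tiny_ckpt.pt"))  # {'weight': (3, 4)}
```

If `model.embed_tokens.weight` in the real file comes back as `(128256, 6144)` with layers running up to 79, the checkpoint is a correctly sliced 70B model and the mismatch is on the loading side.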
How can I fix this?