
LlamaTopDownForCausalLM.from_pretrained error with pre-tuning on OpenWebText #7

@sanyalsunny111

Description

While trying to pre-tune or finetune, I get an error from the model itself, LlamaTopDownForCausalLM.from_pretrained, in both scenarios.

This is the main error:

File "/home/ss95332/src/pycharmprojects/TOAST/language_generation/models/llama_top_down.py", line 156, in init
self.mlp = LlamaMLP(
File "/home/ss95332/anaconda3/envs/toast/lib/python3.9/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 434, in wrapper
f(module, *args, **kwargs)
TypeError: init() got an unexpected keyword argument 'hidden_size'

Here is the full traceback:

File "/home/ss95332/src/pycharmprojects/TOAST/language_generation/pretune_top_down.py", line 347, in
train()
File "/home/ss95332/src/pycharmprojects/TOAST/language_generation/pretune_top_down.py", line 256, in train
model = LlamaTopDownForCausalLM.from_pretrained(
File "/home/ss95332/anaconda3/envs/toast/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2700, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "/home/ss95332/anaconda3/envs/toast/lib/python3.9/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 434, in wrapper
f(module, *args, **kwargs)
File "/home/ss95332/src/pycharmprojects/TOAST/language_generation/models/llama_top_down.py", line 613, in init
self.model = LlamaModel(config)
File "/home/ss95332/anaconda3/envs/toast/lib/python3.9/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 434, in wrapper
f(module, *args, **kwargs)
File "/home/ss95332/src/pycharmprojects/TOAST/language_generation/models/llama_top_down.py", line 361, in init
self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.num_hidden_layers)])
File "/home/ss95332/src/pycharmprojects/TOAST/language_generation/models/llama_top_down.py", line 361, in
self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.num_hidden_layers)])
File "/home/ss95332/anaconda3/envs/toast/lib/python3.9/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 434, in wrapper
f(module, *args, **kwargs)
File "/home/ss95332/src/pycharmprojects/TOAST/language_generation/models/llama_top_down.py", line 156, in init
self.mlp = LlamaMLP(
File "/home/ss95332/anaconda3/envs/toast/lib/python3.9/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 434, in wrapper
f(module, *args, **kwargs)
TypeError: init() got an unexpected keyword argument 'hidden_size'
Traceback (most recent call last):
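
My current guess (unconfirmed) is a transformers version mismatch: llama_top_down.py line 156 calls LlamaMLP with hidden_size / intermediate_size / hidden_act keywords, which matches the older keyword-based LlamaMLP signature, while newer transformers releases (roughly 4.31 onward, if I remember correctly) take only a config object, so the keyword call would fail exactly like this. A minimal sketch of the two call styles under that assumption:

# Sketch only, assuming LlamaMLP here is the class imported from transformers.
from transformers import LlamaConfig
from transformers.models.llama.modeling_llama import LlamaMLP

config = LlamaConfig(hidden_size=4096, intermediate_size=11008, hidden_act="silu")

# Newer transformers: the MLP reads its sizes from the config object.
mlp = LlamaMLP(config)

# Older keyword-based call, like llama_top_down.py line 156; on a newer
# transformers this raises
#   TypeError: __init__() got an unexpected keyword argument 'hidden_size'
mlp = LlamaMLP(
    hidden_size=config.hidden_size,
    intermediate_size=config.intermediate_size,
    hidden_act=config.hidden_act,
)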

Here are the training commands I am running (pre-tuning and finetuning):

python -m torch.distributed.run --nproc_per_node=4 --master_port=4455 pretune_top_down.py --model_name_or_path decapoda-research/llama-7b-hf --data_path ./alpaca_data.json --bf16 True --output_dir results --num_train_epochs 1 --per_device_train_batch_size 3 --per_device_eval_batch_size 3 --gradient_accumulation_steps 12 --evaluation_strategy "no" --save_strategy "steps" --save_steps 2000 --save_total_limit 1 --learning_rate 3e-5 --weight_decay 0. --warmup_ratio 0.03 --lr_scheduler_type "cosine" --logging_steps 1 --tf32 True --report_to "none" --deepspeed "./configs/default_offload_opt_param.json"

python -m torch.distributed.run --nproc_per_node=4 --master_port=5959 train.py --model_name_or_path decapoda-research/llama-7b-hf --data_path data/sharegpt_vicuna/ShareGPT_V3_unfiltered_cleaned_split.json --bf16 True --output_dir results --num_train_epochs 2 --per_device_train_batch_size 2 --per_device_eval_batch_size 2 --gradient_accumulation_steps 12 --evaluation_strategy "no" --save_strategy "steps" --save_steps 2000 --save_total_limit 1 --learning_rate 2e-5 --weight_decay 0. --warmup_ratio 0.03 --lr_scheduler_type "cosine" --logging_steps 1 --tf32 True --report_to "none" --deepspeed "./configs/default_offload_opt_param.json" --model "llama-lora"
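
As a sanity check (my own debugging snippet, not from this repo), printing the installed transformers version and the constructor signature LlamaMLP actually exposes in my environment should confirm or rule out the mismatch:

import inspect
import transformers
from transformers.models.llama.modeling_llama import LlamaMLP

# Shows which transformers release is installed and which keyword arguments
# LlamaMLP.__init__ accepts in this environment.
print(transformers.__version__)
print(inspect.signature(LlamaMLP.__init__))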
