Pipeline Parallelism is All You Need for Optimized Early-Exit Based Self-Speculative Decoding

This repository is the official implementation of PPSD.

Requirements

To install requirements:

pip install -r requirements.txt

CutModel

  • You can use the following command to divide the original LLM into several parts suited to early-exit training and pipeline-parallel execution. Update --model_path with the actual path to the model weights and --num_ee_block with your granularity.
python cut_model.py --model_path "/your/model/path" --num_ee_block 4
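To illustrate what the cut step produces, here is a hypothetical sketch of partitioning a decoder's layer stack into `--num_ee_block` contiguous stages. The function name and the equal-size split are assumptions for illustration, not the repository's actual logic:

```python
# Hypothetical sketch: split layer indices [0, num_layers) into
# num_ee_block contiguous chunks of near-equal size, one chunk per
# pipeline stage / early-exit point.
def partition_layers(num_layers, num_ee_block):
    base, rem = divmod(num_layers, num_ee_block)
    chunks, start = [], 0
    for i in range(num_ee_block):
        size = base + (1 if i < rem else 0)  # spread any remainder
        chunks.append(list(range(start, start + size)))
        start += size
    return chunks

# e.g. a 32-layer model cut into 4 stages of 8 layers each:
print(partition_layers(32, 4))
```

Each chunk then runs as one pipeline stage, with an early-exit head attached at the stage boundary.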

Training

  • You can use the following command to train Vicuna-7B. Update --model_name_or_path with the actual path to the model weights, --data_path with the actual path to the data, --headclass to choose which class of head to train, and --num_ee_block with your granularity.
torchrun --nproc_per_node=4 --master_port=20001 train_mem.py \
    --model_name_or_path /your/model/path  \
    --data_path /your/data/path \
    --split_model_path /your/split/model/path \
    --bf16 True \
    --output_dir /output/path \
    --num_train_epochs 2 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 1 \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 1 \
    --learning_rate 5e-4 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True \
    --num_ee_block 4 \
    --headclass "trm"
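Conceptually, early-exit training attaches a prediction head at each stage boundary and optimizes all of them jointly. A minimal sketch of such a multi-exit objective follows; the function names and the equal weighting across exits are assumptions, not necessarily the loss used in this repository:

```python
import math

# Hypothetical sketch: average the next-token cross-entropy over all
# early-exit heads, so every exit learns to predict the target token.
def softmax_xent(logits, label):
    # Numerically stable cross-entropy for one logit vector.
    m = max(logits)
    logz = m + math.log(sum(math.exp(x - m) for x in logits))
    return logz - logits[label]

def multi_exit_loss(per_exit_logits, label):
    """per_exit_logits: one logit vector per early-exit head."""
    losses = [softmax_xent(logits, label) for logits in per_exit_logits]
    return sum(losses) / len(losses)  # equal weights (assumption)

# Uniform logits over 3 classes at every exit give loss ln(3):
print(round(multi_exit_loss([[0.0, 0.0, 0.0]] * 4, 1), 4))  # → 1.0986
```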

Inference

  • You can use the following command to run inference. Update --model_path with the actual path to the model weights, --split_model_path with the actual path to the split model weights, --data_path with the actual path to the data, --ckpt_path with the actual path to the early-exit head weights, --nproc_per_node with your granularity, --stage to choose the early-exit point, --maxlen with the maximum output length, and --headclass to choose your head class.
torchrun --nproc_per_node=4 \
         --master_port=29989 \
     /PPSD/ee_vicuna_test_eval_any2.py \
     --model_path "/gemini/space/models/vicuna-v1.5-7b" \
     --split_model_path "/gemini/space/models/split-vicuna-v1.5-7b/" \
     --data_path "/gemini/space/datasets/data.json" \
     --ckpt_path "/gemini/space/ckpt/distill/ALL_vicuna_7b_ee_layers_lr5e-4_epoch2_logits+top1_trmhead/" \
     --headclass "trm" \
     --stage 0 \
     --maxlen 512
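At a high level, self-speculative decoding drafts tokens cheaply with an early-exit head and verifies them with the full model. The toy sketch below shows the sequential draft-then-verify loop only; the function names are hypothetical, and PPSD's actual contribution is overlapping these phases with pipeline parallelism, which this sketch does not model:

```python
# Hypothetical sketch of early-exit self-speculative decoding.
# draft_next / verify_next are stand-ins for the early-exit head and
# the full model: each maps a token sequence to the next token.
def speculative_decode(draft_next, verify_next, prompt, max_new, draft_len=4):
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # Draft draft_len tokens with the cheap early-exit head.
        drafted = []
        for _ in range(draft_len):
            drafted.append(draft_next(tokens + drafted))
        # Verify: the full model recomputes each position; keep the
        # longest matching prefix, then take the full model's token
        # at the first mismatch.
        accepted = []
        for i, d in enumerate(drafted):
            v = verify_next(tokens + drafted[:i])
            if v == d:
                accepted.append(d)
            else:
                accepted.append(v)
                break
        tokens.extend(accepted)
    return tokens[:len(prompt) + max_new]
```

When the early-exit head agrees with the full model often, most drafted tokens are accepted and each verify pass yields several tokens instead of one.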


Results

Our model achieves the following performance on XSum, GSM8K, and HumanEval.
