Pipeline Parallelism is All You Need for Optimized Early-Exit Based Self-Speculative Decoding

This repository is the official implementation of PPSD.

Requirements

To install requirements:

pip install -r requirements.txt

CutModel

  • You can use the following command to divide the original LLM into several parts suited to early-exit training and pipeline-parallel execution. Update --model_path with the actual path to the model weights and --num_ee_block with your granularity.
python cut_model.py --model_path "/your/model/path" --num_ee_block 4
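To illustrate what the cut step produces, here is a hypothetical sketch of partitioning a decoder's layer stack into `--num_ee_block` contiguous stages. The function name and the equal-size split are assumptions for illustration, not the repository's actual logic:

```python
# Hypothetical sketch: split layer indices [0, num_layers) into
# num_ee_block contiguous chunks of near-equal size, one chunk per
# pipeline stage / early-exit point.
def partition_layers(num_layers, num_ee_block):
    base, rem = divmod(num_layers, num_ee_block)
    chunks, start = [], 0
    for i in range(num_ee_block):
        size = base + (1 if i < rem else 0)  # spread any remainder
        chunks.append(list(range(start, start + size)))
        start += size
    return chunks

# e.g. a 32-layer model cut into 4 stages of 8 layers each:
print(partition_layers(32, 4))
```

Each chunk then runs as one pipeline stage, with an early-exit head attached at the stage boundary.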

Training

  • You can use the following command to train Vicuna-7B. Update --model_name_or_path with the actual path to the model weights, --data_path with the actual path to the data, --headclass to choose which class of head to train, and --num_ee_block with your granularity.
torchrun --nproc_per_node=4 --master_port=20001 train_mem.py \
    --model_name_or_path /your/model/path  \
    --data_path /your/data/path \
    --split_model_path /your/split/model/path \
    --bf16 True \
    --output_dir /output/path \
    --num_train_epochs 2 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 1 \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 1 \
    --learning_rate 5e-4 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True \
    --num_ee_block 4 \
    --headclass "trm"
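Conceptually, early-exit training attaches a prediction head at each stage boundary and optimizes all of them jointly. A minimal sketch of such a multi-exit objective follows; the function names and the equal weighting across exits are assumptions, not necessarily the loss used in this repository:

```python
import math

# Hypothetical sketch: average the next-token cross-entropy over all
# early-exit heads, so every exit learns to predict the target token.
def softmax_xent(logits, label):
    # Numerically stable cross-entropy for one logit vector.
    m = max(logits)
    logz = m + math.log(sum(math.exp(x - m) for x in logits))
    return logz - logits[label]

def multi_exit_loss(per_exit_logits, label):
    """per_exit_logits: one logit vector per early-exit head."""
    losses = [softmax_xent(logits, label) for logits in per_exit_logits]
    return sum(losses) / len(losses)  # equal weights (assumption)

# Uniform logits over 3 classes at every exit give loss ln(3):
print(round(multi_exit_loss([[0.0, 0.0, 0.0]] * 4, 1), 4))  # → 1.0986
```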

Inference

  • You can use the following command to run inference. Update --model_path with the actual path to the model weights, --split_model_path with the actual path to the split model weights, --data_path with the actual path to the data, --ckpt_path with the actual path to the early-exit head weights, --nproc_per_node with your granularity, --stage to choose the early-exit point, --maxlen with the maximum output length, and --headclass to choose your head class.
torchrun --nproc_per_node=4 \
         --master_port=29989 \
     /PPSD/ee_vicuna_test_eval_any2.py \
     --model_path "/gemini/space/models/vicuna-v1.5-7b" \
     --split_model_path "/gemini/space/models/split-vicuna-v1.5-7b/" \
     --data_path "/gemini/space/datasets/data.json" \
     --ckpt_path "/gemini/space/ckpt/distill/ALL_vicuna_7b_ee_layers_lr5e-4_epoch2_logits+top1_trmhead/" \
     --headclass "trm" \
     --stage 0 \
     --maxlen 512
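At a high level, self-speculative decoding drafts tokens cheaply with an early-exit head and verifies them with the full model. The toy sketch below shows the sequential draft-then-verify loop only; the function names are hypothetical, and PPSD's actual contribution is overlapping these phases with pipeline parallelism, which this sketch does not model:

```python
# Hypothetical sketch of early-exit self-speculative decoding.
# draft_next / verify_next are stand-ins for the early-exit head and
# the full model: each maps a token sequence to the next token.
def speculative_decode(draft_next, verify_next, prompt, max_new, draft_len=4):
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # Draft draft_len tokens with the cheap early-exit head.
        drafted = []
        for _ in range(draft_len):
            drafted.append(draft_next(tokens + drafted))
        # Verify: the full model recomputes each position; keep the
        # longest matching prefix, then take the full model's token
        # at the first mismatch.
        accepted = []
        for i, d in enumerate(drafted):
            v = verify_next(tokens + drafted[:i])
            if v == d:
                accepted.append(d)
            else:
                accepted.append(v)
                break
        tokens.extend(accepted)
    return tokens[:len(prompt) + max_new]
```

When the early-exit head agrees with the full model often, most drafted tokens are accepted and each verify pass yields several tokens instead of one.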


Results

Our model achieves the following performance on XSum, GSM8K, and HumanEval.
