PoPE (Polar Coordinate Positional Embeddings) is an alternative method for positional embeddings in transformer models. It promises better generalization across sequence lengths and, to some degree, better performance in general. We investigate the effect of switching pretrained Pythia models from RoPE to PoPE. After patching, we recalibrate for ~2% of the pretraining budget (inspired by DroPE).
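For context, RoPE (the scheme being replaced) encodes position by rotating consecutive query/key dimension pairs through position-dependent angles. A minimal sketch of this rotation, purely illustrative and not the repo's implementation:

```python
import math

def rope_angles(pos, head_dim, base=10000.0):
    # One angle per 2-D pair: theta_i = pos / base**(2i / head_dim)
    return [pos / base ** (2 * i / head_dim) for i in range(head_dim // 2)]

def apply_rope(vec, pos, base=10000.0):
    # Rotate each consecutive (even, odd) pair of `vec` by its angle.
    out = []
    for i, theta in enumerate(rope_angles(pos, len(vec), base)):
        x, y = vec[2 * i], vec[2 * i + 1]
        out += [x * math.cos(theta) - y * math.sin(theta),
                x * math.sin(theta) + y * math.cos(theta)]
    return out
```

Because the rotation is norm-preserving and relative (attention scores depend only on position differences), swapping it for a different parameterization such as PoPE leaves the rest of the attention computation unchanged.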
We show that PoPE can match RoPE performance after recalibration while providing better length generalization. After 6.5B tokens of continued training:
| Model | Method | PPL @512 | PPL @1024 | PPL @2048 | PPL @3072 | PPL @4096 |
|---|---|---|---|---|---|---|
| Pythia-70m | RoPE (original) | 28.4 | 26.1 | 24.8 | 25.0 | 104.6 |
| Pythia-70m | PoPE (recalibrated) | 27.0 | 25.1 | 24.1 | 23.8 | 24.2 |
| Pythia-160m | RoPE (original) | 17.6 | 16.0 | 15.1 | 26.8 | 1101.9 |
| Pythia-160m | PoPE (recalibrated) | 17.7 | 16.3 | 15.5 | 15.1 | 15.4 |
Key observations:
- PoPE achieves perplexity similar to RoPE's at the training sequence length (2048)
- PoPE maintains stable perplexity at longer sequence lengths (3072, 4096), while RoPE degrades significantly
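The per-length perplexities above can be computed by chunking the evaluation stream into non-overlapping windows of the target length and exponentiating the mean per-token NLL. A hypothetical sketch, where `nll_fn` stands in for a model forward pass returning the summed NLL of one window (an assumption, not the repo's API):

```python
import math

def ppl_at_length(tokens, seq_len, nll_fn):
    # Chunk into non-overlapping windows; drop the final partial window
    # so every window is scored at exactly `seq_len` tokens of context.
    windows = [tokens[i : i + seq_len]
               for i in range(0, len(tokens) - seq_len + 1, seq_len)]
    total_nll = sum(nll_fn(w) for w in windows)      # summed NLL per window
    total_tokens = sum(len(w) for w in windows)
    return math.exp(total_nll / total_tokens)
```

Evaluating the same stream at several `seq_len` values yields a row of the table above.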
Further insights/notes:
- Replacing RoPE with PoPE seems more effective than dropping RoPE (see related DroPE Pythia replication)
- With the current setup, continued pretraining without patching unexpectedly yields a slight improvement in loss/perplexity at long context lengths, potentially due to a different weight decay or quantization setup
```shell
python pope_pythia.py
```
We train Pythia models further on their pretraining data, The Pile, using AdamW. Before training, we pre-tokenize the text data and cache it. Before and after training, we evaluate perplexity at different sequence lengths. The script supports multi-GPU training and logging to wandb.
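The pre-tokenize-and-cache step described above can be sketched as: tokenize each document once, join documents with an EOS separator, and pack the stream into fixed-length training blocks. `tokenize` and `eos_id` here are stand-ins for a real tokenizer, not the repo's actual interface:

```python
def pack_documents(docs, tokenize, block_size, eos_id):
    # Concatenate all documents into one token stream, EOS-separated.
    stream = []
    for doc in docs:
        stream.extend(tokenize(doc))
        stream.append(eos_id)
    # Drop the trailing partial block so every cached example has equal length.
    n_blocks = len(stream) // block_size
    return [stream[i * block_size : (i + 1) * block_size]
            for i in range(n_blocks)]
```

Packing once up front keeps the training loop a simple iteration over equally sized blocks, which also makes sharding across GPUs straightforward.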
Open questions:
- Does this transfer to larger models and other model families?
- How is the performance on benchmarks and in real world use?
