Data Project: Scope experiment design

We want to test whether a reward model trained on human preferences can be used to filter web data for training language models (LMs). There are several model architectures and sizes we might want to try:

* GPT-2 117M
* GPT-2 1.5B
* GPT-Neo 1.3B
* GPT-Neo 2.7B
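
As a rough illustration of the filtering step, here is a minimal sketch that keeps only documents whose reward score clears a threshold. The names `filter_by_reward` and `toy_score` and the threshold value are assumptions made for illustration. `toy_score` is a hypothetical stand-in for the reward model, not a real one; in the actual experiment it would be replaced by a model trained on human preferences.

```python
from typing import Callable, Iterable, List


def filter_by_reward(
    docs: Iterable[str],
    score_fn: Callable[[str], float],
    threshold: float,
) -> List[str]:
    """Keep the documents whose reward score is at least `threshold`."""
    return [doc for doc in docs if score_fn(doc) >= threshold]


def toy_score(doc: str) -> float:
    # Hypothetical placeholder for a reward model: the fraction of
    # characters that are letters or whitespace. A real reward model
    # would return a learned preference score instead.
    if not doc:
        return 0.0
    return sum(ch.isalpha() or ch.isspace() for ch in doc) / len(doc)


docs = [
    "A clean sentence about language models",
    "$$$ 1234 !!! ###",
]
kept = filter_by_reward(docs, toy_score, threshold=0.8)
```

Whether to filter with a fixed threshold or keep a top fraction of documents by score is an open design choice. A fixed threshold is the simplest to sketch, but the right cutoff would have to be chosen empirically.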