Large language models learn a great deal of useful information about the world; we are experimenting with human preference data to steer these models.
The task of summarisation is a good testing ground for comparing methods, as OpenAI have set baselines and released the accompanying datasets.
Download the comparisons from OpenAI (note: this command is not currently working):

```shell
azcopy copy "https://openaipublic.blob.core.windows.net/summarize-from-feedback/dataset/*" . --recursive
```
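Since the azcopy route is not currently working, one possible fallback is to enumerate the blobs over plain HTTPS. This is a minimal sketch, assuming the container allows anonymous listing via the standard Azure Blob "List Blobs" REST API (`restype=container&comp=list`); the account and container names are taken from the URLs in this document, and the helper functions are illustrative, not part of any released tooling.

```python
# Sketch: list the dataset blobs over HTTPS when azcopy is unavailable.
# Assumes the container permits anonymous listing via the standard
# Azure Blob "List Blobs" API; helper names here are hypothetical.
import xml.etree.ElementTree as ET

ACCOUNT = "https://openaipublic.blob.core.windows.net"
CONTAINER = "summarize-from-feedback"

def list_url(prefix: str) -> str:
    """Build an Azure Blob 'List Blobs' URL for blobs under `prefix`."""
    return f"{ACCOUNT}/{CONTAINER}?restype=container&comp=list&prefix={prefix}"

def blob_names(listing_xml: str) -> list:
    """Extract blob names from a 'List Blobs' XML response body."""
    root = ET.fromstring(listing_xml)
    return [name.text for name in root.iter("Name")]

# Usage (requires network access):
#   import urllib.request
#   with urllib.request.urlopen(list_url("dataset/")) as resp:
#       names = blob_names(resp.read().decode("utf-8"))
#   # each name can then be fetched as f"{ACCOUNT}/{CONTAINER}/{name}"
```

Each returned blob name can be appended to the account/container URL and fetched individually, which mirrors what the recursive azcopy command would download.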
https://openaipublic.blob.core.windows.net/summarize-from-feedback/datasets/tldr_3_filtered and https://openaipublic.blob.core.windows.net/summarize-from-feedback/datasets/tldr_3_filtered_queries also host OpenAI's filtered version of the TL;DR dataset by Syed, Shahbaz; Voelske, Michael; Potthast, Martin; & Stein, Benno (2018). It is licensed under CC BY 4.0.