This repository contains the data for the paper "When Do Pre-Training Biases Propagate to Downstream Tasks? A Case Study in Text Summarization".
All the data is uploaded to a Google Drive folder.
- `sample_data.pk` contains the perturbed first paragraphs of the Wikipedia biographies that were used for the experiments in the paper.
- `all_summaries.pk` contains the summaries generated for the Wikipedia biographies by all the models that we experimented with.
- `data_for_plot.pk` contains the data needed to generate the plots in our paper.
- To load the data, please use the `load_data.ipynb` notebook. It walks through how to load the data, how to compute hallucination rates, and how to create the heatmap in the paper.
- If you'd like to generate the plots, please follow the `plot.ipynb` notebook.
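
For a quick look at the files outside the notebooks, the snippet below is a minimal sketch. It assumes the `.pk` files are standard Python pickle files sitting in the current working directory; that is an assumption based on the file extension, and `load_data.ipynb` remains the reference for how the data is actually loaded and structured. The helper name `load_pickle` is hypothetical and not part of this repository.

```python
import pickle
from pathlib import Path


def load_pickle(path):
    """Load a single pickled object from `path` (hypothetical helper)."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # File names are taken from the list above; their location is an assumption.
    for name in ("sample_data.pk", "all_summaries.pk", "data_for_plot.pk"):
        path = Path(name)
        if not path.exists():
            print(f"{name}: not found, download it from the Drive folder first")
            continue
        obj = load_pickle(path)
        # Only report the top-level type, since the internal layout is not
        # described here.
        print(f"{name}: {type(obj).__name__}")
```

Since unpickling can execute arbitrary code, only load files that you downloaded from the official folder.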