The source code of the paper SALAS: Supervised Aspect Learning Improves Abstractive Multi-document Summarization Through Aspect Information Loss.
- Python (tested on 3.8.13)
- CUDA (tested on 11.4)
- PyTorch (tested on 1.8.0)
- Transformers (tested on 4.6.0)
- numpy (tested on 1.23.2)
- tqdm
The MRED dataset can be downloaded from https://github.com/Shen-Chenhui/MReD The WikiAsp dataset can be downloaded from https://github.com/neulab/wikiasp
First, place the data in raw_data. Next, processing the data by
python 1_process_data.pyThe processed files are stored in processed_data.
Here are some descriptions.
- doc: original document
- doc_with_sent_aspect: aspects with each sentence
- sent_controlled_doc:
[label1, label2, ... ], per-sentence label sequence for the meta-review wherelabel1represents the category label for 1st sentence,label2for the 2nd sentence and so on - seg_controlled_doc:
[label1, label2, ... ], label sequence for the meta-review on segment level wherelabel1represents the category label for 1st segment (the sentences of the same label),label2for the 2nd segment and so on - summary: Summary of the document
- summary_with_seg_aspect: segment level summaries
- summary_with_sent_aspect: Sentence level summaries
- sample_id:
yyyy-id, whereyyyyis the year
The training and evaluation are executed by 3_run_our_idea.py, you can replace the pretrained_model_path to use different pre-trained models.
Our implementation of the case study is in case_study.py.