Hi @csuhan,
I am trying to reproduce the results in Table 4 of the OneLLM paper (CVPR 2024). I was able to reproduce the numbers on MUSIC-AVQA, but I cannot match them on VALOR32K and AVSD.
Could you provide the COCO caption annotation files for VALOR32K and AVSD that are referenced in eval/caption_eval.py (https://github.com/csuhan/OneLLM/blob/main/eval/caption_eval.py)?
VALOR32K:
```python
annotation_file = 'datasets/Eval/video/valor32k/test_ann_cococap.json'
```
AVSD:
```python
annotation_file = 'datasets/Eval/video/AVSD/test_set4DSTC7-AVSD_cococap.json'
```
Additionally, if possible, could you share the script used to generate these COCO caption annotation files from the original test-set JSON files of the two datasets?
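In case it helps pinpoint where my setup diverges, here is a minimal sketch of the conversion I attempted. It assumes the test-set JSON can be reduced to a list of `{video_id, caption}` records (the field names are my guess, not the datasets' actual schema) and that caption_eval.py loads the annotation file through the standard pycocotools `COCO` API:

```python
import json


def to_coco_caption_format(records, output_path):
    """Write records into the COCO caption annotation layout:
    one entry in `images` per video, one in `annotations` per caption."""
    images, annotations, seen = [], [], set()
    for ann_id, rec in enumerate(records):
        vid = rec["video_id"]  # assumed field name in the source JSON
        if vid not in seen:
            seen.add(vid)
            images.append({"id": vid})
        annotations.append({
            "image_id": vid,
            "id": ann_id,                  # unique annotation id
            "caption": rec["caption"],     # assumed field name
        })
    coco = {
        "info": {"description": "converted annotations"},
        "licenses": [],
        "type": "captions",
        "images": images,
        "annotations": annotations,
    }
    with open(output_path, "w") as f:
        json.dump(coco, f)


# Hypothetical usage with records parsed from the original test-set JSON:
records = [
    {"video_id": "vid_0001", "caption": "a man plays the guitar on stage"},
]
to_coco_caption_format(records, "test_ann_cococap.json")
```

If your conversion maps video IDs differently (e.g., to integer image IDs), that alone could explain my score mismatch, so knowing the exact mapping would be very helpful.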
Thank you for your assistance!