Hello, and thank you for sharing the code.
We encountered difficulties reproducing the performance of CoMM (fine-tuned) reported in Table 1 of your paper (humor↑ = 65.96 ± 0.44, Average*↑ = 69.88). Our current run on the humor dataset only reaches 63.304, and we select model weights by the best performance on the validation set rather than taking the final epoch (a minimal sketch of our selection logic is included at the end for reference). The following is the command line we execute; we would greatly appreciate guidance on the fine-tuning configuration to help us align with the reported results.


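For clarity on how we pick checkpoints, here is a minimal sketch of our validation-based selection, not the repo's actual code. The names `train_one_epoch`, `evaluate`, and `val_loader` are hypothetical stand-ins for our training loop, metric function, and validation data:

```python
import copy

import torch


def train_with_best_val(model, train_one_epoch, evaluate, val_loader, num_epochs):
    """Keep the checkpoint with the highest validation score,
    rather than the weights from the final epoch.

    `train_one_epoch` and `evaluate` are hypothetical callables standing in
    for our actual training and evaluation routines.
    """
    best_score = float("-inf")
    best_state = None
    for epoch in range(num_epochs):
        train_one_epoch(model, epoch)            # one pass over the training set
        val_score = evaluate(model, val_loader)  # e.g. binary accuracy on humor
        if val_score > best_score:               # strictly better checkpoint found
            best_score = val_score
            best_state = copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)            # restore the best checkpoint
    return model, best_score
```

If the reported numbers were instead taken from the final epoch (or from a different selection criterion), that alone might explain part of the gap we observe.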