Arxiv10 dataset split and code issues

Hi Ashkan,

1) I'm trying to reproduce the Arxiv10 test results as a learning exercise, but the dataset shared on your GitHub page does not specify the train (80,000), validation (10,000), and test (5,000) splits. The example in dataloader.py is for IMDB.csv.
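
Until the exact split is available, the closest I can get is a seeded random split, which of course will not match the one used for the reported results. A minimal sketch of what I mean, using only the standard library; `split_indices`, the seed, and the `n_rows` value are placeholders of mine, not anything from the repo:

```python
import random

def split_indices(n_rows, n_train, n_val, n_test, seed=0):
    """Shuffle row indices once and cut them into three disjoint splits."""
    if n_train + n_val + n_test > n_rows:
        raise ValueError("requested split sizes exceed the number of rows")
    indices = list(range(n_rows))
    # Fixed seed so the same split can be regenerated later.
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:n_train + n_val + n_test]
    return train, val, test

# Split sizes as quoted above; n_rows is a placeholder, not the real row count.
train_idx, val_idx, test_idx = split_indices(
    n_rows=95_000, n_train=80_000, n_val=10_000, n_test=5_000
)
```

The index lists could then be used to select rows from the CSV, but a split file or the seed you actually used would be much better for comparing numbers.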

2) Also, there are some issues with the codebase. In trainer.py, df_embeddings is not defined anywhere. In addition, I can't locate the multi-objective self-learning part that uses the similarities of embedding prototypes to train the Protoformer FW after fine-tuning. Could you please point me in the right direction?

![image](https://github.com/user-attachments/assets/bb2fdcc3-150c-443e-a3ad-9f7fac10b9f5)

