Questions about inconsistencies between the paper and the released data

Thank you for integrating and open-sourcing the benchmark datasets.
I noticed some inconsistencies between the statistics reported in the paper and the released data in `benchmarks/CodonBERT/data`. The confusing parts are:
- For the MLOS flu vaccine data, Table 1 of the paper reports 543 mRNA samples, but I only found 167 samples in the released data.
- For the SARS-CoV-2 vaccine degradation data, Table 1 of the paper reports 2400 mRNA samples, but I only found 233 samples in the released data.

Could you kindly clarify these discrepancies?
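
For anyone who wants to check the counts, here is a minimal sketch of one way to tally samples per file. It assumes the released files are CSVs with a single header row and one sample per row, which may not match the actual layout under `benchmarks/CodonBERT/data`; the helper name `count_samples` is hypothetical.

```python
import csv
from pathlib import Path


def count_samples(data_dir):
    """Return {relative file path: number of data rows} for every CSV under data_dir.

    Assumption: each CSV has exactly one header row and one sample per row.
    """
    counts = {}
    for path in sorted(Path(data_dir).rglob("*.csv")):
        with path.open(newline="") as fh:
            n_rows = sum(1 for _ in csv.reader(fh))
        # Subtract the assumed header row; never report a negative count.
        counts[str(path.relative_to(data_dir))] = max(n_rows - 1, 0)
    return counts


if __name__ == "__main__":
    # Path taken from this issue; adjust to wherever the repository is checked out.
    for name, n in count_samples("benchmarks/CodonBERT/data").items():
        print(f"{name}: {n} samples")
```

If a dataset is split across several files (for example separate train/validation/test files), the per-file counts would need to be summed before comparing them with Table 1.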

BTW, I noticed that some of the datasets are very small. With a 0.7/0.15/0.15 split, a 167-sample dataset leaves only about 25 samples in the test set, so metrics such as correlation computed on it are not reliable. It would be better to use k-fold cross-validation.
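
To illustrate the suggestion, here is a minimal sketch of k-fold cross-validation that reports the mean and spread of the Pearson correlation across folds. It uses only the standard library; the least-squares line is a stand-in for the real model, and the synthetic data and all function names are assumptions made for illustration, not part of the benchmark code.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def kfold_indices(n, k, seed=0):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def fit_line(xs, ys):
    """Least-squares line; a placeholder for the actual model being evaluated."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def cross_validated_correlation(xs, ys, k=5, seed=0):
    """Return (mean, sample standard deviation, per-fold scores)."""
    scores = []
    for train, test in kfold_indices(len(xs), k, seed):
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [slope * xs[i] + intercept for i in test]
        scores.append(pearson(preds, [ys[i] for i in test]))
    mean = sum(scores) / k
    sd = sqrt(sum((s - mean) ** 2 for s in scores) / (k - 1))
    return mean, sd, scores


if __name__ == "__main__":
    # Synthetic data sized like the 167-sample dataset mentioned above.
    rng = random.Random(0)
    xs = [rng.random() for _ in range(167)]
    ys = [2 * x + rng.gauss(0, 0.3) for x in xs]
    mean, sd, _ = cross_validated_correlation(xs, ys, k=5)
    print(f"Pearson r over 5 folds: mean={mean:.3f}, sd={sd:.3f}")
```

Reporting the per-fold standard deviation alongside the mean makes it visible how much the metric depends on which samples land in the test set.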
