Reproducibility issue

Thanks for making the code public!

I am trying to replicate your results, specifically the ProtGPS AUROC values on the random-split data reported in the first column of Supplementary Table S1.

I'm using the notebook `Analysis.ipynb` with the model checkpoint shared on Zenodo (`32bf44b16a4e770a674896b81dfb3729epoch=26.ckpt`). However, the AUROC section uses a new dataset (`new_condensate_dataset_m3_c5_mmseqs`) that is not shared in either the GitHub repository or the Zenodo record, which makes it impossible to replicate these scores. Could the authors kindly share it?

Secondly, when I instead use the `dataset.json` shared on GitHub, I get the following scores:
```
nuclear_speckle:	0.426
p-body:	0.718
pml-bdoy:	0.596
post_synaptic_density:	0.365
stress_granule:	0.441
chromosome:	0.333
nucleolus:	0.799
nuclear_pore_complex:	0.712
cajal_body:	0.878
rna_granule:	0.223
cell_junction:	0.43
transcriptional:	0.726
``` 
Six of the twelve are below 0.5, i.e. worse than random guessing.
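For context, a per-compartment AUROC is normally a one-vs-rest score computed per label column, and a value well below 0.5 means the scores rank negatives above positives. One possible cause (only a guess here, not confirmed) is that the label columns and the prediction columns are ordered differently. The sketch below is a minimal pure-Python illustration of that check; `auroc`, `per_class_auroc`, and the toy data are hypothetical and not taken from `Analysis.ipynb`.

```python
def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) formulation; ties get the average rank."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("nan")  # undefined when only one class is present
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def per_class_auroc(y_true, y_prob, class_names):
    """One-vs-rest AUROC for each label column (rows are samples)."""
    return {
        name: auroc([row[c] for row in y_true], [row[c] for row in y_prob])
        for c, name in enumerate(class_names)
    }


# Toy data (hypothetical): two compartments, four proteins.
class_names = ["a", "b"]
y_true = [[1, 0], [0, 1], [1, 0], [0, 1]]
y_prob = [[0.9, 0.2], [0.1, 0.8], [0.7, 0.4], [0.3, 0.6]]

aligned = per_class_auroc(y_true, y_prob, class_names)

# Same predictions with the two columns swapped, simulating a label-order mismatch.
y_prob_swapped = [row[::-1] for row in y_prob]
swapped = per_class_auroc(y_true, y_prob_swapped, class_names)

print("aligned:", aligned)
print("swapped:", swapped)
```

With aligned columns both toy classes score 1.0; with the columns swapped both drop to 0.0, which is the kind of below-chance pattern a label-order mismatch would produce.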

Finally, I also trained a model (purely for reproducibility) using the dispatcher script on an A100 GPU with 80 GB of memory and got a test AUROC of ~0.8. If I use this model to compute per-compartment AUROC scores with the same analysis notebook code, I get the following:
```
nuclear_speckle:	0.349
p-body:	0.618
pml-bdoy:	0.544
post_synaptic_density:	0.255
stress_granule:	0.6
chromosome:	0.518
nucleolus:	0.628
nuclear_pore_complex:	0.534
cajal_body:	0.845
rna_granule:	0.337
cell_junction:	0.459
transcriptional:	0.801
```
I did not change anything in the config file. I expect some randomness from the splits and model initialization, but the difference still seems quite large.
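To put a rough number on the gap between the two runs, the per-compartment scores from the two listings above can be compared directly. This is only a small sketch; the variable names are illustrative and the values are copied verbatim from the listings (including the `pml-bdoy` key as printed).

```python
# Scores from the released checkpoint evaluated on dataset.json (first listing).
released = {
    "nuclear_speckle": 0.426, "p-body": 0.718, "pml-bdoy": 0.596,
    "post_synaptic_density": 0.365, "stress_granule": 0.441,
    "chromosome": 0.333, "nucleolus": 0.799, "nuclear_pore_complex": 0.712,
    "cajal_body": 0.878, "rna_granule": 0.223, "cell_junction": 0.43,
    "transcriptional": 0.726,
}
# Scores from the retrained model (second listing).
retrained = {
    "nuclear_speckle": 0.349, "p-body": 0.618, "pml-bdoy": 0.544,
    "post_synaptic_density": 0.255, "stress_granule": 0.6,
    "chromosome": 0.518, "nucleolus": 0.628, "nuclear_pore_complex": 0.534,
    "cajal_body": 0.845, "rna_granule": 0.337, "cell_junction": 0.459,
    "transcriptional": 0.801,
}

# Signed change per compartment (retrained minus released).
diff = {name: retrained[name] - released[name] for name in released}
mean_abs_diff = sum(abs(d) for d in diff.values()) / len(diff)
largest = max(diff, key=lambda name: abs(diff[name]))

# Compartments scoring below chance (0.5) in each run.
below_chance_released = [n for n, s in released.items() if s < 0.5]
below_chance_retrained = [n for n, s in retrained.items() if s < 0.5]

for name, d in diff.items():
    print(f"{name}:\t{d:+.3f}")
print(f"mean |diff|: {mean_abs_diff:.3f}, largest change: {largest}")
```

On these numbers the mean absolute change is about 0.11, the largest change is for `chromosome` (0.333 vs 0.518), and six compartments are below 0.5 with the released checkpoint versus four with the retrained model.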

Could the authors please comment on this? Am I doing something wrong here? I'm happy to provide more information (logs, configs, etc.) if needed.

Thanks!
