Doubts about the reported number of train/val/test samples #5

According to Figure 1 of the publication [1], the model was trained on a train/val/test split of `4396/472/1257`.

However, the MedFuse preprocessing pipeline [2], which this paper reuses without major modification, reports `4885/540/1373`. Crucially, that larger count comes from a small mistake by the MedFuse authors that is still present in MeTra.

Here is the error in MeTra: https://github.com/FirasGit/MeTra/blob/3947e611e86fa7147d0d34d080a3dfc3c6c5bb22/classification/datasets/mimic_lab.py#L163
```python
    if cfg.dataset.task == 'in-hospital mortality':  # should be 'in-hospital-mortality'
        end_time = cxr_merged_icustays.intime + pd.DateOffset(hours=48)
```

Because of this mismatch in the task string, CXR samples taken more than 48h after admission are included. After fixing this error, the MedFuse authors report a train/val/test split of `4485/488/1242`, which is somewhat more in line with the split reported in MeTra. I could not find any preprocessing steps that would otherwise explain this gap.
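To illustrate the effect, here is a minimal sketch on toy data. It assumes that the observation window defaults to the ICU discharge time (`outtime`) when the `if` branch is not taken, and that the CXR timestamp column is called `StudyDateTime`; both are my assumptions for illustration, and `select_cxr` is a hypothetical helper, not a function from either repository.

```python
import pandas as pd


def select_cxr(df: pd.DataFrame, task: str) -> pd.DataFrame:
    """Keep CXRs taken between ICU admission and the end of the window."""
    # Assumption: without a task match, the window runs until ICU discharge.
    end_time = df.outtime
    if task == 'in-hospital-mortality':
        # Intended behaviour: restrict the window to the first 48 hours.
        end_time = df.intime + pd.DateOffset(hours=48)
    in_window = (df.StudyDateTime >= df.intime) & (df.StudyDateTime <= end_time)
    return df.loc[in_window]


# Two toy stays: one CXR at 12h and one at 72h after admission.
toy = pd.DataFrame({
    'intime': pd.to_datetime(['2020-01-01 00:00', '2020-01-01 00:00']),
    'outtime': pd.to_datetime(['2020-01-05 00:00', '2020-01-05 00:00']),
    'StudyDateTime': pd.to_datetime(['2020-01-01 12:00', '2020-01-04 00:00']),
})

n_buggy = len(select_cxr(toy, 'in-hospital mortality'))  # string with a space
n_fixed = len(select_cxr(toy, 'in-hospital-mortality'))  # hyphenated string
print(n_buggy, n_fixed)
```

With the misspelled task string the 72h sample is kept; with the hyphenated string it is dropped, which would shrink the split in the way the MedFuse numbers suggest.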

Hence my question: which split corresponds to the performance reported [3] in the publication? I ask because the code does not reproduce the reported train/val/test split.
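For reference, the gap between the three splits quoted in this issue can be tabulated with a few lines of Python. This is plain arithmetic on the numbers above; the dictionary labels are my own.

```python
# Train/val/test counts as quoted in this issue.
splits = {
    'MeTra paper, Figure 1': (4396, 472, 1257),
    'MedFuse pipeline, before fix': (4885, 540, 1373),
    'MedFuse pipeline, after fix': (4485, 488, 1242),
}

reference = splits['MeTra paper, Figure 1']
totals = {name: sum(counts) for name, counts in splits.items()}
# Per-subset difference relative to the split reported in the paper.
diffs = {
    name: tuple(c - r for c, r in zip(counts, reference))
    for name, counts in splits.items()
}

for name in splits:
    print(f'{name}: total={totals[name]}, diff vs. paper={diffs[name]}')
```

Even after the fix, the MedFuse split differs from the paper by 89/16/-15 samples, which is the remaining gap I cannot explain.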

[1] https://www.nature.com/articles/s41598-023-37835-1/figures/1
[2] https://github.com/nyuad-cai/MedFuse/tree/6f827589afd89562813cc5aa915762d054c29efc
[3] https://www.nature.com/articles/s41598-023-37835-1/figures/2
