Issue: stamp train Fails Due to Missing .h5 Extension in Feature File Paths
Problem Summary
When running the stamp train command, users may encounter the following warning and error:
- Warning: "some feature files could not be found" for paths like /mnt/NAS_21T/.../chief_ctranspath-5e630f4e/W20.
- Error (raised by train_test_split):
ValueError: With n_samples=0, test_size=0.25 and train_size=None, the resulting train set will be empty
These issues occur because the code constructs feature file paths without the .h5 extension (e.g., /mnt/NAS_21T/.../W20), while the actual files include it (e.g., /mnt/NAS_21T/.../W20.h5). Because of this mismatch, none of the feature files are found, the dataset ends up with zero samples, and train_test_split fails with the ValueError shown above.
Root Cause
The problem originates in the slide_to_patient_from_slide_table_ function:
- It builds feature file paths by combining feature_dir with slide filenames from the slide table (e.g., W20).
- It does not append the .h5 extension, leading to invalid paths and the subsequent errors.
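The mismatch can be reproduced in isolation. The following is a minimal sketch using only the standard library; the temporary directory and the file name W20.h5 are stand-ins for a real feature_dir, not paths taken from STAMP itself.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    feature_dir = Path(tmp)
    # Stand-in for a real feature file on disk.
    (feature_dir / "W20.h5").touch()

    # Value as it appears in the slide table (no extension).
    slide_name = "W20"

    # Path as built by the current code: no extension appended.
    path_without_ext = feature_dir / slide_name
    # Path as it exists on disk.
    path_with_ext = feature_dir / (slide_name + ".h5")

    found_without_ext = path_without_ext.exists()
    found_with_ext = path_with_ext.exists()

print(found_without_ext)  # False
print(found_with_ext)     # True
```

Since the lookup without the extension fails for every slide, no samples are left by the time the train-test split runs.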
Solution
Modify the slide_to_patient_from_slide_table_ function to append the .h5 extension to slide filenames when constructing paths.
Steps to Fix
- Locate the slide_to_patient_from_slide_table_ function (likely in stamp/modeling/train.py or similar).
- Update the feature path construction:
  - Original:
    FeaturePath(feature_dir / cast(str, k))
  - Fixed:
    FeaturePath(feature_dir / (cast(str, k) + '.h5'))
- Save and rerun stamp train.
This ensures the correct paths (e.g., /mnt/NAS_21T/.../W20.h5) are used, allowing the feature files to be found and loaded.
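The fix above appends .h5 unconditionally. A slightly more defensive variant could append it only when it is missing. The sketch below illustrates that idea; feature_path_for is a hypothetical helper name, not a function in STAMP, and it returns a plain pathlib.Path rather than STAMP's FeaturePath type.

```python
from pathlib import Path


def feature_path_for(feature_dir: Path, slide_name: str, ext: str = ".h5") -> Path:
    """Hypothetical helper: build a feature path, adding `ext` only if absent.

    String concatenation is used instead of Path.with_suffix so that slide
    names that themselves contain dots are not truncated.
    """
    filename = slide_name if slide_name.endswith(ext) else slide_name + ext
    return feature_dir / filename


feature_dir = Path("features")  # placeholder directory for illustration
print(feature_path_for(feature_dir, "W20").name)     # W20.h5
print(feature_path_for(feature_dir, "W20.h5").name)  # W20.h5
```

With this variant, slide tables that list W20 and slide tables that list W20.h5 resolve to the same file.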
Additional Notes
- This fix assumes the slide table lists filenames without extensions (e.g., W20). If your table includes .h5, adjust the logic to avoid duplication.
- Alternatively, update the slide table’s filename_label column to include .h5 if you cannot modify the code.
- Confirm all .h5 files exist in feature_dir to prevent further "file not found" warnings.
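To confirm that every slide has a matching feature file before training, a quick check along these lines may help. This is a sketch under assumptions: find_missing_features is a hypothetical helper, the slide names are passed in as a plain list rather than read from a real slide table, and the temporary directory stands in for feature_dir.

```python
import tempfile
from pathlib import Path


def find_missing_features(
    feature_dir: Path, slide_names: list[str], ext: str = ".h5"
) -> list[str]:
    """Hypothetical helper: return slide names with no feature file in feature_dir."""
    existing = {p.name for p in feature_dir.glob(f"*{ext}")}
    missing = []
    for name in slide_names:
        # Accept names listed with or without the extension.
        filename = name if name.endswith(ext) else name + ext
        if filename not in existing:
            missing.append(name)
    return missing


with tempfile.TemporaryDirectory() as tmp:
    feature_dir = Path(tmp)
    (feature_dir / "W20.h5").touch()  # only one feature file is present
    missing = find_missing_features(feature_dir, ["W20", "W21"])

print(missing)  # ['W21']
```

An empty result means all listed slides can be resolved; any names returned point to files that are absent from feature_dir or named differently.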
This should resolve the issue and help others avoid the same problem!