Issue: stamp train Fails Due to Missing .h5 Extension in Feature File Paths #72

@LenisLin

Problem Summary

When running the stamp train command, users may encounter the following warning and error:

  1. Warning: "some feature files could not be found" for paths like /mnt/NAS_21T/.../chief_ctranspath-5e630f4e/W20.
  2. Error: train_test_split raises "ValueError: With n_samples=0, test_size=0.25 and train_size=None, the resulting train set will be empty."

These issues occur because the code constructs feature file paths without the .h5 extension (e.g., /mnt/NAS_21T/.../W20), while the actual files include it (e.g., /mnt/NAS_21T/.../W20.h5). Because of this mismatch, none of the feature files are found, the dataset ends up empty, and train_test_split fails on zero samples.
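The mismatch is easy to reproduce in isolation. The following minimal sketch creates a temporary directory that stands in for feature_dir, places an empty W20.h5 file in it, and checks both candidate paths with pathlib. The helper name show_extension_mismatch is hypothetical and not part of STAMP.

```python
import tempfile
from pathlib import Path


def show_extension_mismatch() -> tuple[bool, bool]:
    """Create a dummy W20.h5 and report which candidate path exists."""
    with tempfile.TemporaryDirectory() as tmp:
        feature_dir = Path(tmp)
        # Simulate an extracted feature file on disk ("W20" is the slide name from the issue)
        (feature_dir / "W20.h5").touch()

        without_ext = feature_dir / "W20"         # path as built by the current code
        with_ext = feature_dir / ("W20" + ".h5")  # path that actually exists
        return without_ext.exists(), with_ext.exists()


print(show_extension_mismatch())  # (False, True)
```

Every slide fails the first check in the same way, so the list of usable samples is empty by the time it reaches train_test_split.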

Root Cause

The problem originates in the slide_to_patient_from_slide_table_ function:

  • It builds feature file paths by combining feature_dir with slide filenames from the slide table (e.g., W20).
  • It does not append the .h5 extension, leading to invalid paths and the subsequent errors.
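To make the root cause concrete, here is a simplified, hypothetical stand-in for slide_to_patient_from_slide_table_. It is not STAMP's actual code: the row format (a list of dicts), the column names FILENAME and PATIENT, and the NewType definition of FeaturePath are all assumptions made for illustration. It only shows how joining the raw slide name to feature_dir yields a path with no extension.

```python
from pathlib import Path
from typing import NewType, cast

# Assumption: FeaturePath is modeled as a NewType over Path; STAMP's real definition may differ.
FeaturePath = NewType("FeaturePath", Path)


def slide_to_patient_buggy(
    rows: list[dict[str, str]],
    feature_dir: Path,
    filename_label: str = "FILENAME",  # hypothetical column name
    patient_label: str = "PATIENT",    # hypothetical column name
) -> dict[FeaturePath, str]:
    """Hypothetical, simplified stand-in for slide_to_patient_from_slide_table_."""
    return {
        # Bug: the slide name is joined as-is, so "W20" becomes <feature_dir>/W20
        FeaturePath(feature_dir / cast(str, row[filename_label])): row[patient_label]
        for row in rows
    }


rows = [{"FILENAME": "W20", "PATIENT": "P01"}]
paths = slide_to_patient_buggy(rows, Path("/data/features"))
print([p.name for p in paths])  # ['W20']
```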

Solution

Modify the slide_to_patient_from_slide_table_ function to append the .h5 extension to slide filenames when constructing paths.

Steps to Fix

  1. Locate the slide_to_patient_from_slide_table_ function (likely in stamp/modeling/train.py or similar).
  2. Update the feature path construction:
    • Original:
      FeaturePath(feature_dir / cast(str, k))
    • Fixed:
      FeaturePath(feature_dir / (cast(str, k) + '.h5'))
  3. Save and rerun stamp train.
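The same hypothetical stand-in with the fix from step 2 applied might look like the sketch below. It parses a small in-memory CSV slide table with csv.DictReader; the FILENAME and PATIENT column names, the /data/features directory, and the FeaturePath definition are illustrative assumptions, not STAMP's real API.

```python
import csv
import io
from pathlib import Path
from typing import NewType, cast

# Assumption: FeaturePath is modeled as a NewType over Path; STAMP's real definition may differ.
FeaturePath = NewType("FeaturePath", Path)


def slide_to_patient_fixed(
    slide_table_csv: str,
    feature_dir: Path,
    filename_label: str = "FILENAME",  # hypothetical column name
    patient_label: str = "PATIENT",    # hypothetical column name
) -> dict[FeaturePath, str]:
    """Map each slide's feature file path to its patient, appending .h5 to the slide name."""
    reader = csv.DictReader(io.StringIO(slide_table_csv))
    return {
        # Fix: append the extension before joining, mirroring the corrected line in step 2
        FeaturePath(feature_dir / (cast(str, row[filename_label]) + ".h5")): row[patient_label]
        for row in reader
    }


table = "PATIENT,FILENAME\nP01,W20\nP02,W21\n"
mapping = slide_to_patient_fixed(table, Path("/data/features"))
print({p.name: pat for p, pat in mapping.items()})  # {'W20.h5': 'P01', 'W21.h5': 'P02'}
```

String concatenation is used rather than Path.with_suffix, because with_suffix would replace any existing dotted part of a slide name instead of appending to it.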

This ensures the correct paths (e.g., /mnt/NAS_21T/.../W20.h5) are used, allowing the feature files to be found and loaded.

Additional Notes

  • This fix assumes the slide table lists filenames without extensions (e.g., W20). If your table already includes .h5, adjust the logic so the extension is not appended twice (e.g., W20.h5.h5).
  • Alternatively, update the slide table’s filename_label column to include .h5 if you cannot modify the code.
  • Confirm that every slide listed in the table has a matching .h5 file in feature_dir to prevent further "file not found" warnings.
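The first and third notes can be combined into a small defensive check. This is a sketch only: the helper names feature_path_for and find_missing are hypothetical. It appends .h5 only when the slide name does not already end with it, which avoids paths like W20.h5.h5, and lists slides whose feature file is absent so the problem surfaces before training starts.

```python
import tempfile
from pathlib import Path


def feature_path_for(feature_dir: Path, slide_name: str, suffix: str = ".h5") -> Path:
    """Append the suffix only if the slide name does not already end with it."""
    name = slide_name if slide_name.endswith(suffix) else slide_name + suffix
    return feature_dir / name


def find_missing(feature_dir: Path, slide_names: list[str]) -> list[str]:
    """Return the slide names whose feature file is not present in feature_dir."""
    return [s for s in slide_names if not feature_path_for(feature_dir, s).exists()]


with tempfile.TemporaryDirectory() as tmp:
    feature_dir = Path(tmp)
    (feature_dir / "W20.h5").touch()
    print(feature_path_for(feature_dir, "W20.h5").name)         # W20.h5
    print(find_missing(feature_dir, ["W20", "W20.h5", "W21"]))  # ['W21']
```

The suffix check is case-sensitive, so a name ending in .H5 would still get .h5 appended.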

This should resolve the issue and help others avoid the same problem!

Metadata

Labels: No labels
Milestone: No milestone
Status: Done
Development: No branches or pull requests