Handle corrupted HDF5 files gracefully in split_containers #114

Open
zcapjdb wants to merge 6 commits into umami-hep:main from zcapjdb:fix/handle-corrupted-h5-files

@zcapjdb
Contributor

@zcapjdb zcapjdb commented Jan 16, 2026

⚠️ Fully vibe coded PR ⚠️

When creating a virtual dataset from multiple h5 files, validate each file before creating symlinks. Corrupted files are logged with a warning and skipped rather than crashing the entire process. Only raises an error if all files are corrupted.
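The validate, skip, and fail-only-if-all-corrupted behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the names `is_valid_h5` and `filter_valid_files` are hypothetical, and "valid" is assumed to mean that `h5py` can open the file read-only and list its root group, which will not catch damage deeper inside individual datasets.

```python
from __future__ import annotations

import logging as log
from pathlib import Path

import h5py


def is_valid_h5(path: Path | str) -> bool:
    """Hypothetical check: True if the file opens as HDF5 and its root group is listable."""
    try:
        with h5py.File(path, "r") as f:
            list(f.keys())
    except OSError:
        # h5py raises OSError (or the FileNotFoundError subclass) for missing,
        # empty, truncated, or otherwise unreadable files.
        return False
    return True


def filter_valid_files(paths: list[Path | str]) -> list[Path]:
    """Keep readable files, warn about the rest, and fail only if none remain."""
    valid: list[Path] = []
    for path in map(Path, paths):
        if is_valid_h5(path):
            valid.append(path)
        else:
            log.warning("Skipping corrupted or unreadable HDF5 file: %s", path)
    if not valid:
        raise RuntimeError("All input HDF5 files are corrupted or unreadable")
    return valid
```

Only the files returned by `filter_valid_files` would then be symlinked and referenced by the virtual dataset.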

Conformity

@zcapjdb zcapjdb changed the title Handle corrupted HDF5 files gracefully in split_containers Draft: Handle corrupted HDF5 files gracefully in split_containers Jan 16, 2026
@afroch afroch added the enhancement New feature or request label Jan 16, 2026
@codecov

codecov bot commented Jan 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.39%. Comparing base (7cd7ca1) to head (7670b7b).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #114      +/-   ##
==========================================
+ Coverage   94.91%   95.39%   +0.48%     
==========================================
  Files          23       23              
  Lines        2044     2062      +18     
==========================================
+ Hits         1940     1967      +27     
+ Misses        104       95       -9     

zcapjdb and others added 4 commits January 30, 2026 13:21
When creating a virtual dataset from multiple h5 files, validate each
file before creating symlinks. Corrupted files are logged with a warning
and skipped rather than crashing the entire process. Only raises an
error if all files are corrupted.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use 'import logging as log' to match codebase convention
- Fix typo: "fiules" -> "files"

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Reorder imports to satisfy ruff linter
- Add changelog entry for corrupted HDF5 file handling

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Tests cover:
- Valid HDF5 files returning True
- Corrupt files returning False
- Non-existent files returning False
- Empty files returning False
- String path input

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
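As an illustration of the five cases listed in this commit message, a pytest-style sketch might look like the following. The helper `is_valid_h5` here is a hypothetical stand-in for the function under test (its real name and location are not shown in this conversation), and the dataset name `jets` is likewise an assumption.

```python
from pathlib import Path

import h5py


def is_valid_h5(path) -> bool:
    """Hypothetical stand-in for the validation helper under test."""
    try:
        with h5py.File(path, "r"):
            return True
    except OSError:
        return False


def _write_valid(path: Path) -> Path:
    """Write a small, well-formed HDF5 file and return its path."""
    with h5py.File(path, "w") as f:
        f.create_dataset("jets", data=[1, 2, 3])
    return path


def test_valid_file_returns_true(tmp_path):
    assert is_valid_h5(_write_valid(tmp_path / "ok.h5")) is True


def test_corrupt_file_returns_false(tmp_path):
    bad = tmp_path / "bad.h5"
    bad.write_bytes(b"not an hdf5 file")  # no HDF5 signature
    assert is_valid_h5(bad) is False


def test_missing_file_returns_false(tmp_path):
    assert is_valid_h5(tmp_path / "does_not_exist.h5") is False


def test_empty_file_returns_false(tmp_path):
    empty = tmp_path / "empty.h5"
    empty.touch()
    assert is_valid_h5(empty) is False


def test_string_path_is_accepted(tmp_path):
    assert is_valid_h5(str(_write_valid(tmp_path / "ok.h5"))) is True
```

`tmp_path` is the built-in pytest fixture that provides a fresh temporary directory per test.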
@zcapjdb zcapjdb force-pushed the fix/handle-corrupted-h5-files branch from d549137 to 1559d73 January 30, 2026 13:22
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@zcapjdb zcapjdb changed the title Draft: Handle corrupted HDF5 files gracefully in split_containers Handle corrupted HDF5 files gracefully in split_containers Jan 30, 2026
Tests cover:
- Single file path/string handling (yields directly)
- All corrupted files (raises RuntimeError)
- Mixed files (filters corrupted with warnings)
- All valid files (creates VDS without warnings)
- Symlink creation (only valid files get symlinked)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
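For context on what "creates VDS" means in these tests, here is a rough sketch of concatenating one dataset from several already-validated files into an HDF5 virtual dataset with `h5py`. It is not the implementation from this PR: the function name `build_vds` and the dataset name `jets` are hypothetical, the datasets are assumed to be one-dimensional with a common dtype, and the symlink step mentioned in the description is left out.

```python
from pathlib import Path

import h5py


def build_vds(sources: list[Path], out_path: Path, name: str = "jets") -> Path:
    """Concatenate the 1D dataset `name` from each source file into one virtual dataset."""
    # First pass: collect the length of each source dataset and the shared dtype.
    lengths = []
    dtype = None
    for src in sources:
        with h5py.File(src, "r") as f:
            lengths.append(f[name].shape[0])
            dtype = f[name].dtype

    # Map each source dataset onto a contiguous slice of the virtual layout.
    layout = h5py.VirtualLayout(shape=(sum(lengths),), dtype=dtype)
    offset = 0
    for src, n in zip(sources, lengths):
        layout[offset : offset + n] = h5py.VirtualSource(str(src), name, shape=(n,))
        offset += n

    # The output file stores only the mapping; the data stays in the source files.
    with h5py.File(out_path, "w", libver="latest") as f:
        f.create_virtual_dataset(name, layout)
    return out_path
```

Because the virtual dataset only references its sources, a corrupted source would surface later at read time, which is the motivation for filtering files before building the layout.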
@WeiShengL
Collaborator

Nice fix! I wonder if the better approach is to do this check at the prep stage so that we can have this for the resampling approach too. What do you think @afroch, @dkobylianskii?

@afroch
Contributor

afroch commented Feb 4, 2026

> Nice fix! I wonder if the better approach is to do this check at the prep stage so that we can have this for the resampling approach too. What do you think @afroch, @dkobylianskii?

Tbh, I think this might be a better addition for atlas-ftag-tools, because the full H5 reader/writer is implemented there.
