Member

@marcbal77 marcbal77 commented Dec 7, 2025

Summary

  • Adds a post-load corrections mechanism to fix dataset-specific metadata issues
  • Fixes GSE110554 metadata for samples GSM2998097 and GSM2998106, which had cell_type on the wrong row

Closes #87

Changes

  • New biolearn/corrections.py module with an extensible correction registry
  • Modified DataSource to apply corrections after parsing and before caching
  • Added corrections: fix_gse110554 to the GSE110554 entry in library.yaml
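
For readers unfamiliar with the pattern, here is a minimal sketch of what a name-keyed correction registry could look like. It is not the code in biolearn/corrections.py: the names register_correction, apply_correction, and fix_toy_dataset are hypothetical, metadata is modeled as plain nested dicts rather than whatever DataSource actually produces, and the sample data is a toy. The idea is that the string under corrections: in a YAML entry is looked up in the registry and the matching function runs on the parsed metadata before the result is cached.

```python
from typing import Callable, Dict, Optional

# Assumption: metadata is modeled as {sample_id: {field: value}} for illustration.
Metadata = Dict[str, Dict[str, str]]
Correction = Callable[[Metadata], Metadata]

# Hypothetical registry mapping a correction name (as it might appear in YAML)
# to the function that implements it.
_CORRECTIONS: Dict[str, Correction] = {}


def register_correction(name: str) -> Callable[[Correction], Correction]:
    """Decorator that registers a correction function under a unique name."""
    def decorator(func: Correction) -> Correction:
        if name in _CORRECTIONS:
            raise ValueError(f"Correction {name!r} is already registered")
        _CORRECTIONS[name] = func
        return func
    return decorator


def apply_correction(name: Optional[str], metadata: Metadata) -> Metadata:
    """Apply the named correction; datasets without one pass through unchanged."""
    if name is None:
        return metadata
    if name not in _CORRECTIONS:
        raise KeyError(f"Unknown correction: {name!r}")
    return _CORRECTIONS[name](metadata)


@register_correction("fix_toy_dataset")
def fix_toy_dataset(metadata: Metadata) -> Metadata:
    """Toy fix: move cell_type from a stray '<id>_extra' row onto its sample row."""
    # Copy so the caller's parsed metadata is not mutated.
    fixed = {sid: dict(fields) for sid, fields in metadata.items()}
    for sid in list(fixed):
        if not sid.endswith("_extra"):
            continue
        target = sid[: -len("_extra")]
        if target in fixed and "cell_type" in fixed[sid]:
            fixed[target]["cell_type"] = fixed.pop(sid)["cell_type"]
    return fixed


# Usage: parse first, then correct, then hand the result to the cache.
parsed = {"S1": {"age": "30"}, "S1_extra": {"cell_type": "toy"}}
corrected = apply_correction("fix_toy_dataset", parsed)
```

Keeping the lookup by name means a new dataset-specific fix only needs a registered function and a one-line config entry, with no change to the loading path itself.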

@marcbal77 marcbal77 self-assigned this Dec 7, 2025
@marcbal77 marcbal77 added the bug Something isn't working label Dec 7, 2025
@marcbal77
Member Author

Note: on review, I'm not sure the corrections: fix_gse110554 entry in the YAML is necessary. It is working locally, but I think it's best practice to keep it.

Member

@sarudak sarudak left a comment

LGTM

@marcbal77 marcbal77 merged commit 7c1f0b0 into bio-learn:master Dec 9, 2025
1 check passed
@marcbal77 marcbal77 deleted the fix/87-gse110554-metadata branch December 9, 2025 18:50

Development

Successfully merging this pull request may close these issues.

GSE110554 metadata loads incorrectly