Conversation
Keeps track of filepath in uns from which the h5sc object was loaded. This allows you to update values (e.g. obs) on disk.
…ensions from shape file
Add function to mask tissue regions
doesn't yet cover all functions
|
scPortrait_demo_annotate_regions.ipynb current tutorial covering the newly implemented functions |
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Pull request overview
This PR adds functionality for annotating and subsetting regions in scPortrait, along with improvements to h5sc file handling and testing.
Changes:
- Added functions to mask/crop images using spatial regions and subset h5sc objects by cell IDs
- Enhanced h5sc operations to support updating obs data on disk and adding spatial coordinates
- Improved plotting support for non-memory-backed h5sc files and fixed spatial coordinate handling
- Expanded test coverage for subset operations and h5sc plotting functionality
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit_tests/tools/sdata/test_subset.py | New comprehensive tests for spatial region masking and bounding box subsetting |
| tests/unit_tests/tools/h5sc/test_operations.py | New tests for h5sc subsetting and disk operations |
| tests/unit_tests/plotting/test_h5sc.py | Refactored to use real h5sc fixtures instead of mocks |
| tests/conftest.py | Added spatial data fixtures for testing region selection |
| src/scportrait/tools/sdata/processing/_subset.py | Implemented mask_region function for spatial masking/cropping |
| src/scportrait/tools/sdata/processing/init.py | Exported new mask_region function |
| src/scportrait/tools/h5sc/operations.py | Added functions for updating obs on disk, spatial coordinates, and cell subsetting |
| src/scportrait/tools/h5sc/init.py | Exported new h5sc operation functions |
| src/scportrait/plotting/sdata.py | Fixed dimension calculation for GeoDataFrame shapes |
| src/scportrait/plotting/h5sc.py | Added support for non-memory-backed h5sc plotting |
| src/scportrait/pipeline/project.py | Improved h5sc handle management to prevent file handle leaks |
| src/scportrait/pipeline/mask_filtering/region_masking.py | New module for image masking with automatic region detection |
| src/scportrait/pipeline/_utils/spatialdata_helper.py | Added debug print statement for memory calculation |
| src/scportrait/pipeline/_utils/constants.py | Added DEFAULT_IDENTIFIER_FILENAME constant |
| src/scportrait/io/h5sc.py | Updated to store file handle and source path in uns |
| requirements/requirements.txt | Added affine and rasterio dependencies |
| src/scportrait/tools/sdata/write/_write.py | Fixed channel name comparison logic |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| elif isinstance(image, xr.DataArray): | ||
| image = image | ||
|
|
||
| print(image.dtype) |
There was a problem hiding this comment.
Debug print statement should be removed before merging to production. If this information is needed for diagnostics, consider using proper logging instead.
| transform = get_transformation(mask, coordinate_system) | ||
|
|
||
| if check_memory(mask): | ||
| print("Array fits in memory, using in-memory calculation.") |
There was a problem hiding this comment.
Debug print statement should be removed or replaced with proper logging. For production code, use the logging module to allow users to control output verbosity.
| else: | ||
| # non backed h5sc adata objects can be accessed directly | ||
| # these are created by slicing original h5sc objects | ||
| col = "scportrait_cell_id" |
There was a problem hiding this comment.
The hardcoded column name 'scportrait_cell_id' is duplicated here and in lines 393-394. Consider importing and using DEFAULT_CELL_ID_NAME constant from scportrait.pipeline._utils.constants to maintain consistency.
| col = "scportrait_cell_id" | |
| col = DEFAULT_CELL_ID_NAME |
| If `to_disk` is False, returns the subsetted AnnData object. If `to_disk` is True, saves the subsetted AnnData object to disk and returns None. | ||
| """ | ||
| if outpath is not None: | ||
| if not isinstance(outpath, (str | Path)): |
There was a problem hiding this comment.
The isinstance check should use a tuple syntax for multiple types: isinstance(outpath, (str, Path)) instead of (str | Path). The union operator syntax is only valid in type hints, not in runtime isinstance checks.
| if not isinstance(outpath, (str | Path)): | |
| if not isinstance(outpath, (str, Path)): |
| # Load a fresh AnnData | ||
| adata = read_h5sc(self.extraction_f.output_path or self.extraction_f.extraction_file) |
There was a problem hiding this comment.
This will fail if extraction_f is None (as checked in the old code). The validation should still be performed before attempting to read the h5sc file, otherwise this will raise an AttributeError.
| # Load a fresh AnnData | |
| adata = read_h5sc(self.extraction_f.output_path or self.extraction_f.extraction_file) | |
| # Ensure extraction configuration is available before accessing its attributes | |
| extraction_f = getattr(self, "extraction_f", None) | |
| if extraction_f is None: | |
| raise ValueError( | |
| "Extraction configuration 'extraction_f' is not set. " | |
| "Run the extraction step before accessing 'h5sc'." | |
| ) | |
| # Load a fresh AnnData | |
| adata = read_h5sc(extraction_f.output_path or extraction_f.extraction_file) |
| ) | ||
| channel_names_old = image.coords["c"].values.tolist() | ||
| if channel_names_old != channel_names: | ||
| if any(channel_names_old != channel_names): |
There was a problem hiding this comment.
The comparison channel_names_old != channel_names produces a boolean array when comparing lists/arrays element-wise. The correct check should be channel_names_old != channel_names (direct list comparison) or not np.array_equal(channel_names_old, channel_names) if they are numpy arrays. Using any() on a scalar boolean will fail if they are plain lists.
| if any(channel_names_old != channel_names): | |
| if channel_names_old != channel_names: |
Aim of this PR is to add a tutorial that shows how you can annotate regions in scPortrait. It will implement additonal functions required to streamline this functionality.
New Features:
Bug Fixes: