Skip to content

Add tutorial roi regions#330

Open
sophiamaedler wants to merge 29 commits intomainfrom
add_tutorial_roi_regions
Open

Add tutorial roi regions#330
sophiamaedler wants to merge 29 commits intomainfrom
add_tutorial_roi_regions

Conversation

@sophiamaedler
Copy link
Collaborator

@sophiamaedler sophiamaedler commented Nov 6, 2025

Aim of this PR is to add a tutorial that shows how you can annotate regions in scPortrait. It will implement additonal functions required to streamline this functionality.

New Features:

  • update adata.obs to disk
  • add plotting support for non memory backed h5sc files
  • add functions to subset h5sc objects according to cell_id (a new h5sc is written to disk)
  • improve h5sc tests

Bug Fixes:

  • correctly read dimensions to generate a new axes object when plotting a spatialdata shapes layer

@sophiamaedler
Copy link
Collaborator Author

scPortrait_demo_annotate_regions.ipynb

current tutorial covering the newly implemented functions

Copilot AI review requested due to automatic review settings February 5, 2026 10:38
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds functionality for annotating and subsetting regions in scPortrait, along with improvements to h5sc file handling and testing.

Changes:

  • Added functions to mask/crop images using spatial regions and subset h5sc objects by cell IDs
  • Enhanced h5sc operations to support updating obs data on disk and adding spatial coordinates
  • Improved plotting support for non-memory-backed h5sc files and fixed spatial coordinate handling
  • Expanded test coverage for subset operations and h5sc plotting functionality

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
tests/unit_tests/tools/sdata/test_subset.py New comprehensive tests for spatial region masking and bounding box subsetting
tests/unit_tests/tools/h5sc/test_operations.py New tests for h5sc subsetting and disk operations
tests/unit_tests/plotting/test_h5sc.py Refactored to use real h5sc fixtures instead of mocks
tests/conftest.py Added spatial data fixtures for testing region selection
src/scportrait/tools/sdata/processing/_subset.py Implemented mask_region function for spatial masking/cropping
src/scportrait/tools/sdata/processing/init.py Exported new mask_region function
src/scportrait/tools/h5sc/operations.py Added functions for updating obs on disk, spatial coordinates, and cell subsetting
src/scportrait/tools/h5sc/init.py Exported new h5sc operation functions
src/scportrait/plotting/sdata.py Fixed dimension calculation for GeoDataFrame shapes
src/scportrait/plotting/h5sc.py Added support for non-memory-backed h5sc plotting
src/scportrait/pipeline/project.py Improved h5sc handle management to prevent file handle leaks
src/scportrait/pipeline/mask_filtering/region_masking.py New module for image masking with automatic region detection
src/scportrait/pipeline/_utils/spatialdata_helper.py Added debug print statement for memory calculation
src/scportrait/pipeline/_utils/constants.py Added DEFAULT_IDENTIFIER_FILENAME constant
src/scportrait/io/h5sc.py Updated to store file handle and source path in uns
requirements/requirements.txt Added affine and rasterio dependencies
src/scportrait/tools/sdata/write/_write.py Fixed channel name comparison logic

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

elif isinstance(image, xr.DataArray):
image = image

print(image.dtype)
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug print statement should be removed before merging to production. If this information is needed for diagnostics, consider using proper logging instead.

Copilot uses AI. Check for mistakes.
transform = get_transformation(mask, coordinate_system)

if check_memory(mask):
print("Array fits in memory, using in-memory calculation.")
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug print statement should be removed or replaced with proper logging. For production code, use the logging module to allow users to control output verbosity.

Copilot uses AI. Check for mistakes.
else:
# non backed h5sc adata objects can be accessed directly
# these are created by slicing original h5sc objects
col = "scportrait_cell_id"
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hardcoded column name 'scportrait_cell_id' is duplicated here and in lines 393-394. Consider importing and using DEFAULT_CELL_ID_NAME constant from scportrait.pipeline._utils.constants to maintain consistency.

Suggested change
col = "scportrait_cell_id"
col = DEFAULT_CELL_ID_NAME

Copilot uses AI. Check for mistakes.
If `to_disk` is False, returns the subsetted AnnData object. If `to_disk` is True, saves the subsetted AnnData object to disk and returns None.
"""
if outpath is not None:
if not isinstance(outpath, (str | Path)):
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The isinstance check should use a tuple syntax for multiple types: isinstance(outpath, (str, Path)) instead of (str | Path). The union operator syntax is only valid in type hints, not in runtime isinstance checks.

Suggested change
if not isinstance(outpath, (str | Path)):
if not isinstance(outpath, (str, Path)):

Copilot uses AI. Check for mistakes.
Comment on lines +267 to +268
# Load a fresh AnnData
adata = read_h5sc(self.extraction_f.output_path or self.extraction_f.extraction_file)
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will fail if extraction_f is None (as checked in the old code). The validation should still be performed before attempting to read the h5sc file, otherwise this will raise an AttributeError.

Suggested change
# Load a fresh AnnData
adata = read_h5sc(self.extraction_f.output_path or self.extraction_f.extraction_file)
# Ensure extraction configuration is available before accessing its attributes
extraction_f = getattr(self, "extraction_f", None)
if extraction_f is None:
raise ValueError(
"Extraction configuration 'extraction_f' is not set. "
"Run the extraction step before accessing 'h5sc'."
)
# Load a fresh AnnData
adata = read_h5sc(extraction_f.output_path or extraction_f.extraction_file)

Copilot uses AI. Check for mistakes.
)
channel_names_old = image.coords["c"].values.tolist()
if channel_names_old != channel_names:
if any(channel_names_old != channel_names):
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comparison channel_names_old != channel_names produces a boolean array when comparing lists/arrays element-wise. The correct check should be channel_names_old != channel_names (direct list comparison) or not np.array_equal(channel_names_old, channel_names) if they are numpy arrays. Using any() on a scalar boolean will fail if they are plain lists.

Suggested change
if any(channel_names_old != channel_names):
if channel_names_old != channel_names:

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants