…lignment
- Parallelize movie processing using joblib (n_jobs=32) for faster extraction
- Fix dual-camera alignment: apply segmentation mask alignment only when the raw fluorescence channel is on a different camera than the brightfield. Camera 1 (brightfield, 638nm) channels share the mask coordinate space; Camera 2 (488nm, 561nm) channels require the inverse calibration transform. Previously, the same alignment was applied uniformly to all channels, causing misalignment for channels on the same camera as the mask.
- Extract intensity from Channel 2 and Channel 3 when available, with per-channel alignment based on wavelength-to-camera mapping from manifest
- Add retry logic with exponential backoff for BioImage loading and dask array computation to handle transient network errors
- Support fixed-cell (immunostaining) experiments with correct timepoint calculation from fixation time metadata
- Add local file loading option to io.load_imaging_and_segmentation_dataset()
- Add setuptools package discovery config to pyproject.toml
- Add joblib dependency to pyproject.toml
- Improve README with pipeline overview, step descriptions, gene metric definitions, and dual-camera alignment explanation
Pull request overview
This PR refactors the feature extraction pipeline to improve performance through parallel processing, fix dual-camera alignment issues, and add support for fixed-cell experiments. The main change addresses a critical bug where segmentation mask alignment was incorrectly applied uniformly to all channels, rather than selectively based on which camera each wavelength is captured by.
Changes:
- Implemented parallel movie processing using joblib (32 workers) to significantly reduce extraction runtime
- Fixed dual-camera alignment logic to apply coordinate transforms only when raw fluorescence and segmentation mask are on different cameras
- Added retry logic with exponential backoff for network errors when loading BioImage files and computing dask arrays
- Extended extraction to support additional fluorescence channels (Channel 2 and Channel 3) with per-channel alignment
- Added support for fixed-cell immunostaining experiments with correct timepoint calculation from fixation metadata
- Enhanced documentation with pipeline overview and detailed step descriptions
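The per-channel alignment decision described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the dictionary `CAMERA_BY_WAVELENGTH_NM`, the constant `MASK_CAMERA`, and the function `needs_mask_alignment` are hypothetical names, and the mapping is hard-coded here from the PR description, whereas the PR says it is read from the manifest.

```python
# Hypothetical wavelength-to-camera mapping, taken from the PR description:
# Camera 1 carries brightfield and 638 nm; Camera 2 carries 488 nm and 561 nm.
CAMERA_BY_WAVELENGTH_NM = {638: 1, 488: 2, 561: 2}

# The segmentation mask shares Camera 1's coordinate space (per the PR description).
MASK_CAMERA = 1


def needs_mask_alignment(wavelength_nm: int, mask_camera: int = MASK_CAMERA) -> bool:
    """Return True when the channel is imaged on a different camera than the mask."""
    try:
        channel_camera = CAMERA_BY_WAVELENGTH_NM[wavelength_nm]
    except KeyError:
        raise ValueError(f"No camera mapping for wavelength {wavelength_nm} nm") from None
    # Only channels on the other camera need the calibration transform applied.
    return channel_camera != mask_camera
```

With this mapping, `needs_mask_alignment(638)` is false (same camera as the mask) while `needs_mask_alignment(488)` and `needs_mask_alignment(561)` are true. Raising on an unknown wavelength avoids silently picking the wrong coordinate space.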
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| pyproject.toml | Added joblib dependency and setuptools package discovery configuration |
| README.md | Expanded documentation with pipeline steps, gene metric definitions, and dual-camera alignment explanation |
| EMT_data_analysis/tools/io.py | Added local file loading option to load_imaging_and_segmentation_dataset() |
| EMT_data_analysis/analysis_scripts/Feature_extraction.py | Complete rewrite with parallel processing, per-channel alignment, retry logic, and multi-channel extraction |
    ]

    [tool.setuptools.packages.find]
    include = ["EMT_data_analysis*"]
The [tool.setuptools.packages.find] configuration is missing the where parameter. Without specifying where = ['.'] or the appropriate source directory, setuptools may not correctly discover packages. Consider adding where = ['.'] to explicitly define the package search location.
Suggested change:

    include = ["EMT_data_analysis*"]
    where = ["."]
        path = local_path
    else:
        # Default local path: project root (parent of EMT_data_analysis package)
        project_root = Path(__file__).parent.parent.parent
Using parent.parent.parent to navigate directory hierarchy is fragile and difficult to understand. Consider using a more explicit approach such as defining a project root constant or using Path(__file__).parents[2] with a comment explaining the directory structure.
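To make the suggestion concrete, the snippet below checks that the chained `.parent` form and `parents[2]` resolve to the same directory. The `/repo` checkout root is a hypothetical path chosen for illustration; the relative location of `io.py` follows the file list in this PR. `PurePosixPath` is used so the example behaves the same on any platform.

```python
from pathlib import PurePosixPath

# Hypothetical checkout root (/repo); io.py lives at EMT_data_analysis/tools/io.py.
module_file = PurePosixPath("/repo/EMT_data_analysis/tools/io.py")

# Chained form used in the diff: tools/ -> EMT_data_analysis/ -> project root.
chained = module_file.parent.parent.parent

# Indexed form: parents[0] is tools/, parents[1] is the package, parents[2] is the root.
indexed = module_file.parents[2]

print(chained, indexed)
```

Both expressions are equivalent, so the choice is about readability; a short comment naming each directory level helps either way.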
        except Exception as e:
            last_error = e
            if attempt < max_retries - 1:
                wait_time = min(2 ** attempt, 30)
The magic number 30 (maximum wait time in seconds) should be extracted as a named constant (e.g., MAX_RETRY_WAIT_SECONDS = 30) to improve code clarity and maintainability.
        except (ServerDisconnectedError, ClientError, ConnectionError, TimeoutError, UnsupportedFileFormatError) as e:
            last_error = e
            if attempt < max_retries - 1:
                wait_time = min(2 ** attempt, 60)
The magic number 60 (maximum wait time in seconds) should be extracted as a named constant (e.g., MAX_RETRY_WAIT_SECONDS = 60) to improve code clarity and maintainability. Note that this differs from the 30-second cap in load_image_with_retry; consider whether the two should be consistent.
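One way to address both retry comments is a single helper that takes the cap as a named, overridable parameter. The sketch below is an assumption about how that could look, not the code in Feature_extraction.py: `call_with_retry`, `backoff_delays`, and the default exception tuple are hypothetical, and the `sleep` parameter exists only so the delays can be observed without waiting.

```python
import time

MAX_RETRY_WAIT_SECONDS = 30  # cap on a single backoff delay, in seconds


def backoff_delays(max_retries, max_wait=MAX_RETRY_WAIT_SECONDS):
    """Delays slept between attempts: 1, 2, 4, ... seconds, capped at max_wait."""
    return [min(2 ** attempt, max_wait) for attempt in range(max_retries - 1)]


def call_with_retry(func, max_retries=5, max_wait=MAX_RETRY_WAIT_SECONDS,
                    retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call func(), retrying on the given exceptions with capped exponential backoff."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    last_error = None
    for attempt in range(max_retries):
        try:
            return func()
        except retry_on as e:
            last_error = e
            # No sleep after the final attempt; the error is re-raised instead.
            if attempt < max_retries - 1:
                sleep(min(2 ** attempt, max_wait))
    raise last_error
```

A caller that needs the longer cap can pass `max_wait=60` explicitly, which keeps the difference between the two call sites visible rather than buried in two separate literals.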
    # Determine timepoints to process (first 48 hours = 98 timepoints)
    num_timepoints = raw_reader.dims.T
    max_timepoint = min(num_timepoints, 98)
The magic number 98 (representing 48 hours of timepoints) should be extracted as a named constant (e.g., MAX_TIMEPOINTS_48_HOURS = 98) with a comment explaining the calculation basis.
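A small sketch of the suggested refactor follows. The constant name comes from the review comment; the helper `timepoints_to_process` is hypothetical. The value 98 is copied from the diff, which does not show how it is derived from the acquisition interval, so the comment in the code says so rather than guessing.

```python
# Number of timepoints covering the first 48 hours, as stated in the diff.
# The acquisition interval behind this value is not shown in the diff.
MAX_TIMEPOINTS_48_HOURS = 98


def timepoints_to_process(num_timepoints: int,
                          limit: int = MAX_TIMEPOINTS_48_HOURS) -> range:
    """Return the timepoint indices to process, truncated to the 48-hour limit."""
    return range(min(num_timepoints, limit))
```

Movies shorter than the limit are processed in full, and longer movies are truncated to the first 98 timepoints.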
        except Exception as e:
            return (movie_id, False, f"{type(e).__name__}: {str(e)}")


    def compute_bf_colony_features_all_movies(output_folder, align=True, n_jobs=32):
The default n_jobs=32 may be too aggressive for systems with fewer CPU cores and could cause resource contention or memory issues. Consider using n_jobs=-1 (all available cores) or calculating based on os.cpu_count() to adapt to the system's capabilities.
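The adaptive worker count suggested here could look like the sketch below. `resolve_n_jobs` is a hypothetical helper, not part of the PR or of joblib. It treats `None` or any value below 1 as "use all cores", which is a simplification of joblib's own handling of negative `n_jobs` values.

```python
import os


def resolve_n_jobs(requested=None, n_tasks=None):
    """Clamp a requested worker count to the cores and tasks actually available."""
    available = os.cpu_count() or 1  # os.cpu_count() can return None
    if requested is None or requested < 1:
        n_jobs = available
    else:
        n_jobs = min(requested, available)
    if n_tasks is not None:
        # No point starting more workers than there are movies to process.
        n_jobs = min(n_jobs, max(n_tasks, 1))
    return n_jobs
```

The result can be passed as `n_jobs` to `joblib.Parallel`. Memory use per movie may still limit the practical worker count below the core count, so an explicit override remains useful.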
CI runs `pdm export -f requirements`, which includes hashes by default. The previous export used `--no-hashes`, causing the diff check to fail.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
- Parallelize movie processing using joblib (n_jobs=32) for faster extraction
- Fix dual-camera alignment: apply segmentation mask alignment only when the raw fluorescence channel is on a different camera than the brightfield. Camera 1 (brightfield, 638nm) channels share the mask coordinate space; Camera 2 (488nm, 561nm) channels require the inverse calibration transform. Previously, the same alignment was applied uniformly to all channels, causing misalignment for channels on the same camera as the mask.
- Extract intensity from Channel 2 and Channel 3 when available, with per-channel alignment based on wavelength-to-camera mapping from manifest
- Add retry logic with exponential backoff for BioImage loading and dask array computation to handle transient network errors
- Support fixed-cell (immunostaining) experiments with correct timepoint calculation from fixation time metadata
- Add local file loading option to io.load_imaging_and_segmentation_dataset()
- Add setuptools package discovery config to pyproject.toml
- Add joblib dependency to pyproject.toml
- Improve README with pipeline overview, step descriptions, gene metric definitions, and dual-camera alignment explanation
Test plan