
Rewrite Metric_computation with parallel processing and improved pipeline #44

Merged
smishra3 merged 2 commits into reSub-v1 from feature/metric-computation-parallel
Feb 6, 2026

smishra3 (Collaborator) commented Feb 4, 2026

Summary

  • Parallelize area-at-the-glass computation across movies using joblib, with retry logic and exponential backoff for network resilience
  • Fix the SOX2 half-maximal calculation to use the first-timepoint intensity (the biologically meaningful baseline, used as the upper bound of the dynamic range) instead of max(), and correctly convert timepoints to hours
  • Add support for fixed-cell (single-timepoint) images, include metadata-only movies, and add CLI arguments (--local, --local-csv) for loading a local manifest
  • Update io.py to accept load_from_aws and local_path parameters; add the joblib>=1.3.0 dependency; expand README Step 2 with gene metric definitions
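The new CLI flags and the io.py parameters could be wired together roughly as follows. This is a minimal sketch that assumes `--local` is a boolean switch and `--local-csv` takes a path to the manifest CSV. The helper names `parse_args` and `build_loader_kwargs` are hypothetical; only the keyword names `load_from_aws` and `local_path` come from this PR.

```python
import argparse
from typing import Any, Dict, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Metric computation (sketch)")
    # Assumed semantics: switch from AWS/S3 loading to a local manifest
    parser.add_argument("--local", action="store_true",
                        help="load the manifest from a local CSV instead of AWS")
    # Assumed semantics: path to the local manifest CSV (argparse exposes it as args.local_csv)
    parser.add_argument("--local-csv", default=None,
                        help="path to the local manifest CSV")
    return parser.parse_args(argv)


def build_loader_kwargs(args: argparse.Namespace) -> Dict[str, Any]:
    # Hypothetical helper: map the CLI flags onto the io.py keyword arguments
    return {"load_from_aws": not args.local, "local_path": args.local_csv}
```

The resulting dict would then be unpacked into the loader, e.g. `load_imaging_and_segmentation_dataset(**build_loader_kwargs(parse_args()))`, assuming the function accepts exactly those two keywords.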

Details
Parallel processing: add_bottom_mip_migration now processes movies in parallel via joblib.Parallel instead of a sequential tqdm loop. Each movie is handled by _process_single_movie_area, which retries up to 5 times on S3 network errors with exponential backoff (1s, 2s, 4s, 8s, 16s).
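The retry-and-backoff scheme can be sketched with the standard library. This sketch reads "up to 5 retries" as one initial attempt plus five retries, which matches the five listed delays. `call_with_retry`, `process_movies`, and the `retry_on=(OSError,)` exception set are hypothetical (real S3 client errors may not derive from OSError). It also deliberately swaps joblib.Parallel for `concurrent.futures.ThreadPoolExecutor` so it runs without extra dependencies; it is not the PR's implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List, Tuple, Type


def backoff_delays(max_retries: int = 5, base: float = 1.0) -> List[float]:
    """Seconds to wait before each retry; the defaults give 1, 2, 4, 8, 16."""
    return [base * 2 ** i for i in range(max_retries)]


def call_with_retry(
    fn: Callable[[Any], Any],
    item: Any,
    max_retries: int = 5,
    base: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fn(item), retrying on retry_on errors (1 + max_retries attempts total)."""
    delays = backoff_delays(max_retries, base)
    for attempt in range(max_retries + 1):
        try:
            return fn(item)
        except retry_on:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(delays[attempt])  # wait 1s, 2s, 4s, ... before the next attempt


def process_movies(
    movie_ids: Iterable[Any],
    worker: Callable[[Any], Any],
    n_jobs: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Any]:
    """Run worker on every movie concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(lambda m: call_with_retry(worker, m, sleep=sleep), movie_ids))
```

Threads suit I/O-bound S3 reads; joblib's default loky backend uses separate processes instead, which also helps when the per-movie work is CPU-bound. Injecting `sleep` lets a test check the backoff schedule without actually waiting.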

Bug fixes:

  • SOX2 half-maximal: the old code used max(df_id.int_smooth.values[0]), which calls max() on a scalar; the new code uses df_id.int_smooth.values[0] (the first-timepoint value) as the upper bound of the dynamic range and converts timepoints to hours with *(30/60), i.e. 30 minutes per timepoint
  • Fixed-cell images: detects single-timepoint images (dims.T == 1) and always indexes T=0 instead of the computed timepoint index
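Both fixes can be sketched in plain Python. This sketch assumes the half-maximal point is the first timepoint at which the smoothed intensity falls to the midpoint between the first-timepoint value (the upper bound, per this PR) and the series minimum (the lower bound, an assumption), and that one timepoint is 30 minutes, as the *(30/60) factor implies. The function names are hypothetical.

```python
from typing import Optional, Sequence

MINUTES_PER_TIMEPOINT = 30  # implied by the *(30/60) hours conversion


def sox2_half_max_hours(int_smooth: Sequence[float]) -> Optional[float]:
    """Hours until smoothed SOX2 intensity first falls to half its dynamic range."""
    values = [float(v) for v in int_smooth]
    # Upper bound: the first-timepoint value. The old max(values[0]) passed a
    # scalar to max(), which raises TypeError because a float is not iterable.
    upper = values[0]
    lower = min(values)  # assumption: lower bound of the dynamic range
    if upper <= lower:
        return None  # intensity never drops below its starting level
    half_level = lower + (upper - lower) / 2
    for t, v in enumerate(values):
        if v <= half_level:
            return t * (MINUTES_PER_TIMEPOINT / 60)  # timepoint index -> hours
    return None  # not reached: the minimum always satisfies v <= half_level


def timepoint_to_read(n_timepoints: int, computed_t: int) -> int:
    """Fixed-cell images (dims.T == 1) are always read at T=0."""
    return 0 if n_timepoints == 1 else computed_t
```

For example, a series of 100, 90, 70, 50, 40, 20 has a half level of 60, first reached at timepoint 3, which is 1.5 hours.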

Pipeline improvements:

  • Vectorized Z-plane normalization (replaces slow apply + lambda)
  • Reduced memory in area computation by passing only needed columns to groupby
  • Migration onset times rounded to the nearest 0.5 hour for alignment with the time grid
  • Output columns reordered: key analysis columns first, then metadata
  • Multi-channel intensity columns (Channel 2, Channel 3) included in output
  • Movies without All Cells Mask added as metadata-only rows
  • CSV saved without row index (index=False)
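Several of these pipeline steps map onto standard pandas idioms. This is a minimal sketch, not the PR's code: the column names `movie_id`, `z_plane`, and `onset_h` are hypothetical, and it assumes each movie's reference Z-plane is its lowest plane.

```python
from typing import List

import pandas as pd


def normalize_z(df: pd.DataFrame, movie_col: str = "movie_id", z_col: str = "z_plane") -> pd.Series:
    # Vectorized: one column subtraction instead of df.apply(lambda row: ..., axis=1).
    # Only the two needed columns go into the groupby, which keeps memory low.
    # Assumption: the per-movie reference plane is the lowest Z.
    ref = df[[movie_col, z_col]].groupby(movie_col)[z_col].transform("min")
    return df[z_col] - ref


def round_to_half_hour(hours: pd.Series) -> pd.Series:
    # Nearest 0.5 h. pandas/NumPy round exact ties half-to-even (2.5 -> 2.0).
    return (hours * 2).round() / 2


def reorder_columns(df: pd.DataFrame, key_cols: List[str]) -> pd.DataFrame:
    # Key analysis columns first, then all remaining (metadata) columns in order
    rest = [c for c in df.columns if c not in key_cols]
    return df[key_cols + rest]


def add_metadata_only_rows(df: pd.DataFrame, metadata: pd.DataFrame) -> pd.DataFrame:
    # Movies without an All Cells Mask keep their metadata; analysis columns become NaN
    return pd.concat([df, metadata], ignore_index=True)
```

The final table would then be written with `df.to_csv(path, index=False)`, which omits the integer row index from the CSV.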

Test plan

  • Verify CI passes (lock-check and requirements-check)
  • Confirm Metric_computation.py imports succeed
  • Verify --local CLI flag works for local manifest loading
  • Compare the output Image_analysis_extracted_features.csv against the previous version to confirm the expected differences (SOX2 values, hours conversion, additional columns)
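The last check can be partly automated with pandas. This is a hypothetical helper, not part of the PR: it assumes both CSVs share a unique per-movie key column (called `movie_id` here) and compares values exactly, so floating-point columns may need a tolerance (e.g. `numpy.isclose`) instead.

```python
from typing import Any, Dict

import pandas as pd


def compare_outputs(old: pd.DataFrame, new: pd.DataFrame, key: str = "movie_id") -> Dict[str, Any]:
    """Summarize column and value differences between two feature tables."""
    added = [c for c in new.columns if c not in old.columns]
    removed = [c for c in old.columns if c not in new.columns]
    shared = [c for c in old.columns if c in new.columns and c != key]
    # Outer join keeps movies present in only one file; _merge records which side
    merged = old.merge(new, on=key, how="outer", suffixes=("_old", "_new"), indicator=True)
    changed = {}
    for col in shared:
        a, b = merged[f"{col}_old"], merged[f"{col}_new"]
        differs = ~((a == b) | (a.isna() & b.isna()))  # treat NaN == NaN as unchanged
        if differs.any():
            changed[col] = int(differs.sum())
    return {
        "added": added,
        "removed": removed,
        "changed": changed,
        "only_in_new": int((merged["_merge"] == "right_only").sum()),
    }
```

Usage would look like `compare_outputs(pd.read_csv(old_path), pd.read_csv(new_path))`, where both paths are placeholders for the previous and new outputs. New metadata-only movies show up under `only_in_new`.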

…line

- Parallelize area-at-the-glass computation using joblib with retry logic
  and exponential backoff for network resilience
- Fix SOX2 half-maximal calculation to use first timepoint intensity value
  instead of max(), and convert to hours with *(30/60)
- Add fixed-cell (single timepoint) image support in area computation
- Add metadata-only movies (those without All Cells Mask) to final manifest
- Vectorize normalized Z-plane subtraction (replace slow apply+lambda)
- Reduce memory in area computation by passing only needed columns
- Round migration onset times to nearest 0.5 hour for time grid alignment
- Reorder output columns: key analysis columns first, then metadata
- Include multi-channel intensity columns (Channel 2, Channel 3) in output
- Add CLI arguments (--local, --local-csv) for local manifest loading
- Save CSV without row index
- Update io.py: add load_from_aws and local_path parameters to
  load_imaging_and_segmentation_dataset()
- Add joblib>=1.3.0 to pyproject.toml dependencies
- Add [tool.setuptools.packages.find] to fix flat-layout package discovery
- Update README with gene metric definitions and pipeline details

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
smishra3 merged commit 365f56a into reSub-v1 on Feb 6, 2026
2 checks passed