
This repo contains code for EMT deliverable data analysis and depends on outputs from the EMT_image_analysis repo.



AllenCell/EMT_data_analysis


EMT data analysis

This repository contains code for reproducing the plots shown in our manuscript [1]. This repository uses outputs generated by the EMT_image_analysis repository, such as image segmentations and 3D meshes.

[1] - A human induced pluripotent stem (hiPS) cell model for the holistic study of epithelial to mesenchymal transitions (EMTs)

Note

This code has been tested on Ubuntu 18.04.2 LTS and Windows 10 using Python 3.11.

Installation

  1. Install Python 3.11 and pip >= 24.0.0.
  2. Install the dependencies for lxml. On Ubuntu or Debian:
     sudo apt-get install libxml2-dev libxslt-dev python-dev
  3. Create a new virtual environment and install the dependencies:
     python -m venv venv
     source venv/bin/activate
     pip install -r requirements.txt
     (On Windows, activate the environment with venv\Scripts\activate instead.)

(Alternatively, if you have pdm, you can run pdm sync.)

How to run

The analysis pipeline consists of four sequential steps. Steps 1-3 generate intermediate data, while Step 4 produces the final figures and statistical analysis. Pre-computed outputs from Steps 1-3 are available on AWS, so Step 4 can be run directly without executing the preceding steps.

Step 1 — Feature extraction

python EMT_data_analysis/analysis_scripts/Feature_extraction.py

Extracts per-Z-plane features from each movie: colony mask area and fluorescence intensity (main channel plus additional channels when available). Movies are processed in parallel using joblib. Each movie produces a CSV stored in EMT_data_analysis/results/feature_extraction/.
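A minimal sketch of the per-Z-plane features described above follows. It assumes the colony mask and one fluorescence channel are already loaded as NumPy arrays of shape (Z, Y, X). The function name and output keys are hypothetical and are not the script's actual API. In the pipeline, one such call per movie would be dispatched in parallel (for example with joblib's Parallel and delayed); that part is omitted here.

```python
import numpy as np


def per_z_features(mask: np.ndarray, intensity: np.ndarray) -> list[dict]:
    """Hypothetical helper: per-Z-plane mask area and total intensity inside the mask.

    mask: boolean (or 0/1) array of shape (Z, Y, X).
    intensity: fluorescence array with the same shape as mask.
    """
    if mask.shape != intensity.shape:
        raise ValueError("mask and intensity must have the same shape")
    rows = []
    for z in range(mask.shape[0]):
        plane = mask[z].astype(bool)
        rows.append(
            {
                "z": z,
                "area_px": int(plane.sum()),  # colony mask area in pixels
                "total_intensity": float(intensity[z][plane].sum()),  # summed signal inside the mask
            }
        )
    return rows
```

Each returned dictionary could be written as one row of a per-movie CSV. Additional channels would repeat the intensity sum with the same mask.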

Dual-camera alignment: The imaging system uses two cameras (Camera 1: brightfield + 638 nm; Camera 2: 488 nm + 561 nm). Since the all-cells segmentation mask is derived from brightfield (Camera 1), the mask is aligned to Camera 2 coordinates using the dual-camera calibration matrix before extracting intensity from 488/561 nm channels. Channels on the same camera as the mask do not require alignment.
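The alignment step can be sketched with scipy.ndimage.affine_transform. This is a minimal example, not the repository's implementation. It assumes the calibration is a 3x3 homogeneous affine matrix in (row, col) order that maps Camera 1 pixel coordinates to Camera 2 pixel coordinates. The real calibration format and axis order may differ, and the function name is hypothetical.

```python
import numpy as np
from scipy import ndimage


def align_mask_to_camera2(mask_cam1: np.ndarray, cam1_to_cam2: np.ndarray) -> np.ndarray:
    """Resample a 2D Camera 1 mask into Camera 2 pixel coordinates.

    cam1_to_cam2: assumed 3x3 homogeneous affine matrix, in (row, col) order,
    mapping Camera 1 pixel coordinates to Camera 2 pixel coordinates.
    """
    # affine_transform maps OUTPUT coordinates to INPUT coordinates,
    # so it needs the inverse of the Camera 1 -> Camera 2 mapping.
    inv = np.linalg.inv(cam1_to_cam2)
    aligned = ndimage.affine_transform(
        mask_cam1.astype(np.uint8),
        inv[:2, :2],          # linear part of the inverse mapping
        offset=inv[:2, 2],    # translation part of the inverse mapping
        output_shape=mask_cam1.shape,
        order=0,              # nearest neighbour keeps the mask binary
        mode="constant",
        cval=0,               # pixels mapped from outside the image are background
    )
    return aligned.astype(bool)
```

For a Z-stack, the same call could be applied plane by plane, assuming the calibration contains no Z component. Nearest-neighbour interpolation is used so that the mask stays strictly binary after resampling.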

Output: EMT_data_analysis/results/feature_extraction/Features_bf_colony_mask_*Data-ID*.csv

Step 2 — Metric computation

python EMT_data_analysis/analysis_scripts/Metric_computation.py

Compiles per-movie CSVs from Step 1 into a single manifest and computes gene-specific expression metrics:

  • SOX2: Time of half-maximal expression (first timepoint where smoothed intensity drops to 50% of dynamic range)
  • TBXT: Time of maximum expression (peak of smoothed intensity curve)
  • EOMES: Time of maximum expression (peak of smoothed intensity curve)
  • CDH1: Time of inflection of E-cadherin expression (minimum of second derivative of smoothed intensity)
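The four metrics above can be sketched as follows on a single intensity-versus-time curve, smoothed with scipy.signal.savgol_filter. The polynomial order of 2 comes from the text. The window length of 11 is an assumed value, and all function names are hypothetical. "50% of dynamic range" is interpreted here as min + 0.5 * (max - min), searched from the peak onward. That interpretation is also an assumption.

```python
import numpy as np
from scipy.signal import savgol_filter


def smooth(curve, window_length=11, polyorder=2):
    # Savitzky-Golay smoothing; polyorder=2 follows the text, window_length is assumed.
    return savgol_filter(np.asarray(curve, dtype=float), window_length, polyorder)


def time_of_half_max_drop(t, smoothed):
    """SOX2-style metric: first timepoint at or after the peak where the curve
    falls to the midpoint of its dynamic range (assumed definition)."""
    half = smoothed.min() + 0.5 * (smoothed.max() - smoothed.min())
    peak = int(np.argmax(smoothed))
    below = np.nonzero(smoothed[peak:] <= half)[0]
    return t[peak + below[0]] if below.size else np.nan


def time_of_max(t, smoothed):
    """TBXT/EOMES-style metric: timepoint of the peak of the smoothed curve."""
    return t[int(np.argmax(smoothed))]


def time_of_min_second_derivative(t, smoothed):
    """CDH1-style metric: timepoint where the second derivative is smallest."""
    d2 = np.gradient(np.gradient(smoothed, t), t)  # finite-difference second derivative
    return t[int(np.argmin(d2))]
```

On a synthetic decreasing sigmoid centred at t = 25 (not real data), time_of_half_max_drop lands near 25. time_of_min_second_derivative lands a few timepoints earlier, where the decline begins to accelerate.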

Mean intensity is computed as total intensity divided by all-cells mask area, averaged over the bottom 10 Z-slices above the glass. Intensity curves are smoothed using a Savitzky-Golay filter (polynomial order 2). Movies are processed in parallel using joblib.
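The mean-intensity definition above is sketched below for one timepoint, starting from the per-Z-plane features of Step 1. Several details are assumptions rather than facts stated in the text. The index of the first slice above the glass is supplied as a hypothetical parameter. Z indices are assumed to increase away from the glass. The per-slice ratio is averaged; the text does not rule out summing intensity and area before dividing.

```python
import numpy as np


def mean_intensity_bottom_slices(total_intensity, mask_area, first_z_above_glass, n_slices=10):
    """Mean intensity for one timepoint from per-Z-plane features.

    total_intensity, mask_area: 1D sequences indexed by Z.
    first_z_above_glass: hypothetical index of the first Z-slice above the glass.
    Assumes the per-slice ratio (intensity / area) is averaged over slices.
    """
    z = slice(first_z_above_glass, first_z_above_glass + n_slices)
    ti = np.asarray(total_intensity, dtype=float)[z]
    area = np.asarray(mask_area, dtype=float)[z]
    valid = area > 0  # skip slices where the all-cells mask is empty
    if not valid.any():
        return float("nan")
    return float(np.mean(ti[valid] / area[valid]))
```

Evaluating this once per timepoint yields the raw intensity curve, which would then be smoothed before computing the metrics.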

To load the imaging manifest from a local file instead of AWS:

python EMT_data_analysis/analysis_scripts/Metric_computation.py --local [--local-csv path/to/file.csv]

Output: EMT_data_analysis/results/metric_computation/Image_analysis_extracted_features.csv

Step 3 — Nuclei localization

python EMT_data_analysis/analysis_scripts/Nuclei_localization.py

Classifies individual nuclei as inside or outside the collagen IV basement membrane mesh at each timepoint. Nuclear centroids from 3D instance segmentation are tested against the mesh boundary using ray-casting.
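Ray-casting containment can be sketched with the Möller–Trumbore ray-triangle intersection plus an even-odd rule. A ray cast from the centroid crosses a closed surface an odd number of times when the centroid is inside. This is a minimal NumPy sketch and not the script's own code. It assumes a closed, watertight triangle mesh given as vertex and face arrays, and the function name is hypothetical. A deliberately non-axis-aligned default ray direction reduces the chance of grazing a shared triangle edge and double-counting it. Libraries such as trimesh provide this test ready-made (Trimesh.contains).

```python
import numpy as np


def point_in_mesh(point, vertices, faces, direction=(0.8263, 0.4271, 0.3671), eps=1e-9):
    """Even-odd ray-casting test of a 3D point against a closed triangle mesh."""
    p = np.asarray(point, dtype=float)
    d = np.asarray(direction, dtype=float)
    tri = np.asarray(vertices, dtype=float)[np.asarray(faces)]  # (F, 3, 3) triangle corners
    v0, v1, v2 = tri[:, 0], tri[:, 1], tri[:, 2]
    e1, e2 = v1 - v0, v2 - v0
    h = np.cross(d, e2)                       # (F, 3)
    a = np.einsum("ij,ij->i", e1, h)          # determinant per triangle
    parallel = np.abs(a) < eps                # ray parallel to triangle plane
    f = 1.0 / np.where(parallel, 1.0, a)      # avoid division by zero
    s = p - v0
    u = f * np.einsum("ij,ij->i", s, h)       # first barycentric coordinate
    q = np.cross(s, e1)
    v = f * (q @ d)                           # second barycentric coordinate
    t = f * np.einsum("ij,ij->i", e2, q)      # distance along the ray
    hits = (~parallel) & (u >= 0) & (v >= 0) & (u + v <= 1) & (t > eps)
    return bool(np.count_nonzero(hits) % 2 == 1)
```

Applied to every nuclear centroid at a timepoint, this yields the inside/outside label per nucleus.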

Output: EMT_data_analysis/results/nuclei_localization/Migration_timing_trough_mesh_extracted_feature.csv

Step 4 — Analysis and figure generation

python EMT_data_analysis/analysis_scripts/Analysis_tools.py

Generates all manuscript figures and statistical analyses. By default, input manifests are automatically downloaded from AWS, so this step can be run independently of Steps 1-3.

Output: EMT_data_analysis/results/figures/

Optional — 3D example rendering

The functions in EMT_data_analysis/figure_generation/ can be used to generate 3D renderings shown in the paper. These have only been tested on Ubuntu 18.04/22.04.

On Ubuntu or Debian:

sudo apt-get install xvfb libgl1-mesa-glx

On Windows: comment out every call to pv.start_xvfb() in the code before running, because PyVista's Xvfb helper only works on Linux.

All Cells Mask

python EMT_data_analysis/figure_generation/colony_mask.py --data_id [Optional] --output_directory [Optional]

If no arguments are provided, the script defaults to the data shown in the paper and writes results to EMT_data_analysis/results/3D_all_cells_mask. A Data ID is a valid input only if it has a non-empty value in the "All Cells Mask File Download" column of the image_and_segmentation_data.csv manifest on AWS.

Inside-Outside Classification

python EMT_data_analysis/figure_generation/inside-outside_classification.py --data_id [Optional] --output_directory [Optional]

If no arguments are provided, the script defaults to the data shown in the paper and writes results to EMT_data_analysis/results/Inside-Outside/mesh-figures. A Data ID is a valid input only if it has a non-empty value in the "CollagenIV Segmentation Mesh Folder" column of the image_and_segmentation_data.csv manifest on AWS.
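To find valid --data_id values for either rendering script before running it, a local copy of the manifest could be filtered as sketched below. The required column names come from the text above. The name of the ID column ("Data ID") is an assumption, and valid_data_ids is a hypothetical helper.

```python
import csv
import io


def valid_data_ids(manifest_csv, required_column, id_column="Data ID"):
    """Return IDs whose `required_column` is non-empty.

    manifest_csv: a file-like object containing the manifest CSV.
    id_column: assumed name of the Data ID column.
    """
    ids = []
    for row in csv.DictReader(manifest_csv):
        if (row.get(required_column) or "").strip():  # treat blank/whitespace as empty
            ids.append(row[id_column])
    return ids


if __name__ == "__main__":
    # Synthetic example rows, not real manifest contents.
    sample = "Data ID,All Cells Mask File Download\nA,s3://bucket/a.tif\nB,\n"
    print(valid_data_ids(io.StringIO(sample), "All Cells Mask File Download"))
```

The same helper would cover the Inside-Outside script by passing "CollagenIV Segmentation Mesh Folder" as the required column.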

Contact

If you have questions about this code, please reach out to us at cells@alleninstitute.org.

Licensing

All code in this repository is provided to you under the Allen Institute Software License.
