EMT data analysis

This repository contains code for reproducing the plots shown in our manuscript [1]. It takes as input the outputs of the EMT_image_analysis repository, such as image segmentations and 3D meshes.

[1] - A human induced pluripotent stem (hiPS) cell model for the holistic study of epithelial to mesenchymal transitions (EMTs)

Note

This code has been tested on Ubuntu 18.04.2 LTS and Windows 10 using Python 3.11.

Installation

Install Python 3.11 and pip >= 24.0.0.
Install the dependencies for lxml. On Ubuntu or Debian:

sudo apt-get install libxml2-dev libxslt-dev python3-dev

Create a new virtual environment and install dependencies:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

(Alternatively, if you have pdm, you can run pdm sync.)

How to run

The analysis pipeline consists of four sequential steps. Steps 1-3 generate intermediate data, while Step 4 produces the final figures and statistical analysis. Pre-computed outputs from Steps 1-3 are available on AWS, so Step 4 can be run directly without executing the preceding steps.

Step 1 — Feature extraction

python EMT_data_analysis/analysis_scripts/Feature_extraction.py

Extracts per-Z-plane features from each movie: colony mask area and fluorescence intensity (main channel plus additional channels when available). Movies are processed in parallel using joblib. Each movie produces a CSV stored in EMT_data_analysis/results/feature_extraction/.
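As a rough illustration of what a per-Z-plane feature table contains, the sketch below computes mask area and summed intensity for each Z-plane of a toy stack. It is not the repository's implementation: the function name per_z_features, the (Z, Y, X) array layout, and the output keys are assumptions made for this example.

```python
import numpy as np


def per_z_features(image, mask):
    """Return one feature row per Z-plane.

    image and mask are assumed to be (Z, Y, X) arrays of equal shape;
    mask is nonzero inside the colony. Names and layout are hypothetical.
    """
    inside = mask > 0
    rows = []
    for z in range(image.shape[0]):
        area = int(inside[z].sum())                # mask area in pixels
        total = float(image[z][inside[z]].sum())   # summed intensity inside the mask
        rows.append({"z": z, "area_pixels": area, "total_intensity": total})
    return rows


# Toy data: a constant-intensity stack with a different mask on each plane.
image = np.full((3, 4, 4), 2.0)
mask = np.zeros((3, 4, 4), dtype=np.uint8)
mask[0, :2, :2] = 1   # 4 masked pixels on plane 0
mask[1, :, :] = 1     # 16 masked pixels on plane 1
features = per_z_features(image, mask)
```

Each row could then be written to a per-movie CSV, for example with csv.DictWriter.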

Dual-camera alignment: The imaging system uses two cameras (Camera 1: brightfield + 638 nm; Camera 2: 488 nm + 561 nm). Since the all-cells segmentation mask is derived from brightfield (Camera 1), the mask is aligned to Camera 2 coordinates using the dual-camera calibration matrix before extracting intensity from 488/561 nm channels. Channels on the same camera as the mask do not require alignment.
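The alignment step can be pictured as warping the mask with a 3x3 matrix. The sketch below is a minimal NumPy version under several assumptions that the text does not confirm: the calibration matrix is affine (last row 0, 0, 1), it maps Camera 1 pixel coordinates (x, y) to Camera 2 coordinates, and nearest-neighbor sampling is acceptable for a binary mask. The function name align_mask is hypothetical.

```python
import numpy as np


def align_mask(mask, cam1_to_cam2):
    """Warp a 2D mask from Camera 1 into Camera 2 pixel coordinates.

    cam1_to_cam2 is assumed to be a 3x3 affine matrix acting on (x, y, 1).
    Each output pixel looks up its source pixel through the inverse matrix.
    """
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    target = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # 3 x N
    source = np.linalg.inv(cam1_to_cam2) @ target
    sx = np.rint(source[0]).astype(int)   # nearest-neighbor source column
    sy = np.rint(source[1]).astype(int)   # nearest-neighbor source row
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    aligned = np.zeros(h * w, dtype=mask.dtype)
    aligned[valid] = mask[sy[valid], sx[valid]]
    return aligned.reshape(h, w)


# Toy example: a single masked pixel and a pure translation of (+2, +1) pixels.
mask = np.zeros((5, 5), dtype=np.uint8)
mask[1, 1] = 1
shift = np.array([[1.0, 0.0, 2.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0]])
aligned = align_mask(mask, shift)
```

Mapping backwards from output pixels to source pixels avoids holes in the warped mask, which a forward mapping of individual pixels could leave.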

Output: EMT_data_analysis/results/feature_extraction/Features_bf_colony_mask_*Data-ID*.csv

Step 2 — Metric computation

python EMT_data_analysis/analysis_scripts/Metric_computation.py

Compiles per-movie CSVs from Step 1 into a single manifest and computes gene-specific expression metrics:

SOX2: Time of half-maximal expression (first timepoint where smoothed intensity drops to 50% of dynamic range)
TBXT: Time of maximum expression (peak of smoothed intensity curve)
EOMES: Time of maximum expression (peak of smoothed intensity curve)
CDH1: Time of inflection of E-cadherin expression (minimum of second derivative of smoothed intensity)
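The three metric definitions above can be sketched on an already smoothed curve as follows. This is one reading of the definitions, not the repository's code: the function names are hypothetical, the half-maximal time is taken as the first timepoint at or below the midpoint between the curve's minimum and maximum, and the second derivative is approximated with finite differences.

```python
import numpy as np


def time_of_half_max(times, smoothed):
    """First timepoint where the curve is at or below 50% of its dynamic range."""
    low, high = smoothed.min(), smoothed.max()
    threshold = low + 0.5 * (high - low)
    below = np.flatnonzero(smoothed <= threshold)
    return float(times[below[0]])


def time_of_peak(times, smoothed):
    """Timepoint of the maximum of the curve."""
    return float(times[np.argmax(smoothed)])


def time_of_inflection(times, smoothed):
    """Timepoint where the finite-difference second derivative is smallest."""
    second = np.gradient(np.gradient(smoothed, times), times)
    return float(times[np.argmin(second)])


# Toy curves (arbitrary time units); values are made up for illustration.
times = np.arange(7, dtype=float)
sox2_like = np.array([10.0, 9.0, 7.0, 5.0, 3.0, 2.0, 2.0])
tbxt_like = np.array([1.0, 3.0, 6.0, 4.0, 2.0, 1.0, 1.0])
cdh1_like = np.array([10.0, 10.0, 10.0, 8.0, 6.0, 4.0, 2.0])

half_max_time = time_of_half_max(times, sox2_like)
peak_time = time_of_peak(times, tbxt_like)
inflection_time = time_of_inflection(times, cdh1_like)
```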

Mean intensity is computed as total intensity divided by all-cells mask area, averaged over the bottom 10 Z-slices above the glass. Intensity curves are smoothed using a Savitzky-Golay filter (polynomial order 2). Movies are processed in parallel using joblib.
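A minimal sketch of the mean-intensity and smoothing steps, assuming SciPy is available. Several details are assumptions rather than facts from this repository: the per-slice ratio is averaged (rather than dividing summed intensity by summed area), the index of the first slice above the glass is supplied by the caller, and the Savitzky-Golay window length of 5 is a placeholder since only the polynomial order is stated above.

```python
import numpy as np
from scipy.signal import savgol_filter


def mean_intensity(total_intensity, mask_area, first_slice, n_slices=10):
    """Average of per-slice (total intensity / mask area) over n_slices planes.

    first_slice is the index of the first Z-slice above the glass, assumed
    to be known to the caller. Slices with an empty mask are ignored.
    """
    window = slice(first_slice, first_slice + n_slices)
    total = np.asarray(total_intensity[window], dtype=float)
    area = np.asarray(mask_area[window], dtype=float)
    ratio = np.divide(total, area, out=np.full(area.shape, np.nan), where=area > 0)
    return float(np.nanmean(ratio))


# One timepoint of a toy movie: 12 Z-slices with constant intensity and area.
value = mean_intensity(np.full(12, 20.0), np.full(12, 4), first_slice=1)

# Smooth a per-timepoint curve; window_length=5 is a hypothetical choice.
timepoints = np.arange(15, dtype=float)
curve = 0.5 * timepoints**2 - 3.0 * timepoints + 2.0
smoothed = savgol_filter(curve, window_length=5, polyorder=2)
```

Because the toy curve is itself a quadratic, an order-2 filter leaves it unchanged, which makes for a convenient sanity check before applying the filter to real data.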

To load the imaging manifest from a local file instead of AWS:

python EMT_data_analysis/analysis_scripts/Metric_computation.py --local [--local-csv path/to/file.csv]

Output: EMT_data_analysis/results/metric_computation/Image_analysis_extracted_features.csv

Step 3 — Nuclei localization

python EMT_data_analysis/analysis_scripts/Nuclei_localization.py

Classifies individual nuclei as inside or outside the collagen IV basement membrane mesh at each timepoint. Nuclear centroids from 3D instance segmentation are tested against the mesh boundary using ray-casting.

Output: EMT_data_analysis/results/nuclei_localization/Migration_timing_trough_mesh_extracted_feature.csv

Step 4 — Analysis and figure generation

python EMT_data_analysis/analysis_scripts/Analysis_tools.py

Generates all manuscript figures and statistical analyses. By default, input manifests are automatically downloaded from AWS, so this step can be run independently of Steps 1-3.

Output: EMT_data_analysis/results/figures/

Optional — 3D example rendering

The functions in EMT_data_analysis/figure_generation/ can be used to generate the 3D renderings shown in the paper. They have only been tested on Ubuntu 18.04 and 22.04.

On Ubuntu or Debian:

sudo apt-get install xvfb libgl1-mesa-glx

On Windows: Comment out any instance of pv.start_xvfb() in the code before running.
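As an alternative to commenting the call out by hand, the call could be guarded by a platform check. The sketch below assumes pv is the usual alias for PyVista; the helper names xvfb_supported and start_virtual_display are hypothetical and do not exist in this repository.

```python
import platform


def xvfb_supported(system_name=None):
    """Return True only on Linux, where the Xvfb virtual display is available."""
    name = system_name or platform.system()
    return name == "Linux"


def start_virtual_display():
    """Start Xvfb through PyVista when the platform supports it.

    Returns True if the virtual display was requested, False otherwise.
    """
    if not xvfb_supported():
        return False
    import pyvista as pv  # imported lazily so this module loads without PyVista
    pv.start_xvfb()
    return True
```

Calling start_virtual_display() in place of pv.start_xvfb() would then skip the call on Windows without further edits.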

All Cells Mask

python EMT_data_analysis/figure_generation/colony_mask.py --data_id [Optional] --output_directory [Optional]

If no input arguments are provided, the code will default to the data shown in the paper and output results to EMT_data_analysis/results/3D_all_cells_mask. Data ID values are only valid inputs if they have a non-empty value for All Cells Mask File Download in the image_and_segmentation_data.csv manifest on AWS.

Inside-Outside Classification

python EMT_data_analysis/figure_generation/inside-outside_classification.py --data_id [Optional] --output_directory [Optional]

If no input arguments are provided, the code will default to the data shown in the paper and output results to EMT_data_analysis/results/Inside-Outside/mesh-figures. Data ID values are only valid inputs if they have a non-empty value for CollagenIV Segmentation Mesh Folder in the image_and_segmentation_data.csv manifest on AWS.
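To check which Data ID values are valid for either script, the manifest can be filtered for rows with a non-empty value in the relevant column. The sketch below does this with the standard csv module on a small in-memory example. The identifier column name "Data ID", the example rows, and the example URLs are all made up for illustration; only the two download column names come from the text above.

```python
import csv
import io


def valid_data_ids(manifest_text, required_column, id_column="Data ID"):
    """Return the IDs of rows whose required_column is non-empty.

    id_column is an assumed column name; adjust it to match the manifest.
    """
    reader = csv.DictReader(io.StringIO(manifest_text))
    return [
        row[id_column]
        for row in reader
        if (row.get(required_column) or "").strip()
    ]


# Made-up manifest excerpt with three movies.
manifest_text = (
    "Data ID,All Cells Mask File Download,CollagenIV Segmentation Mesh Folder\n"
    "movie_a,https://example.org/a_mask.tiff,\n"
    "movie_b,,https://example.org/b_mesh/\n"
    "movie_c,https://example.org/c_mask.tiff,https://example.org/c_mesh/\n"
)

mask_ids = valid_data_ids(manifest_text, "All Cells Mask File Download")
mesh_ids = valid_data_ids(manifest_text, "CollagenIV Segmentation Mesh Folder")
```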

Contact

If you have questions about this code, please reach out to us at cells@alleninstitute.org.

Licensing

All code in this repository is provided to you under the Allen Institute Software License.
