This repository contains code for reproducing the plots shown in our manuscript [1]. This repository uses outputs generated by the EMT_image_analysis repository, such as image segmentations and 3D meshes.
This code has been tested on Ubuntu 18.04.2 LTS and Windows 10 using Python 3.11.
- Install Python 3.11 and pip >= 24.0.0.
- Install the dependencies for lxml. On Ubuntu or Debian:
sudo apt-get install libxml2-dev libxslt-dev python-dev
- Create a new virtual environment and install dependencies:
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
(Alternatively, if you have pdm, you can run pdm sync.)
The analysis pipeline consists of four sequential steps. Steps 1-3 generate intermediate data, while Step 4 produces the final figures and statistical analysis. Pre-computed outputs from Steps 1-3 are available on AWS, so Step 4 can be run directly without executing the preceding steps.
Step 1: Feature extraction
python EMT_data_analysis/analysis_scripts/Feature_extraction.py
Extracts per-Z-plane features from each movie: colony mask area and fluorescence intensity (main channel plus additional channels when available). Movies are processed in parallel using joblib. Each movie produces a CSV stored in EMT_data_analysis/results/feature_extraction/.
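As a rough illustration of what "per-Z-plane features" means, the sketch below computes the mask area and the total masked intensity for each Z plane. It is a minimal pure-Python sketch, not the repository's implementation: the function name `per_z_features`, the output keys, and the nested-list `[z][y][x]` layout are all assumptions made for the example.

```python
def per_z_features(mask, intensity):
    """Return one row per Z plane: mask area (pixels) and total masked intensity.

    `mask` and `intensity` are nested lists indexed [z][y][x] (assumed layout).
    """
    rows = []
    for z, (mask_plane, image_plane) in enumerate(zip(mask, intensity)):
        area = 0
        total = 0.0
        for mask_row, image_row in zip(mask_plane, image_plane):
            for inside, value in zip(mask_row, image_row):
                if inside:  # only pixels inside the colony mask contribute
                    area += 1
                    total += value
        rows.append({"z": z, "area_pixels": area, "total_intensity": total})
    return rows


# Toy 2-plane volume (made-up numbers, for illustration only).
mask = [
    [[1, 1], [0, 0]],
    [[1, 0], [0, 0]],
]
intensity = [
    [[10.0, 20.0], [99.0, 99.0]],
    [[5.0, 7.0], [99.0, 99.0]],
]
features = per_z_features(mask, intensity)
```

Each entry of `features` would correspond to one row of a per-movie CSV in this simplified picture.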
Dual-camera alignment: The imaging system uses two cameras (Camera 1: brightfield + 638 nm; Camera 2: 488 nm + 561 nm). Since the all-cells segmentation mask is derived from brightfield (Camera 1), the mask is aligned to Camera 2 coordinates using the dual-camera calibration matrix before extracting intensity from 488/561 nm channels. Channels on the same camera as the mask do not require alignment.
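The alignment step can be pictured as resampling the Camera 1 mask into Camera 2 pixel coordinates. The sketch below assumes the calibration matrix is a 2x3 affine matrix mapping Camera 1 `(x, y)` to Camera 2 `(x, y)`; that form, the helper names `invert_affine` and `align_mask`, and the nearest-neighbour resampling are assumptions for illustration, and the repository may do this differently.

```python
def invert_affine(matrix):
    """Invert a 2x3 affine matrix [[a, b, tx], [c, d, ty]]."""
    (a, b, tx), (c, d, ty) = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("affine matrix is not invertible")
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return [
        [ia, ib, -(ia * tx + ib * ty)],
        [ic, id_, -(ic * tx + id_ * ty)],
    ]


def align_mask(mask, cam1_to_cam2):
    """Resample a 2D mask into Camera 2 coordinates (nearest neighbour)."""
    inverse = invert_affine(cam1_to_cam2)
    height, width = len(mask), len(mask[0])
    aligned = [[0] * width for _ in range(height)]
    for y2 in range(height):
        for x2 in range(width):
            # Map each Camera 2 pixel back to its source pixel on Camera 1.
            x1 = round(inverse[0][0] * x2 + inverse[0][1] * y2 + inverse[0][2])
            y1 = round(inverse[1][0] * x2 + inverse[1][1] * y2 + inverse[1][2])
            if 0 <= x1 < width and 0 <= y1 < height:
                aligned[y2][x2] = mask[y1][x1]
    return aligned


# Hypothetical calibration: a pure shift of +1 pixel in x.
shift_x = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
mask = [[1, 0, 0], [0, 0, 0]]
aligned = align_mask(mask, shift_x)
```

Mapping output pixels back to the source (rather than pushing source pixels forward) avoids holes in the aligned mask when the transform includes scaling or rotation.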
Output: EMT_data_analysis/results/feature_extraction/Features_bf_colony_mask_*Data-ID*.csv
Step 2: Metric computation
python EMT_data_analysis/analysis_scripts/Metric_computation.py
Compiles per-movie CSVs from Step 1 into a single manifest and computes gene-specific expression metrics:
- SOX2: Time of half-maximal expression (first timepoint where smoothed intensity drops to 50% of dynamic range)
- TBXT: Time of maximum expression (peak of smoothed intensity curve)
- EOMES: Time of maximum expression (peak of smoothed intensity curve)
- CDH1: Time of inflection of E-cadherin expression (minimum of second derivative of smoothed intensity)
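The three kinds of metric listed above can be sketched on an already smoothed intensity curve. This is a minimal sketch under stated assumptions: the function names are hypothetical, times are returned as frame indices rather than hours, the half-maximum search starts from the first frame, and the second derivative is approximated by a discrete second difference.

```python
def time_of_half_max(curve):
    """First frame where the curve is at or below 50% of its dynamic range."""
    low, high = min(curve), max(curve)
    threshold = low + 0.5 * (high - low)
    return next(t for t, value in enumerate(curve) if value <= threshold)


def time_of_max(curve):
    """Frame at which the curve peaks."""
    return max(range(len(curve)), key=curve.__getitem__)


def time_of_inflection(curve):
    """Frame where the discrete second derivative is most negative."""
    second = {
        t: curve[t - 1] - 2 * curve[t] + curve[t + 1]
        for t in range(1, len(curve) - 1)
    }
    return min(second, key=second.get)


# Made-up smoothed curves, for illustration only.
sox2_like = [10, 10, 9, 6, 3, 2, 2]
tbxt_like = [1, 2, 5, 9, 6, 3]
cdh1_like = [10, 10, 9, 6, 4, 3, 3]

sox2_time = time_of_half_max(sox2_like)
tbxt_time = time_of_max(tbxt_like)
cdh1_time = time_of_inflection(cdh1_like)
```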
Mean intensity is computed as total intensity divided by all-cells mask area, averaged over the bottom 10 Z-slices above the glass. Intensity curves are smoothed using a Savitzky-Golay filter (polynomial order 2). Movies are processed in parallel using joblib.
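The mean-intensity and smoothing steps can be sketched as follows. Several details here are assumptions, not taken from the repository: the per-slice ratio is averaged (rather than dividing summed intensity by summed area), the caller supplies the index of the first slice above the glass, the filter window is 5 points, and the two samples at each end are left unsmoothed. The row keys `area_pixels` and `total_intensity` are hypothetical. The coefficients (-3, 12, 17, 12, -3)/35 are the standard 5-point, order-2 Savitzky-Golay smoothing weights.

```python
# 5-point, polynomial-order-2 Savitzky-Golay smoothing weights.
SG5_COEFFS = tuple(c / 35 for c in (-3, 12, 17, 12, -3))


def mean_intensity(rows, first_slice, n_slices=10):
    """Average of total_intensity / area over the bottom `n_slices` slices."""
    selected = rows[first_slice:first_slice + n_slices]
    ratios = [
        row["total_intensity"] / row["area_pixels"]
        for row in selected
        if row["area_pixels"] > 0  # skip slices with an empty mask
    ]
    if not ratios:
        raise ValueError("no slices with a non-empty mask")
    return sum(ratios) / len(ratios)


def savgol5(curve):
    """Smooth interior points with a 5-point window; edges are left as-is."""
    smoothed = list(curve)
    for t in range(2, len(curve) - 2):
        window = curve[t - 2:t + 3]
        smoothed[t] = sum(c * v for c, v in zip(SG5_COEFFS, window))
    return smoothed


# Made-up per-slice rows for one timepoint.
rows = [
    {"area_pixels": 2, "total_intensity": 30.0},
    {"area_pixels": 1, "total_intensity": 5.0},
]
timepoint_mean = mean_intensity(rows, first_slice=0)

# An order-2 filter reproduces a quadratic exactly at interior points.
quadratic = [float(t * t) for t in range(7)]
smoothed = savgol5(quadratic)
```

In practice a library routine such as SciPy's `savgol_filter` would normally be used, which also handles the edge samples.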
To load the imaging manifest from a local file instead of AWS:
python EMT_data_analysis/analysis_scripts/Metric_computation.py --local [--local-csv path/to/file.csv]
Output: EMT_data_analysis/results/metric_computation/Image_analysis_extracted_features.csv
Step 3: Nuclei localization
python EMT_data_analysis/analysis_scripts/Nuclei_localization.py
Classifies individual nuclei as inside or outside the collagen IV basement membrane mesh at each timepoint. Nuclear centroids from 3D instance segmentation are tested against the mesh boundary using ray-casting.
Output: EMT_data_analysis/results/nuclei_localization/Migration_timing_trough_mesh_extracted_feature.csv
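For readers unfamiliar with ray-casting, the idea is to shoot a ray from the centroid and count how many mesh triangles it crosses: an odd count means the point is inside a closed mesh. The sketch below uses the Möller-Trumbore ray-triangle test on a toy tetrahedron. It assumes a closed, watertight mesh and a fixed +x ray direction, ignores degenerate cases such as rays grazing an edge, and uses hypothetical function names; it is not the repository's implementation.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def ray_hits_triangle(origin, direction, triangle, eps=1e-9):
    """Moller-Trumbore test: does the ray cross the triangle in front of origin?"""
    v0, v1, v2 = triangle
    edge1, edge2 = sub(v1, v0), sub(v2, v0)
    h = cross(direction, edge2)
    a = dot(edge1, h)
    if abs(a) < eps:  # ray is parallel to the triangle plane
        return False
    f = 1.0 / a
    s = sub(origin, v0)
    u = f * dot(s, h)
    if u < 0.0 or u > 1.0:
        return False
    q = cross(s, edge1)
    v = f * dot(direction, q)
    if v < 0.0 or u + v > 1.0:
        return False
    return f * dot(edge2, q) > eps  # hit must lie ahead of the origin


def is_inside(point, triangles, direction=(1.0, 0.0, 0.0)):
    """Odd number of crossings means the point is inside a closed mesh."""
    hits = sum(ray_hits_triangle(point, direction, tri) for tri in triangles)
    return hits % 2 == 1


# Toy closed mesh: a unit tetrahedron.
o, x, y, z = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
tetrahedron = [(o, x, y), (o, x, z), (o, y, z), (x, y, z)]

inside_point = is_inside((0.1, 0.1, 0.1), tetrahedron)
outside_point = is_inside((2.0, 2.0, 2.0), tetrahedron)
outside_crossing = is_inside((-1.0, 0.1, 0.1), tetrahedron)
```

The third point lies outside the mesh but its ray passes through it, crossing two faces, so the even count still classifies it as outside.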
Step 4: Figures and statistical analysis
python EMT_data_analysis/analysis_scripts/Analysis_tools.py
Generates all manuscript figures and statistical analyses. By default, input manifests are automatically downloaded from AWS, so this step can be run independently of Steps 1-3.
Output: EMT_data_analysis/results/figures/
The functions in EMT_data_analysis/figure_generation/ can be used to generate the 3D renderings shown in the paper. These have only been tested on Ubuntu 18.04 and 22.04.
On Ubuntu or Debian:
sudo apt-get install xvfb libgl1-mesa-glx
On Windows:
Comment out any instance of pv.start_xvfb() in the code before running.
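As an alternative to commenting the calls out by hand, the call could be guarded by a platform check. This is only a sketch of that idea, not something the repository does: the helper names are hypothetical, and it assumes the virtual display is wanted on Linux only. `pv.start_xvfb()` is the PyVista call already used in the code.

```python
import sys


def should_start_xvfb(platform=sys.platform):
    """Xvfb is an X11 virtual display, so assume only Linux needs it."""
    return platform.startswith("linux")


def start_virtual_display():
    """Start Xvfb through PyVista when the platform calls for it."""
    if should_start_xvfb():
        import pyvista as pv  # imported lazily so the check works without PyVista

        pv.start_xvfb()
```

Calling `start_virtual_display()` in place of a bare `pv.start_xvfb()` would then be a no-op on Windows.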
python EMT_data_analysis/figure_generation/colony_mask.py --data_id [Optional] --output_directory [Optional]
If no input arguments are provided, the code will default to the data shown in the paper and output results to EMT_data_analysis/results/3D_all_cells_mask.
Data ID values are only valid inputs if they have a non-empty value for All Cells Mask File Download in the image_and_segmentation_data.csv manifest on AWS.
python EMT_data_analysis/figure_generation/inside-outside_classification.py --data_id [Optional] --output_directory [Optional]
If no input arguments are provided, the code will default to the data shown in the paper and output results to EMT_data_analysis/results/Inside-Outside/mesh-figures.
Data ID values are only valid inputs if they have a non-empty value for CollagenIV Segmentation Mesh Folder in the image_and_segmentation_data.csv manifest on AWS.
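To check in advance which Data ID values are valid for either script, a local copy of the manifest can be filtered on the relevant column. The sketch below uses the standard `csv` module on an in-memory example. The `Data ID` column header is an assumption (the two download columns are named as in the text above), the function name is hypothetical, and the rows are made up for illustration.

```python
import csv
import io


def valid_data_ids(manifest_file, required_column, id_column="Data ID"):
    """Return Data IDs whose `required_column` entry is non-empty."""
    reader = csv.DictReader(manifest_file)
    return [
        row[id_column]
        for row in reader
        if (row.get(required_column) or "").strip()
    ]


# Made-up manifest content standing in for image_and_segmentation_data.csv.
example_manifest = (
    "Data ID,All Cells Mask File Download,CollagenIV Segmentation Mesh Folder\n"
    "movie_a,https://example.org/mask_a,\n"
    "movie_b,,https://example.org/mesh_b\n"
)

mask_ids = valid_data_ids(
    io.StringIO(example_manifest), "All Cells Mask File Download"
)
mesh_ids = valid_data_ids(
    io.StringIO(example_manifest), "CollagenIV Segmentation Mesh Folder"
)
```

With a real manifest saved locally, `open(path, newline="")` would replace the `io.StringIO` objects.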
If you have questions about this code, please reach out to us at cells@alleninstitute.org.
All code in this repository is provided to you under the Allen Institute Software License.