EMP-FM

Author: Shi Pan, UCL Genetics Institute

EMP-FM is a foundation model that classifies epithelial-mesenchymal transition (EMT) states in single-cell RNA-seq data.

Description

Epithelial–mesenchymal plasticity (EMP) plays a significant role in various biological processes, including tumour progression and chemoresistance. However, the expression programmes underlying the epithelial–mesenchymal transition (EMT) in cancer are diverse, and accurately defining the EMT status of tumour cells remains a challenging task. In this study, we employed a pre-trained single-cell foundation model (scFM) to develop an EMP foundation model (EMP-FM) that captures discrete states within the EMT continuum in single-cell cancer data. In capturing EMP states, we achieved an average Area Under the Receiver Operating Characteristic curve (AUROC) of 90% across multiple cancer types. We propose a new metric, ADESI, to aid the biological interpretability of our model, and derive EMP signatures linked with the energy metabolism and motility reprogramming underlying these state switches. Our study provides a proof of concept that scFMs can be applied to characterise cell states in single-cell data, and proposes a generalisable framework for predicting EMP in single-cell RNA-seq data that can be adapted and expanded to characterise other cellular states.

The preprint presenting this tool, "Classifying epithelial-mesenchymal transition (EMT) states in single cell cancer data using large language models", is available on bioRxiv.

Environment Setup

To set up the environment, you can use either Conda or pip:

Conda

Run the following command to recreate the environment using the saved conda environment file:

conda env create -f scLLM_conda_env.yml

Pip

Run the following command to recreate the environment using the saved pip environment file:

pip install -r requirements.txt
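After either installation route, it can be useful to confirm that the packages listed in requirements.txt are actually present in the active environment. The following is a minimal sketch, not part of the repository: the helper names `requirement_name` and `missing_packages` are hypothetical, and the parsing only handles plain requirement lines (it skips comments and pip options such as `-r` or `-e`).

```python
import re
from importlib import metadata


def requirement_name(line):
    """Return the package name from one requirements.txt line, or None."""
    line = line.split("#", 1)[0].strip()  # drop inline comments
    if not line or line.startswith("-"):
        return None  # blank line, comment-only line, or a pip option
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
    return match.group(0) if match else None


def missing_packages(lines):
    """List required package names that are not installed in this environment."""
    missing = []
    for line in lines:
        name = requirement_name(line)
        if name is None:
            continue
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Calling `missing_packages(open("requirements.txt"))` from the repository root returns an empty list when every listed package is installed. The check ignores version pins, so it only detects packages that are absent altogether.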

Usage

The code for the scMultiNet generic classifier is in the scLLM folder. All the code for training, validating and applying the EMP-FM model is in the Experiment folder.

The Experiment folder is structured as follows:

Step_0_preprocess_raw_data

Contains all the code for preprocessing the raw data used in our manuscript, including the generation of the count matrix and the annotation file. To generate the count matrix and the annotation file for your own dataset, use "0_preprocess_example.ipynb". Create a Data folder inside the Step_0_preprocess_raw_data folder to store the processed data.
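To make the expected outputs of this step concrete, here is a minimal sketch of creating the Data folder and writing a count matrix and an annotation file into it. It is an illustration only: the function `save_processed_data`, the file names `count_matrix.csv` and `annotation.csv`, the CSV layout (one row per cell) and the label values are all assumptions, and the notebook in the repository defines the actual format.

```python
import csv
from pathlib import Path


def save_processed_data(step_dir, genes, counts, annotations):
    """Write a count matrix and an annotation file into <step_dir>/Data.

    counts maps cell ID -> list of raw counts ordered like `genes`;
    annotations maps cell ID -> label. Layout and file names are illustrative.
    """
    data_dir = Path(step_dir) / "Data"
    data_dir.mkdir(parents=True, exist_ok=True)  # create the Data folder if needed

    count_path = data_dir / "count_matrix.csv"  # hypothetical file name
    with open(count_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["cell_id", *genes])
        for cell_id, row in counts.items():
            if len(row) != len(genes):
                raise ValueError(f"{cell_id}: expected {len(genes)} counts, got {len(row)}")
            writer.writerow([cell_id, *row])

    annotation_path = data_dir / "annotation.csv"  # hypothetical file name
    with open(annotation_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["cell_id", "label"])
        for cell_id in counts:
            # Raises KeyError if a cell in the matrix has no annotation.
            writer.writerow([cell_id, annotations[cell_id]])

    return count_path, annotation_path
```

For example, `save_processed_data("Step_0_preprocess_raw_data", ["CDH1", "VIM"], {"cell_1": [5, 0]}, {"cell_1": "epithelial"})` writes both files under Step_0_preprocess_raw_data/Data. Keeping the two files keyed by the same cell IDs makes it easy to check that every cell in the matrix is annotated.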

Step_1_train_phase_1

Contains all the code for phase 1 of EMP-FM training, as described in our manuscript.

Step_2_train_phase_2

Contains all the code for phase 2 of EMP-FM training, as described in our manuscript.

Step_3_visualise_performances

baseline_roc_confusion.ipynb: visualises the ROC curves and confusion matrices of the baseline models and compares them with the EMP-FM model.

plot_ROC_confusion.ipynb: visualise the ROC curve and the confusion matrix of the EMP-FM model for different tissue types.
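For readers unfamiliar with the two quantities these notebooks plot, the sketch below computes a confusion matrix and an AUROC value in plain Python. It is not the code used in the notebooks: the function names are hypothetical, and the AUROC is computed with the pairwise-ranking definition (the probability that a positive example scores higher than a negative one), which is quadratic in the number of cells and only suitable for small checks.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Return counts[i][j] = number of cells of class i predicted as class j."""
    index = {label: i for i, label in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for truth, predicted in zip(true_labels, predicted_labels):
        counts[index[truth]][index[predicted]] += 1
    return counts


def auroc(is_positive, scores):
    """AUROC as the probability that a positive outranks a negative (ties count half)."""
    positives = [s for s, p in zip(scores, is_positive) if p]
    negatives = [s for s, p in zip(scores, is_positive) if not p]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))
```

For a multi-class problem such as EMP state classification, `auroc` can be applied one class at a time by treating that class as positive and all others as negative.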

Step_4_validate_on_unseen_dataset

Contains all the code for validating the EMP-FM model on the unseen dataset described in our manuscript.

Step_5_Embedding_Space

Visualise the embedding space of the EMP-FM model and plot the trajectory of the EMP states in the embedding space.
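One simple way to summarise a trajectory through an embedding space is to average the embeddings of the cells in each state and connect the resulting centroids in order. The sketch below illustrates that idea only; it is not the method implemented in this folder, and the function names, the state labels and the assumed state ordering are all hypothetical.

```python
import math
from collections import defaultdict


def state_centroids(embeddings, states):
    """Average the embedding vectors of the cells assigned to each state."""
    grouped = defaultdict(list)
    for vector, state in zip(embeddings, states):
        grouped[state].append(vector)
    return {
        state: [sum(column) / len(vectors) for column in zip(*vectors)]
        for state, vectors in grouped.items()
    }


def trajectory_lengths(centroids, ordered_states):
    """Euclidean distance between the centroids of consecutive states."""
    return [
        math.dist(centroids[a], centroids[b])
        for a, b in zip(ordered_states, ordered_states[1:])
    ]
```

Given per-cell embeddings and state labels, `trajectory_lengths(state_centroids(embeddings, states), ["epithelial", "hybrid", "mesenchymal"])` returns one distance per step along the assumed ordering. Plotting the centroids over a 2D projection of the cells gives a rough picture of how the states are arranged.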

Step_6_ADESI_score support_data

Visualise the ADESI score of the EMP-FM model.

Contributing

If you find a bug or want to suggest a new feature for EMP-FM, please open a GitHub issue in this repository. Pull requests are also welcome!

License

EMP-FM is released under the GNU General Public License (GPL). This code is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY.
