This project implements and compares four deep learning foundation models (scNODE, scDiffusion, scGPT, and scVI) for single-cell RNA sequencing (scRNA-seq) analysis, focusing on trajectory inference and cell state generation across three biological processes: epithelial-mesenchymal transition (EMT), hematopoiesis, and thymocyte development.
fm-project/
├── data/ # Datasets and generated samples
├── experiments/ # Jupyter notebooks for model experiments
├── models/ # Saved model checkpoints
├── src/ # Source code implementations
├── utils/ # Utility functions
The experiments/ directory contains Jupyter notebooks for preprocessing the data and running each model:
Data Preprocessing:
- emt_preprocess.ipynb - EMT dataset preprocessing
- hematopoiesis_preprocess.ipynb - Hematopoiesis dataset preprocessing
- thymocyte_preprocess.ipynb - Thymocyte dataset preprocessing
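The exact steps in the preprocessing notebooks are not described here. As an illustration only, a common first step in scRNA-seq preprocessing is library-size normalization followed by a log(1 + x) transform. The dependency-free sketch below assumes a dense cells × genes count matrix given as nested lists and a default target of 10,000 counts per cell; both are assumptions, not taken from the notebooks, and `normalize_log1p` is a hypothetical helper. On AnnData objects the same step is usually done with scanpy's `sc.pp.normalize_total` and `sc.pp.log1p`.

```python
import math
from typing import List


def normalize_log1p(counts: List[List[float]], target_sum: float = 1e4) -> List[List[float]]:
    """Scale each cell (row) to `target_sum` total counts, then apply log(1 + x).

    Hypothetical helper for illustration; not part of this repository.
    """
    result = []
    for row in counts:
        total = sum(row)
        # Leave all-zero cells as zeros instead of dividing by zero.
        scale = target_sum / total if total > 0 else 0.0
        result.append([math.log1p(x * scale) for x in row])
    return result


if __name__ == "__main__":
    cells = [[1, 3, 0], [5, 5, 0]]  # two toy cells, three genes
    for row in normalize_log1p(cells, target_sum=4):
        print([round(v, 3) for v in row])
```

Normalizing before the log transform keeps cells with different sequencing depths comparable; the log then compresses the heavy right tail of count data.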
Model Experiments:
Each model has a dedicated notebook per dataset:
- scNODE: scnode_[dataset].ipynb - Neural ODE-based generative model
- scDiffusion: scdiff_[dataset].ipynb - Diffusion model for cell generation
- scGPT: scgpt_[dataset].ipynb - GPT-based foundation model
- scVI: scvi_[dataset].ipynb - Variational autoencoder model (single-cell Variational Inference)
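Because the model notebooks follow a `[model]_[dataset].ipynb` naming convention, their paths can be generated programmatically, for example to check that every model/dataset combination exists. The sketch below assumes that the dataset suffixes match the preprocessing notebook names (`emt`, `hematopoiesis`, `thymocyte`) and that the notebooks sit directly in `experiments/`; both are assumptions, and `expected_notebooks` / `missing_notebooks` are hypothetical helpers, not part of the repository.

```python
from itertools import product
from pathlib import Path
from typing import List

# Notebook prefixes, one per model, as listed above.
MODEL_PREFIXES = ["scnode", "scdiff", "scgpt", "scvi"]
# Assumption: dataset suffixes mirror the preprocessing notebook names.
DATASETS = ["emt", "hematopoiesis", "thymocyte"]


def expected_notebooks(root: Path = Path("experiments")) -> List[Path]:
    """Return one path per (model, dataset) pair, following [model]_[dataset].ipynb."""
    return [
        root / f"{model}_{dataset}.ipynb"
        for model, dataset in product(MODEL_PREFIXES, DATASETS)
    ]


def missing_notebooks(root: Path = Path("experiments")) -> List[Path]:
    """List expected notebooks that are not present on disk."""
    return [path for path in expected_notebooks(root) if not path.exists()]


if __name__ == "__main__":
    for path in missing_notebooks():
        print("missing:", path)
```

With four models and three datasets, the convention implies twelve experiment notebooks.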
The src/ directory contains the implementations of the two main models:
- scNODE: cloned from https://github.com/rsinghlab/scNODE
- scDiffusion: cloned from https://github.com/EperLuo/scDiffusion
The utils/ directory contains helper functions for data processing and evaluation:
- __init__.py - Package initialization
- adata.py - AnnData object utilities
- evaluation.py - Evaluation metrics, including marker gene monotonicity
- latent.py - Latent space analysis utilities
- plot.py - Plotting utilities