ML-Driven Precision Oncology: Automated Leukemic Stem Cell (LSC) Identification
Distinguishing Leukemic Stem Cells (LSCs) from non-leukemic cells is a critical bottleneck in treating T-cell lymphoblastic leukemia (T-ALL). This project implements a high-precision classification model that integrates multi-omic signatures—including chromatin opening, RNA, and surface epitopes—to automate identification and accelerate therapeutic pathfinding.
-
High-Fidelity Accuracy: Achieved >95% accuracy (with a test set performance of 99.9%) in identifying LSCs from novel transcriptomic inputs.
-
Clinical Impact: Successfully identified driver mutations in Minimal Residual Disease (MRD) datasets, providing a scalable framework for personalized medicine.
-
Multi-Platform Robustness: Validated across diverse high-throughput sequencing technologies, including TEA-seq, CITE-seq, and scRNA-seq.
-
Core Architecture: Stochastic Gradient Descent (SGD) based classifier engineered via the CellTypist framework.
-
Signature Extraction: Automated feature selection using global chromatin opening as a primary molecular marker.
-
Pre-processing: Implemented custom normalization and scaling (MinMaxScaler) pipelines for high-dimensional genomic data.
Plaintext
├── LSC_Model_Training.ipynb # Full training pipeline and feature extraction ├── requirements.txt # Environment dependencies └── README.md # Project documentation
- Python 3.9+
- High-Performance Computing (HPC) environment recommended for multi-terabyte datasets
Bash git clone https://github.com/yourusername/LSC-Identification-Model.git pip install -r requirements.txt