A collection of machine learning and deep learning projects that apply core ML methods to genomics and transcriptomics data. Each notebook demonstrates an end-to-end workflow: preprocessing, modeling, evaluation, and interpretation.
scikit-learn · PyTorch · XGBoost · SHAP · NumPy · Pandas · Matplotlib
Notebook: 1000genomesAncestry.ipynb
Goal: Predict ancestry from SNP data.
Models: KNN, Random Forest, Dense Neural Network, Neural Network Ensemble
Focus: Feature encoding, supervised classification, ensemble learning.
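As an illustration of the feature-encoding step, here is a minimal sketch of additive genotype encoding, where each SNP becomes the count of non-reference alleles (0, 1, or 2). It assumes VCF-style genotype strings; the function names, the `-1` sentinel for missing calls, and the sample data are hypothetical and not taken from the notebook.

```python
def encode_genotype(gt: str) -> int:
    """Count non-reference alleles in a VCF-style diploid genotype such as '0|1'.

    Returns -1 for missing calls ('.'), an arbitrary sentinel chosen for this sketch.
    """
    # Treat phased ('|') and unphased ('/') separators the same way.
    alleles = gt.replace("/", "|").split("|")
    if "." in alleles:
        return -1
    return sum(1 for allele in alleles if allele != "0")


def encode_samples(genotypes):
    """Encode a list of per-sample genotype lists into a list of integer rows."""
    return [[encode_genotype(gt) for gt in sample] for sample in genotypes]


# Three hypothetical samples with four hypothetical SNPs each.
samples = [
    ["0|0", "0|1", "1|1", "0|0"],
    ["1|0", "1|1", "0/0", "./."],
    ["0|0", "0|0", "0|1", "1|1"],
]
encoded = encode_samples(samples)
```

The resulting integer matrix can be passed to any of the classifiers listed above once missing values have been imputed or filtered.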
Notebook: BreastCancerClassification.ipynb
Goal: Classify breast cancer samples into PAM50 subtypes.
Models: XGBoost, Dense Neural Network
Focus: Multi-class classification and interpretability with SHAP.
Notebook: SingleCellRNASeq.ipynb
Goal: Cluster and predict immune cell types from single-cell data.
Unsupervised: PCA + Leiden, K-Means, Hierarchical Clustering
Supervised: Logistic Regression, Random Forest, MLP Neural Network
Focus: Dimensionality reduction, marker identification, supervised label transfer.
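The dimensionality-reduction step can be sketched with plain NumPy. This is PCA computed through the singular value decomposition, shown on synthetic data; the notebook itself may rely on a library implementation instead, and the matrix sizes and the two synthetic groups below are assumptions made for illustration.

```python
import numpy as np


def pca(X, n_components):
    """Project the rows of X onto the top principal components via SVD."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    scores = X_centered @ Vt[:n_components].T
    # Fraction of total variance captured by each component.
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]


rng = np.random.default_rng(0)
# Hypothetical expression matrix: 200 cells x 50 genes, with two synthetic groups
# that differ in the first five genes.
group = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 50))
X[:, :5] += 4.0 * group[:, None]

scores, explained = pca(X, n_components=10)
```

The low-dimensional `scores` are what a clustering method such as Leiden or K-Means would typically operate on.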
Notebook: TFBindingSitePrediction.ipynb
Goal: Predict transcription factor (TF) binding sites from DNA sequences.
Model: CNN (PyTorch)
Focus: Sequence encoding, CNN design, motif-level interpretation.
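A common way to encode DNA for a CNN is one-hot encoding, sketched below. The channel order `ACGT`, the all-zero treatment of ambiguous bases such as `N`, and the function name are assumptions made for this example rather than details confirmed by the notebook.

```python
import numpy as np

BASES = "ACGT"
BASE_INDEX = {base: i for i, base in enumerate(BASES)}


def one_hot_encode(seq):
    """Return a (4, len(seq)) float32 array; unknown bases such as N stay all-zero."""
    encoded = np.zeros((len(BASES), len(seq)), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        idx = BASE_INDEX.get(base)
        if idx is not None:
            encoded[idx, pos] = 1.0
    return encoded


x = one_hot_encode("ACGTN")  # shape (4, 5)
```

The channels-first layout matches the `(batch, channels, length)` input that PyTorch's `Conv1d` expects once a batch dimension is added.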
Notebook: BreastTumorMalignancyCNN.ipynb
Goal: Predict breast tumor malignancy from histopathology images using deep CNN architectures.
Models: Custom PyTorch CNN, Fine-tuned ResNet18, FiLM-augmented ResNet18
Focus: Transfer learning, fine-tuning, conditional modulation, and robust evaluation with class imbalance handling.
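The conditional modulation used by FiLM (feature-wise linear modulation) scales and shifts each channel of a feature map with parameters derived from a conditioning input. The sketch below shows only that operation, in NumPy rather than PyTorch; the array shapes are hypothetical, and how the notebook produces `gamma` and `beta` is not specified here.

```python
import numpy as np


def film(features, gamma, beta):
    """Apply feature-wise linear modulation to a (N, C, H, W) feature map.

    gamma and beta have shape (N, C): one scale and one shift per channel,
    broadcast over the spatial dimensions.
    """
    return gamma[:, :, None, None] * features + beta[:, :, None, None]


rng = np.random.default_rng(0)
features = rng.normal(size=(2, 3, 4, 4))  # hypothetical CNN activations
gamma = rng.normal(size=(2, 3))           # would come from a conditioning network
beta = rng.normal(size=(2, 3))
out = film(features, gamma, beta)
```

With `gamma` set to ones and `beta` to zeros the layer reduces to the identity, which is why it can be added to a pretrained backbone without disturbing its initial behavior.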