PredictBC: Predictive Ensemble Model for Breast Cancer Classification Using Microbial and Metabolomic Signatures
PredictBC uses ensemble machine learning to classify breast cancer cases against control subjects, separately for premenopausal and postmenopausal groups, using microbial species and metabolite (BGC) abundance data. It combines the predictions of multiple classifiers with the aim of improving predictive performance over any single model.
The data comprise abundance matrices for microbial species and metabolites (BGC), plus metadata, provided separately for the premenopausal and postmenopausal groups. Within each group, samples are labeled as cases or controls:
- Premenopause_Case vs Premenopause_Control
- Postmenopause_Case vs Postmenopause_Control
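As a rough illustration of how an abundance matrix and its metadata could be joined into a feature table and a binary label, here is a minimal sketch. The samples-by-features layout, the `Group` column name, the sample IDs, and the helper `align_and_label` are assumptions for illustration only; the repository's actual file layout may differ.

```python
import pandas as pd


def align_and_label(abundance, metadata, group_col="Group",
                    case="Premenopause_Case", control="Premenopause_Control"):
    """Return (X, y) restricted to samples found in both tables.

    abundance: samples x features DataFrame indexed by sample ID (assumed layout).
    metadata:  DataFrame indexed by sample ID with a group column (assumed name).
    """
    # Keep only the two groups being compared.
    meta = metadata[metadata[group_col].isin([case, control])]
    # Keep only samples that appear in both the abundance table and the metadata.
    shared = abundance.index.intersection(meta.index)
    X = abundance.loc[shared]
    y = (meta.loc[shared, group_col] == case).astype(int)  # 1 = case, 0 = control
    return X, y


# Toy tables standing in for the real files (values are made up).
abundance = pd.DataFrame(
    {"species_A": [0.10, 0.00, 0.25, 0.05], "species_B": [0.30, 0.40, 0.00, 0.20]},
    index=["S1", "S2", "S3", "S4"],
)
metadata = pd.DataFrame(
    {"Group": ["Premenopause_Case", "Premenopause_Control", "Postmenopause_Case"]},
    index=["S1", "S2", "S3"],
)

X, y = align_and_label(abundance, metadata)
print(X.shape, y.tolist())
```

Passing `case="Postmenopause_Case"` and `control="Postmenopause_Control"` would select the other comparison in the same way.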
The ensemble model incorporates Random Forest, Gradient Boosting, and XGBoost classifiers, employing a stacking ensemble approach to enhance predictive accuracy.
The workflow covers:
- Data preprocessing and feature engineering
- Implementation and training of individual classifiers (RF, GB, XGB)
- Stacking ensemble to combine classifier predictions
- Evaluation using ROC-AUC and confusion matrix
- Identification of significant predictive features
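The preprocessing step is not spelled out here, so the following is only one plausible sketch of what it might involve for abundance data: a prevalence filter, conversion to relative abundance, and a log transform. The function name `preprocess_abundance`, the thresholds, and the pseudocount are hypothetical choices, not the notebook's documented procedure.

```python
import numpy as np
import pandas as pd


def preprocess_abundance(counts, min_prevalence=0.1, pseudocount=1e-6):
    """Filter rare features, normalize per sample, and log-transform.

    counts: samples x features DataFrame of non-negative abundances (assumed layout).
    """
    # Fraction of samples in which each feature is detected.
    prevalence = (counts > 0).mean(axis=0)
    kept = counts.loc[:, prevalence >= min_prevalence]
    # Relative abundance per sample, computed after filtering so rows sum to 1.
    totals = kept.sum(axis=1).replace(0, np.nan)
    relative = kept.div(totals, axis=0).fillna(0.0)
    # The pseudocount avoids log10(0) for features absent from a sample.
    return np.log10(relative + pseudocount)


# Toy counts (made up): "rare" is detected in only one of four samples.
counts = pd.DataFrame(
    {
        "taxon_A": [10, 0, 5, 20],
        "taxon_B": [90, 50, 5, 0],
        "rare": [0, 0, 0, 3],
    },
    index=["S1", "S2", "S3", "S4"],
)

features = preprocess_abundance(counts, min_prevalence=0.5)
print(features.round(3))
```

Tree-based models are largely insensitive to monotone transforms such as the logarithm, so in this setting the prevalence filter is likely the step with the most practical effect.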
The results include:
- Confusion Matrix: Illustrates classification performance of the stacking ensemble model.
- ROC Curves: Show performance comparison among individual classifiers (Random Forest, Gradient Boosting, XGBoost).
- Feature Importance: Highlights the most significant microbial and metabolite predictors contributing to the classification.
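The quantities behind these outputs can be computed with scikit-learn as in the sketch below. It runs on synthetic data with two of the base classifiers for brevity; the feature names and hyperparameters are placeholders. The same calls should apply to a fitted XGBoost or stacking model, although a stacking model does not expose a single `feature_importances_` attribute.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the abundance features and case/control labels.
X, y = make_classification(n_samples=200, n_features=15, n_informative=5, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# ROC curve points and AUC for each classifier, from held-out probabilities.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, scores)
    results[name] = {"fpr": fpr, "tpr": tpr, "auc": roc_auc_score(y_test, scores)}
    print(f"{name}: ROC-AUC = {results[name]['auc']:.3f}")

# Confusion matrix: rows are true classes, columns are predicted classes.
rf = models["Random Forest"]
cm = confusion_matrix(y_test, rf.predict(X_test))
print(cm)

# Five features with the highest impurity-based importance in the forest.
order = np.argsort(rf.feature_importances_)[::-1][:5]
top_features = [(feature_names[i], float(rf.feature_importances_[i])) for i in order]
print(top_features)
```

Each `fpr`/`tpr` pair can be passed to Matplotlib's `plt.plot` to draw the classifiers' ROC curves on one set of axes.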
To run the analysis:
- Clone the repository.
- Navigate to the scripts/ directory.
- Execute the Jupyter notebook (Real_Premen_Species_stack.ipynb) after placing your data in the data/ folder.
Requirements:
- Python (>=3.8)
- Pandas
- NumPy
- Scikit-learn
- XGBoost
- Matplotlib
Install required libraries using:
pip install pandas numpy scikit-learn xgboost matplotlib
Contributions are welcome. Please open issues for suggestions or submit pull requests for enhancements.
This project is open-source. You're encouraged to freely use, adapt, and modify the scripts provided. Please acknowledge this repository if you use or modify its scripts for research or publication.