This project uses machine learning to predict fetal health state (Normal, Suspect, or Pathologic) from cardiotocographic (CTG) data. It was developed as part of Hackathon 2025.
- ctg_hackathon.ipynb – main Jupyter notebook with the full pipeline
- CTG_clean.csv – cleaned dataset used for training
- metrics_table.csv / metrics_table.png – model performance results
- submission.csv – predictions in required format (id, NSP)
- README.md – project description
- Data Cleaning: removed duplicates, constant features, and ID-like columns.
- Preprocessing: filled missing values, standardized numeric features, one-hot encoded categorical features.
- Splitting: 80/20 train–test split with stratified 5-fold cross-validation.
- Models: compared Logistic Regression (baseline) and Random Forest (ensemble).
- Evaluation: used Macro F1 (primary) and Balanced Accuracy (secondary) to handle class imbalance.
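
The steps above can be sketched roughly as follows with scikit-learn. This is a minimal illustration, not the notebook's actual code: the data is a synthetic stand-in for `CTG_clean.csv`, the column names (`num_a`, `num_b`, `cat_c`) are hypothetical, and the imputation strategies, hyperparameters, and the choice to run cross-validation on the training portion only are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for CTG_clean.csv (NOT the real data): two numeric
# features, one categorical feature, and an imbalanced 3-class target "NSP".
rng = np.random.default_rng(0)
n = 300
target = rng.choice([1, 2, 3], size=n, p=[0.7, 0.2, 0.1])
df = pd.DataFrame({
    "num_a": target + rng.normal(0, 0.5, n),
    "num_b": rng.normal(0, 1, n),
    "cat_c": rng.choice(["x", "y"], size=n),
    "NSP": target,
})
df.loc[df.index[::25], "num_b"] = np.nan  # a few gaps to exercise imputation

X, y = df.drop(columns="NSP"), df["NSP"]
num_cols, cat_cols = ["num_a", "num_b"], ["cat_c"]

# Preprocessing: fill missing values, standardize numerics, one-hot encode categoricals.
# The median / most_frequent strategies are assumptions.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), num_cols),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), cat_cols),
])

# 80/20 split, stratified so each class keeps its share in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Stratified 5-fold cross-validation on the training portion (assumed).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
models = {
    "logreg": LogisticRegression(max_iter=1000),
    "rf": RandomForestClassifier(n_estimators=100, random_state=42),
}

cv_results = {}
for name, model in models.items():
    pipe = Pipeline([("prep", preprocess), ("model", model)])
    scores = cross_validate(
        pipe, X_train, y_train, cv=cv, scoring=["f1_macro", "balanced_accuracy"]
    )
    cv_results[name] = {
        "macro_f1": scores["test_f1_macro"].mean(),
        "balanced_accuracy": scores["test_balanced_accuracy"].mean(),
    }
    print(name, cv_results[name])
```

Wrapping the preprocessing and the model in one `Pipeline` means the imputer and scaler are refit inside each fold, which keeps validation data from leaking into the training statistics.
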
- Random Forest was the best model.
- CV Macro F1 (5-fold): 0.9894
- CV Balanced Accuracy (5-fold): 0.9849
- Hold-out Macro F1: 0.9759
- Hold-out Balanced Accuracy: 0.9651
The confusion matrix shows few misclassifications overall. Most errors were confusions between Normal and Suspect, while Pathologic cases were identified well.
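
As a small illustration of how these metrics relate to a confusion matrix, the sketch below computes Macro F1 and Balanced Accuracy on toy labels. The labels are invented for the example and are not the project's predictions; only the scikit-learn metric functions are real.

```python
from sklearn.metrics import balanced_accuracy_score, confusion_matrix, f1_score

labels = ["Normal", "Suspect", "Pathologic"]

# Toy labels for illustration only (not the project's predictions).
y_true = ["Normal"] * 6 + ["Suspect"] * 3 + ["Pathologic"] * 1
y_pred = (
    ["Normal"] * 5 + ["Suspect"]      # one Normal mistaken for Suspect
    + ["Suspect"] * 2 + ["Normal"]    # one Suspect mistaken for Normal
    + ["Pathologic"]                  # Pathologic identified correctly
)

# Rows are true classes, columns are predicted classes, in `labels` order.
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)

# Macro F1 averages per-class F1 scores with equal weight per class.
macro_f1 = f1_score(y_true, y_pred, labels=labels, average="macro")

# Balanced accuracy is the mean of per-class recall.
bal_acc = balanced_accuracy_score(y_true, y_pred)

print(f"Macro F1: {macro_f1:.4f}  Balanced Accuracy: {bal_acc:.4f}")
```

Because both metrics weight every class equally, a rare class such as Pathologic counts as much as Normal, which is why they suit an imbalanced dataset better than plain accuracy.
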
- Clone the repo and open it in your environment.
- (Optional) create a virtual environment:

      python -m venv .venv
      .venv\Scripts\activate      # Windows
      source .venv/bin/activate   # Linux/Mac