rakmohan/icu-mortality-prediction


ICU In-Hospital Mortality Prediction — MIMIC-III

Machine Learning & Deep Learning Project


⚠️ Data Access Notice: MIMIC-III data is not included in this repository. Access requires:

  1. Completing CITI training at citiprogram.org
  2. Submitting a credentialing application at physionet.org
  3. Downloading MIMIC-III v1.4 from physionet.org/content/mimiciii/1.4

Place the downloaded .csv.gz files in the data/ directory before running the notebook. A demo subset (~100 patients) is available at physionet.org/content/mimiciii-demo/1.4 for testing without full access.


Task

Predict in-hospital mortality using the first 24 hours of an ICU stay.

  • Why this task? Mortality prediction is a distinct problem from readmission and length-of-stay prediction. A prediction made within the first 24 hours is clinically actionable: it can inform care escalation decisions, ICU resource allocation, and palliative care discussions.
  • Data: MIMIC-III v1.4 (Beth Israel Deaconess Medical Center ICU data)
  • Cohort: First ICU stay, age ≥ 18, ICU LOS ≥ 24 hours (32,575 ICU stays after filtering; MIMIC-III contains ~46K patients in total)
  • Outcome: Hospital expire flag (binary: 0 = survived, 1 = died in hospital)
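
The cohort filters listed above can be sketched in pandas roughly as follows. This is a minimal illustration on made-up rows, not the notebook's actual code: the function name `select_cohort` and its parameters are hypothetical, and the column names (`SUBJECT_ID`, `HADM_ID`, `ICUSTAY_ID`, `INTIME`, `LOS`, `DOB`, `HOSPITAL_EXPIRE_FLAG`) are assumed to follow the standard MIMIC-III table schemas.

```python
import pandas as pd

# Toy stand-ins for PATIENTS, ADMISSIONS and ICUSTAYS (all values are made up).
patients = pd.DataFrame({
    "SUBJECT_ID": [1, 2, 3],
    "DOB": pd.to_datetime(["2100-01-01", "2140-06-01", "2090-03-15"]),
})
admissions = pd.DataFrame({
    "HADM_ID": [10, 11, 20, 30],
    "HOSPITAL_EXPIRE_FLAG": [0, 1, 0, 1],
})
icustays = pd.DataFrame({
    "SUBJECT_ID": [1, 1, 2, 3],
    "HADM_ID": [10, 11, 20, 30],
    "ICUSTAY_ID": [100, 101, 200, 300],
    "INTIME": pd.to_datetime(
        ["2150-01-01", "2151-05-01", "2155-01-01", "2160-01-01"]
    ),
    "LOS": [2.5, 4.0, 3.0, 0.5],  # ICU length of stay in fractional days
})


def select_cohort(patients, admissions, icustays, min_age=18, min_los_days=1.0):
    """Hypothetical helper: first ICU stay, adult, ICU LOS of at least 24 hours."""
    # Keep only each patient's earliest ICU stay.
    first = icustays.sort_values("INTIME").drop_duplicates("SUBJECT_ID", keep="first")
    df = first.merge(patients, on="SUBJECT_ID").merge(admissions, on="HADM_ID")
    # A year difference is used instead of a timedelta because MIMIC-III shifts
    # the birth dates of patients over 89, and the subtraction can overflow.
    df["AGE"] = df["INTIME"].dt.year - df["DOB"].dt.year
    keep = (df["AGE"] >= min_age) & (df["LOS"] >= min_los_days)
    cols = ["SUBJECT_ID", "HADM_ID", "ICUSTAY_ID", "AGE", "HOSPITAL_EXPIRE_FLAG"]
    return df.loc[keep, cols].reset_index(drop=True)


cohort = select_cohort(patients, admissions, icustays)
print(cohort)
```

In the toy data, patient 2 is dropped for being under 18 and patient 3 for an ICU stay shorter than 24 hours, so only the first stay of patient 1 remains.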

Models

| # | Model | Type | Notes |
|---|-------|------|-------|
| 1 | SVM (RBF kernel) | Classical ML | Baseline; PCA-reduced features |
| 2 | Decision Tree (pruned) | Classical ML | Cost-complexity pruning via CV |
| 3 | Cox Proportional Hazards | Survival | Time-to-event; interpretable HRs |
| 4 | LSTM (Bidirectional + Attention) | Deep Learning | Temporal vital sign sequences |
| 5 | Transformer Encoder (CLS token) | Deep Learning | Self-attention on 12 time steps |
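
As an illustration of the first row, an RBF-kernel SVM on PCA-reduced features can be assembled as a scikit-learn pipeline. The sketch below runs on synthetic data from `make_classification`; the 95% explained-variance cut-off, the default SVM hyperparameters, and the variable names are assumptions rather than the notebook's settings, and no class balancing (such as the SMOTE step in the notebook) is applied here.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for the engineered feature matrix.
X, y = make_classification(
    n_samples=600, n_features=40, n_informative=10,
    weights=[0.89, 0.11], random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling and PCA sit inside the pipeline so they are fitted on training data only.
svm_baseline = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95, svd_solver="full"),  # assumed cut-off: 95% of variance
    SVC(kernel="rbf"),
)
svm_baseline.fit(X_tr, y_tr)

# decision_function gives a continuous score, which is all AUROC needs.
scores = svm_baseline.decision_function(X_te)
auroc = roc_auc_score(y_te, scores)
n_components = svm_baseline.named_steps["pca"].n_components_
print(f"PCA kept {n_components} components, test AUROC = {auroc:.3f}")
```

Keeping the scaler and PCA inside the pipeline avoids leaking test-set statistics into the fitted transforms.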

Project Structure

icu-mortality-prediction/
├── mortality_prediction.ipynb   ← Main notebook (all sections)
├── utils/
│   ├── mimic_utils.py           ← Feature engineering utilities
│   └── __init__.py
├── data/                        ← Place MIMIC-III CSV files here
├── outputs/
│   ├── figures/                 ← All saved plots
│   └── models/                  ← Saved model checkpoints
├── requirements.txt
└── README.md

Setup

# 1. Create virtual environment
python -m venv .venv
source .venv/bin/activate       # Windows: .venv\Scripts\activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. Place MIMIC-III CSV files in data/
#    Required files:
#      PATIENTS.csv
#      ADMISSIONS.csv
#      ICUSTAYS.csv
#      CHARTEVENTS.csv    (large — ~33GB uncompressed)
#      LABEVENTS.csv      (large — ~10GB uncompressed)
#      DIAGNOSES_ICD.csv

# 4. Launch Jupyter
jupyter notebook mortality_prediction.ipynb

Features Engineered

Static (per admission)

  • Demographics: age, gender
  • Admission type: EMERGENCY / ELECTIVE / URGENT
  • Ethnicity (5 categories)
  • ICU unit type (MICU, SICU, CCU, CSRU, etc.)

Temporal — First 24 Hours (vital signs)

| Vital | Item IDs | Stats |
|-------|----------|-------|
| Heart rate | 211, 220045 | min, max, mean, std |
| Systolic BP | 51, 442, 220179... | min, max, mean, std |
| Diastolic BP | 8368, 220180... | min, max, mean, std |
| Temperature (°C) | 223762, 676, 223761, 678 | min, max, mean, std |
| SpO2 | 646, 220277 | min, max, mean, std |
| Respiratory rate | 615, 618, 220210... | min, max, mean, std |
| GCS total | 198, 226755 | min, max, mean, std |
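
A rough pandas sketch of how such per-stay statistics could be computed is shown below. It is not taken from `utils/mimic_utils.py`: the function name `first_24h_vital_stats` is hypothetical, only a subset of the item IDs from the table is mapped, and the column names (`ICUSTAY_ID`, `ITEMID`, `CHARTTIME`, `VALUENUM`, `INTIME`) are assumed to match the MIMIC-III schema. The sketch also assumes that item IDs 678 and 223761 record temperature in Fahrenheit and converts them to Celsius; verify this against D_ITEMS before relying on it.

```python
import pandas as pd

# Item IDs copied from the vitals table; only some vitals are shown.
VITAL_ITEMIDS = {
    "heart_rate": [211, 220045],
    "spo2": [646, 220277],
    "temp_c": [676, 223762],
    "temp_f": [678, 223761],  # assumed to be Fahrenheit items
}
ITEMID_TO_VITAL = {i: name for name, ids in VITAL_ITEMIDS.items() for i in ids}


def first_24h_vital_stats(chartevents, icustays):
    """Hypothetical helper: min/max/mean/std per stay over the first 24 hours."""
    df = chartevents.merge(icustays[["ICUSTAY_ID", "INTIME"]], on="ICUSTAY_ID")
    hours = (df["CHARTTIME"] - df["INTIME"]).dt.total_seconds() / 3600
    df = df[(hours >= 0) & (hours < 24)]
    df = df.assign(vital=df["ITEMID"].map(ITEMID_TO_VITAL))
    df = df.dropna(subset=["vital", "VALUENUM"]).copy()
    # Convert Fahrenheit readings so all temperatures share one unit.
    is_f = df["vital"] == "temp_f"
    df.loc[is_f, "VALUENUM"] = (df.loc[is_f, "VALUENUM"] - 32) * 5 / 9
    df["vital"] = df["vital"].replace({"temp_c": "temperature", "temp_f": "temperature"})
    stats = (
        df.groupby(["ICUSTAY_ID", "vital"])["VALUENUM"]
        .agg(["min", "max", "mean", "std"])
        .unstack("vital")
    )
    # Flatten the (stat, vital) column index into names like "heart_rate_mean".
    stats.columns = [f"{vital}_{stat}" for stat, vital in stats.columns]
    return stats


# Toy data (made up): one stay, one reading outside the 24-hour window.
icustays = pd.DataFrame({"ICUSTAY_ID": [100], "INTIME": pd.to_datetime(["2150-01-01 00:00"])})
chartevents = pd.DataFrame({
    "ICUSTAY_ID": [100, 100, 100, 100, 100, 100],
    "ITEMID": [211, 220045, 211, 678, 223762, 999999],
    "CHARTTIME": pd.to_datetime([
        "2150-01-01 01:00", "2150-01-01 05:00", "2150-01-02 02:00",
        "2150-01-01 02:00", "2150-01-01 03:00", "2150-01-01 04:00",
    ]),
    "VALUENUM": [80.0, 100.0, 150.0, 98.6, 38.0, 5.0],
})
stats = first_24h_vital_stats(chartevents, icustays)
print(stats.T)
```

The heart-rate reading of 150 falls 26 hours after admission and is excluded, and the unmapped item ID is dropped before aggregation.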

Lab Values — First 24 Hours

Creatinine, BUN, WBC, Hemoglobin, Sodium, Glucose, Bicarbonate, Lactate, Potassium, Bilirubin (min, max, mean per stay)

Diagnosis Codes

Top-50 ICD-9 diagnosis codes as binary bag-of-codes features
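
One way to build such bag-of-codes features is sketched below with pandas. The helper name `bag_of_codes` and the `dx_` column prefix are made up for illustration, and the `HADM_ID` / `ICD9_CODE` column names are assumed to match DIAGNOSES_ICD. In practice the top-k codes would ideally be chosen on the training split only; the toy example skips that step.

```python
import pandas as pd


def bag_of_codes(diagnoses, top_k=50):
    """Hypothetical helper: binary indicator per admission for the top_k most frequent codes."""
    top = diagnoses["ICD9_CODE"].value_counts().head(top_k).index
    kept = diagnoses[diagnoses["ICD9_CODE"].isin(top)]
    # crosstab counts occurrences; clip turns counts into 0/1 indicators.
    bag = pd.crosstab(kept["HADM_ID"], kept["ICD9_CODE"]).clip(upper=1)
    # Admissions with none of the top codes still need an all-zero row.
    all_ids = pd.Index(diagnoses["HADM_ID"].unique(), name="HADM_ID")
    return bag.reindex(all_ids, fill_value=0).add_prefix("dx_")


# Toy data (made up): four admissions, four distinct codes, keep the top two.
diagnoses = pd.DataFrame({
    "HADM_ID": [10, 10, 11, 11, 12, 12, 13],
    "ICD9_CODE": ["4019", "4280", "4019", "4280", "4019", "5849", "V3000"],
})
bag = bag_of_codes(diagnoses, top_k=2)
print(bag)
```

Admission 13 carries only a rare code, so it appears as a row of zeros rather than being dropped.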


Key Notebook Sections

| Section | Content |
|---------|---------|
| 1 | Environment setup, imports, reproducibility |
| 2 | MIMIC-III data loading, cohort selection |
| 3 | Feature engineering (static, vitals, labs, diagnoses) |
| 4 | EDA: outcome distribution, feature distributions, heatmap |
| 5 | PCA explained variance + scatter, UMAP projection |
| 6 | Train/test split, SMOTE class balancing |
| 7 | SVM (RBF) baseline |
| 8 | Decision Tree with cost-complexity pruning |
| 9 | Cox Proportional Hazards + KM curves |
| 10 | Bidirectional LSTM with temporal attention |
| 11 | Transformer Encoder with CLS token |
| 12 | ROC curves, PR curves, calibration, confusion matrices, results table |
| 15 | Key takeaways, feature importance, hazard ratios |
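
For section 8, cost-complexity pruning with the pruning strength chosen by cross-validation might look like the sketch below. It uses synthetic data and scikit-learn's `cost_complexity_pruning_path` and `GridSearchCV`; the 5-fold setting, the AUROC scoring, and the subsampling of candidate alphas are assumptions, not the notebook's configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the engineered features.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=6,
    weights=[0.89, 0.11], random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Candidate alphas come from the pruning path of the fully grown tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
alphas = np.unique(path.ccp_alphas.clip(min=0))[:-1]  # last alpha prunes to the root
alphas = alphas[:: max(1, len(alphas) // 10)]          # subsample to keep the search small

# Pick the alpha with the best cross-validated AUROC on the training split.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"ccp_alpha": alphas},
    scoring="roc_auc",
    cv=5,
)
search.fit(X_tr, y_tr)

pruned = search.best_estimator_
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print("best ccp_alpha:", search.best_params_["ccp_alpha"])
print("leaves: full =", full.get_n_leaves(), "pruned =", pruned.get_n_leaves())
```

Larger `ccp_alpha` values remove more of the tree, so the selected model never has more leaves than the unpruned one.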

Evaluation Metrics

  • AUROC — primary metric (area under ROC curve); threshold-independent
  • AUPRC — especially important given class imbalance (~10–15% mortality)
  • F1 Score — at default 0.5 threshold
  • Calibration — reliability diagram; well-calibrated models are safer clinically
  • Concordance Index (C-statistic) — for Cox PH model
  • Confusion Matrix — false negatives (missed deaths) are clinically costly
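
The metrics above can be computed as in the following sketch on a tiny hand-made example. AUROC, AUPRC (as average precision), and F1 use scikit-learn; the `concordance_index` function is a naive O(n²) illustration of Harrell's C written for this README, not the routine the notebook uses for the Cox model.

```python
import numpy as np
from sklearn.metrics import average_precision_score, f1_score, roc_auc_score

# Made-up labels and predicted probabilities for eight stays.
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1])
y_prob = np.array([0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.9])

auroc = roc_auc_score(y_true, y_prob)            # threshold-independent ranking quality
auprc = average_precision_score(y_true, y_prob)  # more informative under class imbalance
f1 = f1_score(y_true, (y_prob >= 0.5).astype(int))  # at the default 0.5 threshold


def concordance_index(time, event, risk):
    """Naive Harrell's C: share of comparable pairs ranked correctly by risk."""
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        for j in range(len(time)):
            # A pair is comparable when i had the event before j's observed time.
            if event[i] == 1 and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


# Made-up survival data: follow-up time, event indicator, predicted risk score.
c_index = concordance_index(
    time=[2, 4, 6, 8], event=[1, 1, 0, 1], risk=[0.9, 0.3, 0.5, 0.1]
)
print(f"AUROC={auroc:.4f} AUPRC={auprc:.4f} F1={f1:.4f} C-index={c_index:.2f}")
```

For a binary outcome without censoring, the C-index reduces to AUROC, which is why the two are often reported side by side.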

Results

Cohort: 32,575 ICU stays · 10.9% mortality rate · 123 features

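| Model | AUROC | AUPRC | F1 | Accuracy |
|-------|-------|-------|----|----------|
| Transformer Encoder | 0.7683 | 0.3309 | 0.3891 | 0.8195 |
| SVM (RBF) | 0.7502 | 0.2458 | 0.3687 | 0.7450 |
| LSTM | 0.7479 | 0.3067 | 0.3721 | 0.8196 |
| Cox PH (C-index) | 0.7238 | - | - | - |
| Decision Tree | 0.6955 | 0.2565 | 0.3036 | 0.7050 |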

References

  1. Johnson et al. (2016). MIMIC-III, a freely accessible critical care database. Scientific Data.
  2. Wang et al. (2020). MIMIC-Extract: A data extraction, preprocessing, and representation pipeline for MIMIC-III. CHIL.
  3. Harutyunyan et al. (2019). Multitask learning and benchmarking with clinical time series data. Scientific Data.
  4. Rajpurkar et al. (2017). CheXNet: Radiologist-level pneumonia detection on chest X-rays. arXiv.

About

ICU in-hospital mortality prediction using MIMIC-III — classical ML, survival analysis, and deep learning on the first 24 hours of an ICU stay.
