This repository contains the full data processing and analysis pipeline used in:
“Associations of dementia polyexposure scores to Alzheimer’s disease endophenotypes in a diverse population”
The project evaluates how clinical risk scores (CRS), namely mCAIDE, WHICAP, LIBRA, and CogDRisk, relate to:
- Alzheimer’s disease (AD) endophenotypes (plasma biomarkers, neuroimaging, cognition)
- pTau217/Aβ42 positivity (proxy for amyloid PET positivity)
- Cognitive impairment (MCI, dementia)
The workflow is organized into three sequential stages:
- Data Standardization (Preprocessing)
- Imputation (Handling Missingness)
- Analysis (Statistical Modeling and Results)
Stage 1: Data Standardization (Preprocessing)
This stage performs initial preprocessing and harmonization of the raw HABS-HD data.
Biomarker processing:
- Log-transformation of plasma biomarkers
- Z-score normalization (mean = 0, SD = 1)
- Construction of ratios (e.g., Aβ42/Aβ40, pTau217/Aβ42)
- Outlier removal using IQR-based filtering
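A minimal sketch of these biomarker steps in Python (standard library only). The processing order is an assumption for illustration, not the repository's exact implementation: ratios are built from raw concentrations, values are natural-log transformed, outliers are flagged with Tukey fences at 1.5 x IQR on the log scale, and the retained values are z-scored. The function names are hypothetical.

```python
import math
import statistics


def ratio(numerator, denominator):
    """Element-wise ratio of two raw concentration lists (e.g., Abeta42/Abeta40)."""
    return [n / d for n, d in zip(numerator, denominator)]


def iqr_bounds(values, k=1.5):
    """Tukey fences (Q1 - k*IQR, Q3 + k*IQR); k=1.5 is an assumed default."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def process_biomarker(raw_values, k=1.5):
    """Log-transform, mark IQR outliers as None, then z-score the kept values."""
    logged = [math.log(v) for v in raw_values]  # requires strictly positive values
    low, high = iqr_bounds(logged, k)
    kept = [x if low <= x <= high else None for x in logged]
    present = [x for x in kept if x is not None]
    mean, sd = statistics.mean(present), statistics.stdev(present)
    return [None if x is None else (x - mean) / sd for x in kept]
```

For example, `process_biomarker(ratio(ptau217, abeta42))` would return z-scored log ratios with outliers set to `None`, where `ptau217` and `abeta42` are hypothetical lists of raw concentrations in the same participant order.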
Neuroimaging variables:
- Cortical thickness aggregation
- Hippocampal volume averaging
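As a rough illustration, the two neuroimaging summaries can be sketched as below. Averaging left and right hippocampal volumes, and taking an (optionally weighted, e.g., by surface area) mean of regional cortical thickness, are assumptions about what "averaging" and "aggregation" mean here; the function names are hypothetical.

```python
def mean_hippocampal_volume(left, right):
    """Average of left and right hippocampal volumes (same units, e.g., mm^3)."""
    return (left + right) / 2


def aggregate_thickness(roi_thickness, roi_weights=None):
    """Mean cortical thickness across regions of interest (ROIs).

    roi_thickness: dict mapping ROI name -> thickness (mm).
    roi_weights: optional dict mapping ROI name -> weight; equal weights if None.
    """
    if roi_weights is None:
        roi_weights = {roi: 1.0 for roi in roi_thickness}
    total_weight = sum(roi_weights[roi] for roi in roi_thickness)
    weighted_sum = sum(roi_thickness[roi] * roi_weights[roi] for roi in roi_thickness)
    return weighted_sum / total_weight
```

Which ROIs enter the aggregate, and whether they are weighted, are choices this sketch leaves to the caller.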
Cognitive measures:
- Z-score standardization
- Composite domains:
  - Memory
  - Verbal ability
  - Executive function
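One common way to build such domain composites is sketched below as an assumption, not necessarily the repository's method: z-score each test within the sample, flip tests where higher raw scores mean worse performance (e.g., completion times), average the z-scores within a domain, and re-standardize the average. Test names used with it are hypothetical.

```python
import statistics


def zscore(values):
    """Standardize to mean 0 and SD 1 (sample SD)."""
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def domain_composite(test_scores, reverse=()):
    """Build a domain composite from several tests (assumes no missing scores).

    test_scores: dict mapping test name -> raw scores, one per participant,
                 in the same participant order for every test.
    reverse: names of tests where higher raw scores indicate worse performance.
    """
    z_by_test = []
    for name, scores in test_scores.items():
        z = zscore(scores)
        z_by_test.append([-v for v in z] if name in reverse else z)
    # Average each participant's z-scores across tests, then re-standardize.
    averaged = [statistics.mean(per_person) for per_person in zip(*z_by_test)]
    return zscore(averaged)
```

Re-standardizing the average keeps every composite on the same mean 0, SD 1 scale as the individual tests, which makes regression coefficients comparable across domains.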
Clinical covariates:
- BMI, hypertension, diabetes, dyslipidemia
- Depression, smoking, alcohol use, physical activity
- Alignment of CRS-specific variables
Stage 2: Imputation (Handling Missingness)
This stage handles missing data in CRS variables and covariates.
missForest imputation:
- Supports mixed data types
- Non-parametric (random forest–based)
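missForest (originally an R package) imputes each variable with a random forest trained on the other variables and repeats the cycle until the imputations stop improving. The sketch below shows only that outer loop and its stopping rule for continuous variables: the normalized change sum((X_new - X_old)^2) / sum(X_new^2) is tracked across iterations, and the previous imputation is returned as soon as the change stops decreasing. The random-forest step is abstracted into a hypothetical `impute_once` callable, `max_iter=10` is an assumed cap, and categorical variables (tracked in missForest by the proportion of changed entries) are omitted.

```python
def normalized_difference(new, old):
    """Change between two imputed matrices (lists of rows of floats):
    sum((new - old)^2) / sum(new^2)."""
    num = sum((a - b) ** 2
              for row_new, row_old in zip(new, old)
              for a, b in zip(row_new, row_old))
    den = sum(a ** 2 for row in new for a in row)
    return num / den


def iterate_until_divergence(initial, impute_once, max_iter=10):
    """Repeat the hypothetical random-forest step `impute_once`.

    Stops when the normalized difference no longer decreases and returns the
    imputation from the previous iteration.
    """
    current = initial  # e.g., a mean-imputed starting matrix
    prev_diff = float("inf")
    for _ in range(max_iter):
        updated = impute_once(current)
        diff = normalized_difference(updated, current)
        if diff >= prev_diff:
            return current
        prev_diff = diff
        current = updated
    return current
```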
Variables included in the imputation model:
- Demographics
- Clinical variables
- Biomarkers
- CRS predictors
Validation:
- Comparison with complete-case analysis
- Bias assessment
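One simple way to summarize the comparison with complete-case analysis is the percent difference between each coefficient estimated on the imputed data and on complete cases. The function name and the 10% flag threshold below are hypothetical choices for illustration, not values taken from the study.

```python
def compare_estimates(imputed, complete_case, threshold_pct=10.0):
    """Percent difference of imputed vs complete-case coefficients, per term.

    imputed, complete_case: dicts mapping model term -> coefficient.
    Returns term -> (percent difference, True if |difference| > threshold_pct).
    """
    result = {}
    for term, cc_beta in complete_case.items():
        if term not in imputed or cc_beta == 0:
            continue  # skip terms absent from one model or with a zero reference
        pct = 100.0 * (imputed[term] - cc_beta) / abs(cc_beta)
        result[term] = (pct, abs(pct) > threshold_pct)
    return result
```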
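Stage 3: Analysis (Statistical Modeling and Results)
Core statistical analyses linking each CRS to clinical outcomes, biomarkers, neuroimaging, and cognition. Scores evaluated: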
- mCAIDE
- WHICAP
- LIBRA
- CogDRisk
- Each CRS with its demographic components removed
Logistic regression:
- MCI
- Dementia
- Combined outcome
Adjustments:
- APOE genotype
- Demographics (where applicable)
Metrics:
- AUC
- Nagelkerke R²
- DeLong test
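The three metrics can be sketched in plain Python as follows, assuming each model's predicted probabilities and log-likelihoods are already available from the fitted logistic regressions. AUC is computed as the Mann-Whitney probability that a case scores above a non-case (ties count 1/2), Nagelkerke R² rescales the Cox-Snell R² so its maximum is 1, and the DeLong test compares two correlated AUCs measured on the same participants. Function names are illustrative.

```python
import math


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke R^2 from the null and fitted log-likelihoods of n observations."""
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    max_cox_snell = 1 - math.exp(2 * ll_null / n)
    return cox_snell / max_cox_snell


def _psi(x, y):
    """1 if the case score x exceeds the non-case score y, 0.5 for ties, else 0."""
    return 1.0 if x > y else (0.5 if x == y else 0.0)


def _auc_components(cases, controls):
    """AUC and DeLong structural components (one value per case / per control)."""
    v10 = [sum(_psi(x, y) for y in controls) / len(controls) for x in cases]
    v01 = [sum(_psi(x, y) for x in cases) / len(cases) for y in controls]
    return sum(v10) / len(cases), v10, v01


def _cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(scores1, scores2, labels):
    """Two-sided DeLong test for the AUCs of two models scored on the same people.

    labels: 1 for cases, 0 for non-cases; scores1/scores2: predicted probabilities.
    Returns (auc1, auc2, z, p).
    """
    def split(scores):
        cases = [s for s, y in zip(scores, labels) if y == 1]
        controls = [s for s, y in zip(scores, labels) if y == 0]
        return cases, controls

    auc1, v10_1, v01_1 = _auc_components(*split(scores1))
    auc2, v10_2, v01_2 = _auc_components(*split(scores2))
    m, n = len(v10_1), len(v01_1)
    var = ((_cov(v10_1, v10_1) + _cov(v10_2, v10_2) - 2 * _cov(v10_1, v10_2)) / m
           + (_cov(v01_1, v01_1) + _cov(v01_2, v01_2) - 2 * _cov(v01_1, v01_2)) / n)
    if var <= 0:
        raise ValueError("AUC difference has zero variance (e.g., identical models)")
    z = (auc1 - auc2) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc1, auc2, z, p
```

Because both models are scored on the same participants, the covariance terms between the two sets of structural components are what distinguish the DeLong test from comparing two independent AUCs.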
Linear regression outcomes:
- Plasma biomarkers
- Neuroimaging measures
- Cognitive performance
Race/ethnicity groups (stratified analyses):
- Non-Hispanic White (NHW)
- Hispanic/Latinx (LA)
- Black/African American (AA)
-
- Pairwise z-tests for differences in regression coefficients between groups
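For coefficients estimated in independent groups, a standard test of equality is z = (b1 - b2) / sqrt(SE1² + SE2²), with a two-sided p-value from the normal distribution. The sketch below applies it to every pair of groups; the input format (a dict of group -> (coefficient, standard error)) is an assumption.

```python
import math
from itertools import combinations


def coefficient_z_test(b1, se1, b2, se2):
    """z-test for the difference between two coefficients from independent samples."""
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p


def pairwise_group_tests(estimates):
    """estimates: dict group -> (coefficient, standard error), e.g. keys 'NHW', 'LA', 'AA'."""
    return {
        (g1, g2): coefficient_z_test(*estimates[g1], *estimates[g2])
        for g1, g2 in combinations(estimates, 2)
    }
```

With three groups this yields three comparisons (NHW vs LA, NHW vs AA, LA vs AA); any multiple-comparison correction would be applied to the returned p-values separately.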
Sensitivity analysis (CRS without demographic components):
- Removes demographic components from each CRS
- Reintroduces them as covariates in each regression
- Evaluates incremental predictive contribution
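A minimal sketch of this decomposition, assuming each CRS is stored as per-item points: demographic items are dropped from the total, and the corresponding raw demographic variables are returned for use as regression covariates. The item names and point values are hypothetical and do not reproduce any score's published weights; which items count as demographic differs between scores.

```python
DEMOGRAPHIC_ITEMS = frozenset({"age", "sex", "education"})  # hypothetical item names


def split_crs(item_points, demographics, demographic_items=DEMOGRAPHIC_ITEMS):
    """Return (score without demographic points, demographic covariates).

    item_points: dict mapping CRS item -> points contributed to the total score.
    demographics: dict of raw demographic values for the same participant.
    """
    score = sum(p for item, p in item_points.items() if item not in demographic_items)
    covariates = {k: v for k, v in demographics.items() if k in demographic_items}
    return score, covariates


# Hypothetical participant, for illustration only.
points = {"age": 3, "sex": 1, "education": 2, "hypertension": 2, "obesity": 2}
score, covariates = split_crs(points, {"age": 67, "sex": "F", "education": 12})
```

Entering the raw demographics as separate covariates lets each regression estimate their effects directly instead of through the score's fixed point weights.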
pTau217/Aβ42 positivity cutoff:
This analysis derives and validates the pTau217/Aβ42 cutoff used to define positivity.
- Cutoff derivation by maximizing the Youden index
- Logistic regression
- Model evaluation:
  - Odds ratios
  - AUC
  - R²
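The Youden index is J = sensitivity + specificity - 1, and the cutoff is the candidate value that maximizes J against a binary reference standard. A minimal sketch, assuming values at or above the cutoff are called positive and reference labels are coded 1/0; the function name is hypothetical, and ties keep the lowest candidate cutoff.

```python
def youden_cutoff(values, labels):
    """Return (cutoff, J) maximizing sensitivity + specificity - 1.

    values: continuous marker values (e.g., pTau217/Abeta42 ratios).
    labels: 1 for reference-positive, 0 for reference-negative.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative reference labels")
    best_cutoff, best_j = None, -1.0
    for c in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= c and y == 1)
        tn = sum(1 for v, y in zip(values, labels) if v < c and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # strict comparison keeps the first (lowest) cutoff on ties
            best_cutoff, best_j = c, j
    return best_cutoff, best_j
```

The derived cutoff can then be used to dichotomize the ratio before fitting the logistic regression models listed above.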